Sadri Hassani 


Mathematical 
Physics 


A Modern Introduction to 
Its Foundations 


Second Edition 


Springer


Sadri Hassani 
Department of Physics 
Illinois State University 
Normal, Illinois, USA 


ISBN 978-3-319-01194-3 ISBN 978-3-319-01195-0 (eBook) 
DOI 10.1007/978-3-319-01195-0 
Springer Cham Heidelberg New York Dordrecht London 


Library of Congress Control Number: 2013945405 


© Springer International Publishing Switzerland 1999, 2013 

This work is subject to copyright. All rights are reserved by the Publisher, whether the whole 
or part of the material is concerned, specifically the rights of translation, reprinting, reuse of 
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, 
and transmission or information storage and retrieval, electronic adaptation, computer software, 
or by similar or dissimilar methodology now known or hereafter developed. Exempted from this 
legal reservation are brief excerpts in connection with reviews or scholarly analysis or material 
supplied specifically for the purpose of being entered and executed on a computer system, for 
exclusive use by the purchaser of the work. Duplication of this publication or parts thereof is 
permitted only under the provisions of the Copyright Law of the Publisher’s location, in its 
current version, and permission for use must always be obtained from Springer. Permissions 
for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are 
liable to prosecution under the respective Copyright Law. 

The use of general descriptive names, registered names, trademarks, service marks, etc. in this 
publication does not imply, even in the absence of a specific statement, that such names are 
exempt from the relevant protective laws and regulations and therefore free for general use. 
While the advice and information in this book are believed to be true and accurate at the date of 
publication, neither the authors nor the editors nor the publisher can accept any legal responsi- 
bility for any errors or omissions that may be made. The publisher makes no warranty, express 
or implied, with respect to the material contained herein. 


Printed on acid-free paper 


Springer is part of Springer Science+Business Media (www.springer.com) 


To my wife, Sarah, 
and to my children, 
Dane Arash and Daisy Bita 


Preface to Second Edition 


Based on my own experience of teaching from the first edition, and more im- 
portantly based on the comments of the adopters and readers, I have made 
some significant changes to the new edition of the book: Part I is substan- 
tially rewritten, Part VIII has been changed to incorporate Clifford algebras, 
Part IX now includes the representation of Clifford algebras, and the new 
Part X discusses the important topic of fiber bundles. 

I felt that a short section on algebra did not do justice to such an im- 
portant topic. Therefore, I expanded it into a comprehensive chapter dealing 
with the basic properties of algebras and their classification. This required a 
rewriting of the chapter on operator algebras, including the introduction of a 
section on the representation of algebras in general. The chapter on spectral 
decomposition underwent a complete overhaul, as a result of which the topic 
is now more cohesive and the proofs more rigorous and illuminating. This 
entailed separate treatments of the spectral decomposition theorem for real 
and complex vector spaces. 

The inner product of relativity is non-Euclidean. Therefore, in the discus- 
sion of tensors, I have explicitly expanded on the indefinite inner products 
and introduced a brief discussion of the subspaces of a non-Euclidean (the 
so-called semi-Riemannian or pseudo-Riemannian) vector space. This inner 
product, combined with the notion of algebra, leads naturally to Clifford al- 
gebras, the topic of the second chapter of Part VIII. Motivating the subject 
by introducing the Dirac equation, the chapter discusses the general
properties of Clifford algebras in some detail and completely classifies the
Clifford algebras $C^\nu_\mu(\mathbb{R})$, the generalization of the algebra
$C^1_3(\mathbb{R})$, the Clifford
algebra of the Minkowski space. The representation of Clifford algebras,
including a treatment of spinors, is taken up in Part IX, after a discussion of 
the representation of Lie Groups and Lie algebras. 

Fiber bundles have become a significant part of the lore of fundamen- 
tal theoretical physics. The natural setting of gauge theories, essential in 
describing electroweak and strong interactions, is fiber bundles. Moreover, 
differential geometry, indispensable in the treatment of gravity, is most ele- 
gantly treated in terms of fiber bundles. Chapter 34 introduces fiber bundles 
and their complementary notion of connection, and the curvature form aris- 
ing from the latter. Chapter 35 on gauge theories makes contact with physics 
and shows how connection is related to potentials and curvature to fields. It 
also constructs the most general gauge-invariant Lagrangian, including its 
local expression (the expression involving coordinate charts introduced on 
the underlying manifold), which is the form used by physicists. In Chap. 36,
by introducing vector bundles and linear connections, the stage becomes
ready for the introduction of curvature tensor and torsion, two major play- 
ers in differential geometry. This approach to differential geometry via fiber 
bundles is, in my opinion, the most elegant and intuitive approach, which 
avoids the ad hoc introduction of covariant derivative. Continuing with dif- 
ferential geometry, Chap. 37 incorporates the notion of inner product and 
metric into it, coming up with the metric connection, so essential in the gen- 
eral theory of relativity. 

All these changes and additions required certain omissions. I was careful 
not to break the continuity and rigor of the book when omitting topics. Since 
none of the discussions of numerical analysis was used anywhere else in the 
book, these were the first casualties. A few mathematical treatments that 
were too dry, technical, and not inspiring were also removed from the new 
edition. However, I provided references in which the reader can find these 
missing details. The only casualty of this kind of omission was the discus- 
sion leading to the spectral decomposition theorem for compact operators in 
Chap. 17. 

Aside from the above changes, I have also altered the style of the book 
considerably. Now all mathematical statements—theorems, propositions, 
corollaries, definitions, remarks, etc.—and examples are numbered consec- 
utively without regard to their types. This makes finding those statements 
or examples considerably easier. I have also placed important mathemat- 
ical statements in boxes which are more visible as they have dark back- 
grounds. Additionally, I have increased the number of marginal notes, and 
added many more entries to the index. 

Many readers and adopters provided invaluable feedback, both in spot- 
ting typos and in clarifying vague and even erroneous statements of the 
book. I would like to acknowledge the contribution of the following peo- 
ple to the correction of errors and the clarification of concepts: Sylvio An- 
drade, Salar Baher, Rafael Benguria, Jim Bogan, Jorun Bomert, John Chaf- 
fer, Demetris Charalambous, Robert Gooding, Paul Haines, Carl Helrich, 
Ray Jensen, Jin-Wook Jung, David Kastor, Fred Keil, Mike Lieber, Art Lind, 
Gary Miller, John Morgan, Thomas Schaefer, Hossein Shojaie, Shreenivas 
Somayaji, Werner Timmermann, Johan Wild, Bradley Wogsland, and Fang 
Wu. As much as I tried to keep a record of individuals who gave me
feedback on the first edition, fourteen years is a long time, and I may have
omitted some names from the list above. To those people, I sincerely apologize.
Needless to say, any remaining errors in this new edition are solely my
responsibility, and as always, I'll greatly appreciate it if the readers continue
pointing them out to me.

I consulted the following three excellent books to a great extent for the 
addition and/or changes in the second edition: 


Greub, W., Linear Algebra, 4th ed., Springer-Verlag, Berlin, 1975. 
Greub, W., Multilinear Algebra, 2nd ed., Springer-Verlag, Berlin, 1978. 
Kobayashi, S., and K. Nomizu, Foundations of Differential Geometry, 
vol. 1, Wiley, New York, 1963. 


Maury Solomon, my editor at Springer, was immeasurably patient and 
cooperative on a project that has been long overdue. Aldo Rampioni has
been extremely helpful and cooperative as he took over the editorship of
the project. My sincere thanks go to both of them. Finally, I would like to 
thank my wife Sarah for her unwavering forbearance and encouragement 
throughout the long-drawn-out writing of the new edition. 


Normal, IL, USA Sadri Hassani 
November, 2012


“Ich kann es nun einmal nicht lassen, in diesem Drama von Mathematik und
Physik—die sich im Dunkeln befruchten, aber von Angesicht zu Angesicht so 
gerne einander verkennen und verleugnen—die Rolle des (wie ich genügsam
erfuhr, oft unerwünschten) Boten zu spielen.”

Hermann Weyl 


It is said that mathematics is the language of Nature. If so, then physics 
is its poetry. Nature started to whisper into our ears when Egyptians and 
Babylonians were compelled to invent and use mathematics in their day- 
to-day activities. The faint geometric and arithmetical pidgin of over four 
thousand years ago, suitable for rudimentary conversations with nature as 
applied to simple landscaping, has turned into a sophisticated language in 
which the heart of matter is articulated. 

The interplay between mathematics and physics needs no emphasis. 
What may need to be emphasized is that mathematics is not merely a tool 
with which the presentation of physics is facilitated, but the only medium 
in which physics can survive. Just as language is the means by which hu- 
mans can express their thoughts and without which they lose their unique 
identity, mathematics is the only language through which physics can ex- 
press itself and without which it loses its identity. And just as language is 
perfected due to its constant usage, mathematics develops in the most
dramatic way because of its usage in physics. The quotation by Weyl above,
an approximation to whose translation is “In this drama of mathematics and
physics—which fertilize each other in the dark, but which prefer to deny and 
misconstrue each other face to face—I cannot, however, resist playing the 
role of a messenger, albeit, as I have abundantly learned, often an
unwelcome one,” is a perfect description of the natural intimacy between what
mathematicians and physicists do, and the unnatural estrangement between 
the two camps. Some of the most beautiful mathematics has been motivated 
by physics (differential equations by Newtonian mechanics, differential ge- 
ometry by general relativity, and operator theory by quantum mechanics), 
and some of the most fundamental physics has been expressed in the most 
beautiful poetry of mathematics (mechanics in symplectic geometry, and 
fundamental forces in Lie group theory). 

I do not want to give the impression that mathematics and physics cannot 
develop independently. On the contrary, it is precisely the independence of 
each discipline that reinforces not only itself, but the other discipline as 
well—just as the study of the grammar of a language improves its usage and 
vice versa. However, the most effective means by which the two camps can 




Preface to First Edition 


accomplish great success is through an intense dialogue. Fortunately, with 
the advent of gauge and string theories of particle physics, such a dialogue 
has been reestablished between physics and mathematics after a relatively 
long lull. 


Level and Philosophy of Presentation 


This is a book for physics students interested in the mathematics they use. 
It is also a book for mathematics students who wish to see some of the ab- 
stract ideas with which they are familiar come alive in an applied setting. 
The level of presentation is that of an advanced undergraduate or beginning 
graduate course (or sequence of courses) traditionally called “Mathematical 
Methods of Physics” or some variation thereof. Unlike most existing math- 
ematical physics books intended for the same audience, which are usually 
lexicographic collections of facts about the diagonalization of matrices, ten- 
sor analysis, Legendre polynomials, contour integration, etc., with little em- 
phasis on formal and systematic development of topics, this book attempts 
to strike a balance between formalism and application, between the abstract 
and the concrete. 

I have tried to include as much of the essential formalism as is neces- 
sary to render the book optimally coherent and self-contained. This entails 
stating and proving a large number of theorems, propositions, lemmas, and 
corollaries. The benefit of such an approach is that the student will recog- 
nize clearly both the power and the limitation of a mathematical idea used 
in physics. There is a tendency on the part of the novice to universalize the 
mathematical methods and ideas encountered in physics courses because the 
limitations of these methods and ideas are not clearly pointed out. 

There is a great deal of freedom in the topics and the level of presentation 
that instructors can choose from this book. My experience has shown that 
Parts I, II, III, Chap. 12, selected sections of Chap. 13, and selected sections
or examples of Chap. 19 (or a large subset of all this) will be a reasonable 
course content for advanced undergraduates. If one adds Chaps. 14 and 20, 
as well as selected topics from Chaps. 21 and 22, one can design a course 
suitable for first-year graduate students. By judicious choice of topics from 
Parts VII and VIII, the instructor can bring the content of the course to a
more modern setting. Depending on the sophistication of the students, this 
can be done either in the first year or the second year of graduate school. 


Features 


To better understand theorems, propositions, and so forth, students need to 
see them in action. There are over 350 worked-out examples and over 850 
problems (many with detailed hints) in this book, providing a vast arena in 
which students can watch the formalism unfold. The philosophy underly- 
ing this abundance can be summarized as “An example is worth a thousand 
words of explanation.” Thus, whenever a statement is intrinsically vague or
hard to grasp, worked-out examples and/or problems with hints are provided
to clarify it. The inclusion of such a large number of examples is the means 
by which the balance between formalism and application has been achieved. 
However, although applications are essential in understanding mathemati- 
cal physics, they are only one side of the coin. The theorems, propositions, 
lemmas, and corollaries, being highly condensed versions of knowledge, are 
equally important. 

A conspicuous feature of the book, which is not emphasized in other 
comparable books, is the attempt to exhibit—as much as it is useful and 
applicable—interrelationships among various topics covered. Thus, the un- 
derlying theme of a vector space (which, in my opinion, is the most primitive 
concept at this level of presentation) recurs throughout the book and alerts 
the reader to the connection between various seemingly unrelated topics. 

Another useful feature is the presentation of the historical setting in 
which men and women of mathematics and physics worked. I have gone 
against the trend of the “ahistoricism” of mathematicians and physicists by 
summarizing the life stories of the people behind the ideas. Many a time, 
the anecdotes and the historical circumstances in which a mathematical or 
physical idea takes form can go a long way toward helping us understand 
and appreciate the idea, especially if the interaction among—and the contri- 
butions of—all those having a share in the creation of the idea is pointed out, 
and the historical continuity of the development of the idea is emphasized. 

To facilitate reference to them, all mathematical statements (definitions, 
theorems, propositions, lemmas, corollaries, and examples) have been num- 
bered consecutively within each section and are preceded by the section 
number. For example, 4.2.9 Definition indicates the ninth mathematical 
statement (which happens to be a definition) in Sect. 4.2. The end of a proof 
is marked by an empty square □, and that of an example by a filled square ■,
placed at the right margin of each. 

Finally, a comprehensive index, a large number of marginal notes, and 
many explanatory underbraced and overbraced comments in equations fa- 
cilitate the use and comprehension of the book. In this respect, the book is 
also useful as a reference. 


Organization and Topical Coverage 


Aside from Chap. 0, which is a collection of purely mathematical concepts, 
the book is divided into eight parts. Part I, consisting of the first four chap- 
ters, is devoted to a thorough study of finite-dimensional vector spaces and 
linear operators defined on them. As the unifying theme of the book, vector 
spaces demand careful analysis, and Part I provides this in the more accessi- 
ble setting of finite dimension in a language that is conveniently generalized 
to the more relevant infinite dimensions, the subject of the next part. 

Following a brief discussion of the technical difficulties associated with 
infinity, Part II is devoted to the two main infinite-dimensional vector spaces
of mathematical physics: the classical orthogonal polynomials, and Fourier 
series and transform.

Complex variables appear in Part III. Chapter 9 deals with basic
properties of complex functions, complex series, and their convergence. Chapter 10
discusses the calculus of residues and its application to the evaluation of def- 
inite integrals. Chapter 11 deals with more advanced topics such as multi- 
valued functions, analytic continuation, and the method of steepest descent. 

Part IV treats mainly ordinary differential equations. Chapter 12 shows 
how ordinary differential equations of second order arise in physical prob- 
lems, and Chap. 13 consists of a formal discussion of these differential equa- 
tions as well as methods of solving them numerically. Chapter 14 brings in 
the power of complex analysis to a treatment of the hypergeometric dif- 
ferential equation. The last chapter of this part deals with the solution of 
differential equations using integral transforms. 

Part V starts with a formal chapter on the theory of operators and their
spectral decomposition in Chap. 16. Chapter 17 focuses on a specific type 
of operator, namely the integral operators and their corresponding integral 
equations. The formalism and applications of Sturm-Liouville theory appear 
in Chaps. 18 and 19, respectively. 

The entire Part VI is devoted to a discussion of Green’s functions. Chap- 
ter 20 introduces these functions for ordinary differential equations, while 
Chaps. 21 and 22 discuss the Green’s functions in an m-dimensional Eu- 
clidean space. Some of the derivations in these last two chapters are new 
and, as far as I know, unavailable anywhere else. 

Parts VII and VIII contain a thorough discussion of Lie groups and their 
applications. The concept of group is introduced in Chap. 23. The theory of 
group representation, with an eye on its application in quantum mechanics, 
is discussed in the next chapter. Chapters 25 and 26 concentrate on tensor 
algebra and tensor analysis on manifolds. In Part VIII, the concepts of group 
and manifold are brought together in the context of Lie groups. Chapter 27 
discusses Lie groups and their algebras as well as their representations, with 
special emphasis on their application in physics. Chapter 28 is on differential 
geometry including a brief introduction to general relativity. Lie’s original 
motivation for constructing the groups that bear his name is discussed in 
Chap. 29 in the context of a systematic treatment of differential equations 
using their symmetry groups. The book ends in a chapter that blends many of 
the ideas developed throughout the previous parts in order to treat variational 
problems and their symmetries. It also provides a most fitting example of the 
claim made at the beginning of this preface and one of the most beautiful 
results of mathematical physics: Noether’s theorem on the relation between 
symmetries and conservation laws.

It is my pleasure to thank all those readers who pointed out
typographical mistakes and suggested a few clarifying changes. With the exception
of a couple that required substantial revision, I have incorporated all the 
corrections and suggestions in this second printing. 


Note to the Reader 


Mathematics and physics are like the game of chess (or, for that matter, like 
any game)—you will learn only by “playing” them. No amount of reading 
about the game will make you a master. In this book you will find a large 
number of examples and problems. Go through as many examples as pos- 
sible, and try to reproduce them. Pay particular attention to sentences like 
“The reader may check ...” or “It is straightforward to show ...”. These 
are red flags warning you that for a good understanding of the material 
at hand, you need to provide the missing steps. The problems often fill in 
missing steps as well; and in this respect they are essential for a thorough 
understanding of the book. Do not get discouraged if you cannot get to the 
solution of a problem at your first attempt. If you start from the beginning 
and think about each problem hard enough, you will get to the solution, and 
you will see that the subsequent problems will not be as difficult. 

The extensive index makes the specific topics about which you may be 
interested to learn easily accessible. Often the marginal notes will help you 
easily locate the index entry you are after. 

I have included a large collection of biographical sketches of mathemat- 
ical physicists of the past. These are truly inspiring stories, and I encourage 
you to read them. They let you see that even under excruciating circum- 
stances, the human mind can work miracles. You will discover how these 
remarkable individuals overcame the political, social, and economic condi- 
tions of their time to let us get a faint glimpse of the truth. They are our true 
heroes.


Mathematical Preliminaries


This introductory chapter gathers together some of the most basic tools and 
notions that are used throughout the book. It also introduces some common 
vocabulary and notations used in modern mathematical physics literature. 
Readers familiar with such concepts as sets, maps, equivalence relations, 
and metric spaces may wish to skip this chapter. 


1.1 Sets 


Modern mathematics starts with the basic (and undefinable) concept of set. 
We think of a set as a structureless family, or collection, of objects. We 
speak, for example, of the set of students in a college, of men in a city, of 
women working for a corporation, of vectors in space, of points in a plane, 
or of events in the continuum of space-time. Each member $a$ of a set $A$ is
called an element of that set. This relation is denoted by $a \in A$ (read “$a$ is an
element of $A$” or “$a$ belongs to $A$”), and its negation by $a \notin A$. Sometimes
$a$ is called a point of the set $A$ to emphasize a geometric connotation.

A set is usually designated by enumeration of its elements between
braces. For example, $\{2, 4, 6, 8\}$ represents the set consisting of the first
four even natural numbers; $\{0, \pm 1, \pm 2, \pm 3, \dots\}$ is the set of all integers;
$\{1, x, x^2, x^3, \dots\}$ is the set of all nonnegative powers of $x$; and $\{1, i, -1, -i\}$
is the set of the four complex fourth roots of unity. In many cases, a set is
defined by a (mathematical) statement that holds for all of its elements. Such
a set is generally denoted by $\{x \mid P(x)\}$ and read “the set of all x’s such that
$P(x)$ is true.” The foregoing examples of sets can be written alternatively as
follows:


$\{n \mid n \text{ is even and } 1 < n < 9\}$
$\{\pm n \mid n \text{ is a natural number}\}$
$\{y \mid y = x^n \text{ and } n \text{ is a natural number}\}$
$\{z \mid z^4 = 1 \text{ and } z \text{ is a complex number}\}.$


In a frequently used shorthand notation, the last two sets can be
abbreviated as $\{x^n \mid n \ge 0 \text{ and } n \text{ is an integer}\}$ and
$\{z \in \mathbb{C} \mid z^4 = 1\}$. Similarly, the unit circle can be denoted by
$\{z \in \mathbb{C} \mid |z| = 1\}$, the closed interval $[a, b]$ as
$\{x \mid a \le x \le b\}$, the open interval $(a, b)$ as $\{x \mid a < x < b\}$, and the set of
all nonnegative powers of $x$ as $\{x^n\}_{n=0}^{\infty}$ or $\{x^n\}_{n \in \mathbb{N}}$,
where $\mathbb{N}$ is the set of natural numbers (i.e., nonnegative integers). This last
notation will be used frequently in this book. A set with a single element is called a singleton.

If $a \in A$ whenever $a \in B$, we say that $B$ is a subset of $A$ and write $B \subset A$
or $A \supset B$. If $B \subset A$ and $A \subset B$, then $A = B$. If $B \subset A$ and $A \ne B$, then $B$
is called a proper subset of $A$. The set defined by $\{a \mid a \ne a\}$ is called the
empty set and is denoted by $\emptyset$. Clearly, $\emptyset$ contains no elements and is a
subset of any arbitrary set. The collection of all subsets (including $\emptyset$) of a
set $A$ is denoted by $2^A$. The reason for this notation is that the number of
subsets of a set containing $n$ elements is $2^n$ when $n$ is finite (Problem 1.1).
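
The count of subsets can be checked numerically for a small set. The following Python sketch is only an illustration: the helper name `power_set` and the sample set are assumptions introduced here, not notation from the text.

```python
from itertools import chain, combinations

def power_set(elements):
    """Return all subsets of a finite collection as a list of frozensets."""
    items = list(elements)
    # Collect the subsets of each size r = 0, 1, ..., n.
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

A = {2, 4, 6, 8}
subsets_of_A = power_set(A)

# The empty set and A itself are both among the subsets.
print(frozenset() in subsets_of_A, frozenset(A) in subsets_of_A)
# The number of subsets agrees with 2^n for n = 4.
print(len(subsets_of_A) == 2 ** len(A))
```

Enumerating by subset size is one convenient choice; any method that visits each subset exactly once would serve equally well.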

If $A$ and $B$ are sets, their union, denoted by $A \cup B$, is the set containing
all elements that belong to $A$ or $B$ or both. The intersection of the sets $A$
and $B$, denoted by $A \cap B$, is the set containing all elements belonging to
both $A$ and $B$. If $\{B_\alpha\}_{\alpha \in I}$ is a collection of sets,¹ we denote their union by
$\bigcup_{\alpha \in I} B_\alpha$ and their intersection by $\bigcap_{\alpha \in I} B_\alpha$.

The complement of a set $A$ is denoted by $\sim A$ and defined as


$\sim A = \{a \mid a \notin A\}.$

The complement of $B$ in $A$ (or their difference) is

$A \sim B = \{a \mid a \in A \text{ and } a \notin B\}.$


In any application of set theory there is an underlying universal set 
whose subsets are the objects of study. This universal set is usually clear 
from the context. For example, in the study of the properties of integers, the 
set of integers, denoted by $\mathbb{Z}$, is the universal set. The set of reals, $\mathbb{R}$, is the
universal set in real analysis, and the set of complex numbers, $\mathbb{C}$, is the
universal set in complex analysis. To emphasize the presence of a universal set
$X$, one can write $X \sim A$ instead of $\sim A$.
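
These operations map directly onto Python's built-in `set` type. The sketch below assumes a small universal set `X` chosen purely for illustration; the variable names are not from the text.

```python
# Universal set X: the integers 0 through 9 (an arbitrary illustrative choice).
X = set(range(10))
A = {n for n in X if n % 2 == 0}   # even elements of X
B = {n for n in X if n < 5}        # elements of X below 5

union = A | B            # all elements in A or B or both
intersection = A & B     # all elements in both A and B
difference = A - B       # complement of B in A
complement_A = X - A     # complement of A relative to the universal set X

print(union, intersection, difference, complement_A)
```

Note that Python has no absolute complement: `X - A` only makes sense once the universal set `X` has been fixed, which mirrors the remark that the universal set must be clear from the context.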

From two given sets $A$ and $B$, it is possible to form the Cartesian
product of $A$ and $B$, denoted by $A \times B$, which is the set of ordered pairs $(a, b)$,
where $a \in A$ and $b \in B$. This is expressed in set-theoretic notation as


$A \times B = \{(a, b) \mid a \in A \text{ and } b \in B\}.$


We can generalize this to an arbitrary number of sets. If $A_1, A_2, \dots, A_n$ are
sets, then the Cartesian product of these sets is


$A_1 \times A_2 \times \cdots \times A_n = \{(a_1, a_2, \dots, a_n) \mid a_i \in A_i\},$


which is a set of ordered $n$-tuples. If $A_1 = A_2 = \cdots = A_n = A$, then we
write $A^n$ instead of $A \times A \times \cdots \times A$, and


$A^n = \{(a_1, a_2, \dots, a_n) \mid a_i \in A\}.$


¹Here $I$ is an index set (or a counting set) with its typical element denoted by $\alpha$. In
most cases, $I$ is the set of (nonnegative) integers, but, in principle, it can be any set, for
example, the set of real numbers.


The most familiar example of a Cartesian product occurs when $A = \mathbb{R}$.
Then $\mathbb{R}^2$ is the set of pairs $(x_1, x_2)$ with $x_1, x_2 \in \mathbb{R}$. This is simply the points
in the plane. Similarly, $\mathbb{R}^3$ is the set of triplets $(x_1, x_2, x_3)$, or the points in
space, and $\mathbb{R}^n = \{(x_1, x_2, \dots, x_n) \mid x_i \in \mathbb{R}\}$ is the set of real $n$-tuples.
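
For finite sets the Cartesian product can be enumerated explicitly. The following sketch uses `itertools.product` from the Python standard library; the two sample sets are assumptions made for illustration.

```python
from itertools import product

A = {1, 2}
B = {"x", "y", "z"}

# A x B as a set of ordered pairs (a, b) with a in A and b in B.
A_times_B = set(product(A, B))

# A^3 = A x A x A as a set of ordered triples.
A_cubed = set(product(A, repeat=3))

print(len(A_times_B) == len(A) * len(B))
print(len(A_cubed) == len(A) ** 3)
# Order matters: (1, "x") belongs to A x B, while ("x", 1) does not.
print((1, "x") in A_times_B, ("x", 1) in A_times_B)
```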


1.1.1 Equivalence Relations 


There are many instances in which the elements of a set are naturally 
grouped together. For example, all vector potentials that differ by the gra- 
dient of a scalar function can be grouped together because they all give the 
same magnetic field. Similarly, all quantum state functions (of unit “length”) 
that differ by a multiplicative complex number of unit length can be grouped 
together because they all represent the same physical state. The abstraction 
of these ideas is summarized in the following definition. 


Definition 1.1.1 Let $A$ be a set. A relation on $A$ is a comparison test
between members of ordered pairs of elements of $A$. If the pair $(a, b) \in A \times A$
passes this test, we write $a \triangleright b$ and read “$a$ is related to $b$”. An equivalence
relation on $A$ is a relation that has the following properties:


$a \triangleright a \quad \forall a \in A,$ (reflexivity)
$a \triangleright b \Rightarrow b \triangleright a \quad a, b \in A,$ (symmetry)
$a \triangleright b \text{ and } b \triangleright c \Rightarrow a \triangleright c \quad a, b, c \in A,$ (transitivity).


When $a \triangleright b$, we say that “$a$ is equivalent to $b$”. The set $[a] = \{b \in A \mid b \triangleright a\}$
of all elements that are equivalent to $a$ is called the equivalence class of $a$.


The reader may verify the following property of equivalence relations. 


Proposition 1.1.2 If $\triangleright$ is an equivalence relation on $A$ and $a, b \in A$, then
either $[a] \cap [b] = \emptyset$ or $[a] = [b]$.


Therefore, $a' \in [a]$ implies that $[a'] = [a]$. In other words, any element
of an equivalence class can be chosen to be a representative of that class.
Because of the symmetry of equivalence relations, sometimes we denote
them by $\bowtie$.


Example 1.1.3 Let $A$ be the set of human beings. Let $a \triangleright b$ be interpreted
as “$a$ is older than $b$.” Then clearly, $\triangleright$ is a relation but not an equivalence
relation. On the other hand, if we interpret $a \triangleright b$ as “$a$ and $b$ live in the
same city,” then $\triangleright$ is an equivalence relation, as the reader may check. The
equivalence class of $a$ is the population of that city.

Let $V$ be the set of vector potentials. Write $\mathbf{A} \triangleright \mathbf{A}'$ if $\mathbf{A} - \mathbf{A}' = \nabla f$ for
some function $f$. The reader may verify that $\triangleright$ is an equivalence relation,
and that $[\mathbf{A}]$ is the set of all vector potentials giving rise to the same
magnetic field.

Let the underlying set be $\mathbb{Z} \times (\mathbb{Z} \sim \{0\})$. Say “$(a, b)$ is related to $(c, d)$” if
$ad = bc$. Then this relation is an equivalence relation. Furthermore, $[(a, b)]$
can be identified as the ratio $a/b$.


Definition 1.1.4 Let $A$ be a set and $\{B_\alpha\}$ a collection of subsets of $A$. We
say that $\{B_\alpha\}$ is a partition of $A$, or $\{B_\alpha\}$ partitions $A$, if the $B_\alpha$’s are
disjoint, i.e., have no element in common, and $\bigcup_\alpha B_\alpha = A$.


Now consider the collection $\{[a] \mid a \in A\}$ of all equivalence classes of $A$.
These classes are disjoint, and evidently their union covers all of $A$.
Therefore, the collection of equivalence classes of $A$ is a partition of $A$. This
collection is denoted by $A/\bowtie$ and is called the quotient set or factor set of
$A$ under the equivalence relation $\bowtie$.
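
For a finite set the quotient set can be built by direct computation. The sketch below is a minimal illustration, not a construction from the text: the function name `quotient_set` is hypothetical, and the sample relation (two integers are related when their difference is divisible by 3) is an assumption chosen because its classes are easy to inspect.

```python
def quotient_set(A, related):
    """Partition the finite set A into equivalence classes.

    `related(a, b)` is assumed to be an equivalence relation on A;
    this assumption is not checked.
    """
    classes = []
    for a in A:
        for cls in classes:
            # Any member of a class can serve as its representative.
            representative = next(iter(cls))
            if related(a, representative):
                cls.add(a)
                break
        else:
            # a is equivalent to no existing class, so it starts a new one.
            classes.append({a})
    return [frozenset(cls) for cls in classes]

A = range(10)
classes = quotient_set(A, lambda m, n: (m - n) % 3 == 0)

# The classes are pairwise disjoint and their union is all of A.
print(sorted(sorted(c) for c in classes))
```

Comparing each element against a single representative per class relies on exactly the property stated above: any element of an equivalence class can represent it.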


Example 1.1.5 Let the underlying set be $\mathbb{R}^3$. Define an equivalence relation
on $\mathbb{R}^3$ by saying that $P_1 \in \mathbb{R}^3$ and $P_2 \in \mathbb{R}^3$ are equivalent if they lie on the
same line passing through the origin. Then $\mathbb{R}^3/\bowtie$ is the set of all lines in
space passing through the origin. If we choose the unit vector with positive
third coordinate along a given line as the representative of that line, then
$\mathbb{R}^3/\bowtie$, called the projective space associated with $\mathbb{R}^3$, is almost (but not
quite) the same as the upper unit hemisphere. The difference is that any two
points on the edge of the hemisphere which lie on the same diameter ought
to be identified as the same to turn it into the projective space.

On the set $\mathbb{Z}$ of integers, define a relation by writing $m \bowtie n$ for $m, n \in \mathbb{Z}$
if $m - n$ is divisible by k, where k is a fixed integer. Then $\bowtie$ is not only a
relation, but an equivalence relation. In this case, we have

\[ \mathbb{Z}/\bowtie = \{[0], [1], \dots, [k-1]\}, \]

as the reader is urged to verify.
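
The quotient set above can be explored on a finite sample. The following is only a sketch: the helper name `equivalence_classes`, the choice k = 3, and the window of integers from -6 to 6 are illustrative assumptions, not part of the text.

```python
def equivalence_classes(elements, related):
    """Group `elements` into the classes of the equivalence relation `related`."""
    classes = []
    for x in elements:
        for cls in classes:
            # Comparing with one representative suffices, because an
            # equivalence relation is symmetric and transitive.
            if related(x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])
    return classes


k = 3                    # the fixed integer of the example (assumed value)
sample = range(-6, 7)    # a finite window of Z, chosen only for illustration
classes = equivalence_classes(sample, lambda m, n: (m - n) % k == 0)
for cls in classes:
    print(cls)
```

On this window the sketch finds exactly k classes, one for each of [0], [1], ..., [k - 1], and every sampled integer lands in exactly one of them, as a partition requires.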

For the equivalence relation defined on $\mathbb{Z} \times (\mathbb{Z} \sim \{0\})$ of Example 1.1.3,
the set $(\mathbb{Z} \times (\mathbb{Z} \sim \{0\}))/\bowtie$ can be identified with $\mathbb{Q}$, the set of rational
numbers.


1.2 Maps 


To communicate between sets, one introduces the concept of a map. A map
f from a set X to a set Y, denoted by $f : X \to Y$ or $X \xrightarrow{f} Y$, is a correspondence
between elements of X and those of Y in which all the elements
of X participate, and each element of X corresponds to only one element of
Y (see Fig. 1.1). If $y \in Y$ is the element that corresponds to $x \in X$ via the
map f, we write

\[ y = f(x) \quad\text{or}\quad x \mapsto f(x) \quad\text{or}\quad x \overset{f}{\mapsto} y \]


Fig. 1.1 The map f maps all of the set X onto a subset of Y. The shaded area in Y is
$f(X)$, the range of f


and call $f(x)$ the image of x under f. Thus, by the definition of map, $x \in X$
can have only one image. The set X is called the domain, and Y the
codomain or the target space. Two maps $f : X \to Y$ and $g : X \to Y$ are
said to be equal if $f(x) = g(x)$ for all $x \in X$.


Definition 1.2.1 A map whose codomain is the set of real numbers R or the 
set of complex numbers C is commonly called a function. 


A special map that applies to all sets A is $\mathrm{id}_A : A \to A$, called the identity
map of A, and defined by

\[ \mathrm{id}_A(a) = a \quad \forall a \in A. \]

The graph $\Gamma_f$ of a map $f : A \to B$ is a subset of $A \times B$ defined by

\[ \Gamma_f = \{(a, f(a)) \mid a \in A\} \subset A \times B. \]

This general definition reduces to the ordinary graphs encountered in algebra
and calculus where $A = B = \mathbb{R}$ and $A \times B$ is the xy-plane.

If A is a subset of X, we call $f(A) = \{f(x) \mid x \in A\}$ the image
of A. Similarly, if $B \subset f(X)$, we call $f^{-1}(B) = \{x \in X \mid f(x) \in B\}$ the
inverse image, or preimage, of B. In words, $f^{-1}(B)$ consists of all elements
in X whose images are in $B \subset Y$. If B consists of a single element b,
then $f^{-1}(b) = \{x \in X \mid f(x) = b\}$ consists of all elements of X that are
mapped to b. Note that it is possible for many points of X to have the same
image in Y. The subset $f(X)$ of the codomain of a map f is called the range
of f (see Fig. 1.1).
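
For a map on a finite set, the image and the preimage can be computed directly from these definitions. The sketch below is illustrative only: the function names `image` and `preimage`, the sample domain, and the squaring map are assumptions introduced here.

```python
def image(f, A):
    """f(A) = {f(x) | x in A}."""
    return {f(x) for x in A}


def preimage(f, X, B):
    """f^{-1}(B) = {x in X | f(x) in B}, for a map f with domain X."""
    return {x for x in X if f(x) in B}


X = set(range(-3, 4))     # sample domain {-3, ..., 3}


def square(x):
    return x * x


range_of_f = image(square, X)           # the range f(X)
fiber_over_4 = preimage(square, X, {4})  # all points mapped to 4
print(range_of_f, fiber_over_4)
```

Here two points, -2 and 2, share the image 4, which illustrates the remark that many points of X may have the same image in Y.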

If $f : X \to Y$ and $g : Y \to W$, then the mapping $h : X \to W$ given by
$h(x) = g(f(x))$ is called the composition of f and g, and is denoted by
$h = g \circ f$ (see Fig. 1.2).$^2$ It is easy to verify that

\[ f \circ \mathrm{id}_X = f = \mathrm{id}_Y \circ f. \]


If $f(x_1) = f(x_2)$ implies that $x_1 = x_2$, we call f injective, or one-to-one
(denoted 1-1). For an injective map only one element of X corresponds
to an element of Y. If $f(X) = Y$, the mapping is said to be surjective, or
onto. A map that is both injective and surjective is said to be bijective, or
to be a one-to-one correspondence. Two sets that are in one-to-one correspondence
have, by definition, the same number of elements. If $f : X \to Y$
is a bijection from X onto Y, then for each $y \in Y$ there is one and only one
element x in X for which $f(x) = y$. Thus, there is a mapping $f^{-1} : Y \to X$
given by $f^{-1}(y) = x$, where x is the unique element such that $f(x) = y$.
This mapping is called the inverse of f. The inverse of f is also identified
as the map that satisfies $f \circ f^{-1} = \mathrm{id}_Y$ and $f^{-1} \circ f = \mathrm{id}_X$. For example,
one can easily verify that $\ln^{-1} = \exp$ and $\exp^{-1} = \ln$, because $\ln(e^x) = x$
and $e^{\ln x} = x$.


2 Note the importance of the order in which the composition is written. The reverse order
may not even exist.

Fig. 1.2 The composition of two maps is another map

Given a map $f : X \to Y$, we can define a relation $\bowtie$ on X by saying $x_1 \bowtie x_2$
if $f(x_1) = f(x_2)$. The reader may check that this is in fact an equivalence
relation. The equivalence classes are subsets of X all of whose elements
map to the same point in Y. In fact, $[x] = f^{-1}(f(x))$. Corresponding to
f, there is a map $\tilde{f} : X/\bowtie \to Y$, called quotient map or factor map, given
by $\tilde{f}([x]) = f(x)$. This map is injective because if $\tilde{f}([x_1]) = \tilde{f}([x_2])$, then
$f(x_1) = f(x_2)$, so $x_1$ and $x_2$ belong to the same equivalence class; therefore,
$[x_1] = [x_2]$. It follows that


Proposition 1.2.2 The map $\tilde{f} : X/\bowtie \to f(X)$ is bijective.
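
The proposition can be checked by hand on a small finite example. The sketch below groups a sample domain into the classes $[x] = f^{-1}(f(x))$; the helper name `fibers` and the choice of the squaring map on {-3, ..., 3} are assumptions made for illustration.

```python
from collections import defaultdict


def fibers(f, X):
    """Return a dict sending each value y in f(X) to the class f^{-1}(y)."""
    classes = defaultdict(set)
    for x in X:
        classes[f(x)].add(x)
    return dict(classes)


X = range(-3, 4)


def f(x):
    return x * x


# Each key is a point of the range f(X); each value is one equivalence class.
# Reading the dict backwards (class -> key) is the quotient map of the text.
quotient = fibers(f, X)
for value, cls in sorted(quotient.items()):
    print(sorted(cls), "->", value)
```

Distinct classes carry distinct values and every point of f(X) is reached, so on this sample the induced map from classes to f(X) is one-to-one and onto.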

If f and g are both bijections with inverses $f^{-1}$ and $g^{-1}$, respectively,
then $g \circ f$ also has an inverse, and verifying that $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$ is
straightforward.


Example 1.2.3 As an example of the preimage of a set, consider the sine
and cosine functions: $\sin : \mathbb{R} \to \mathbb{R}$ and $\cos : \mathbb{R} \to \mathbb{R}$. Then it should be clear
that

\[ \sin^{-1} 0 = \{n\pi\}_{n=-\infty}^{\infty}, \qquad \cos^{-1} 0 = \left\{\tfrac{\pi}{2} + n\pi\right\}_{n=-\infty}^{\infty}. \]

Similarly, $\sin^{-1}[0, \tfrac{1}{2}]$, the preimage of the closed interval $[0, \tfrac{1}{2}] \subset \mathbb{R}$, consists
of all the intervals on the x-axis marked by heavy line segments in
Fig. 1.3, i.e., all the points whose sine lies between 0 and $\tfrac{1}{2}$.


Example 1.2.4 Let X be any set on which an equivalence relation $\bowtie$ is
defined. Then there is a natural map $\pi$, called projection, $\pi : X \to X/\bowtie$
given by $\pi(x) = [x]$. This map is obviously surjective, but not injective, as
$\pi(y) = \pi(x)$ if $y \bowtie x$. It becomes injective only if the equivalence relation
becomes the identity map: $\bowtie = \mathrm{id}_X$. Then the map becomes bijective, and
we write $X \cong X/\mathrm{id}_X$.


Fig. 1.3 The union of all the intervals on the x-axis marked by heavy line segments is
$\sin^{-1}[0, \tfrac{1}{2}]$


Example 1.2.5 As further examples of maps, we consider functions $f : \mathbb{R} \to \mathbb{R}$
studied in calculus. The two functions $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to (-1, +1)$
given, respectively, by $f(x) = x^3$ and $g(x) = \tanh x$ are bijective.
The latter function, by the way, shows that there are as many points in the
whole real line as there are in the interval $(-1, +1)$. If we denote the set
of positive real numbers by $\mathbb{R}^+$, then the function $f : \mathbb{R} \to \mathbb{R}^+$ given by
$f(x) = x^2$ is surjective but not injective (both x and $-x$ map to $x^2$). The
function $g : \mathbb{R}^+ \to \mathbb{R}$ given by the same rule, $g(x) = x^2$, is injective but
not surjective. On the other hand, $h : \mathbb{R}^+ \to \mathbb{R}^+$ again given by $h(x) = x^2$
is bijective, but $u : \mathbb{R} \to \mathbb{R}$ given by the same rule is neither injective nor
surjective.
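
The way the same rule changes character with its domain and codomain can be imitated on finite sets. This is only a finite analogue, not a statement about the real line: the helper names and the small integer sets standing in for the domains and codomains are assumptions chosen for illustration.

```python
def is_injective(f, domain):
    """True if distinct points of `domain` have distinct images."""
    images = [f(x) for x in domain]
    return len(set(images)) == len(images)


def is_surjective(f, domain, codomain):
    """True if every point of `codomain` is the image of some point of `domain`."""
    return {f(x) for x in domain} == set(codomain)


def square(x):
    return x * x


signed = [-2, -1, 0, 1, 2]      # plays the role of a domain containing x and -x
nonneg = [0, 1, 2]              # plays the role of a one-sided domain
squares = [0, 1, 4]             # exactly the squares of the points above
larger = [0, 1, 2, 3, 4]        # a codomain with points that are not squares

cases = {
    "signed -> squares": (signed, squares),
    "nonneg -> larger": (nonneg, larger),
    "nonneg -> squares": (nonneg, squares),
    "signed -> larger": (signed, larger),
}
for name, (dom, cod) in cases.items():
    print(name, is_injective(square, dom), is_surjective(square, dom, cod))
```

The four cases reproduce, in miniature, the four possibilities of the example: surjective only, injective only, bijective, and neither.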

Let $M^{n \times n}$ denote the set of $n \times n$ real matrices. Define a function
$\det : M^{n \times n} \to \mathbb{R}$ by $\det(\mathsf{A}) = \det \mathsf{A}$. This function is clearly surjective (why?) but
not injective. The set of all matrices whose determinant is 1 is $\det^{-1}(1)$.
Such matrices occur frequently in physical applications.

Another example of interest is $f : \mathbb{C} \to \mathbb{R}$ given by $f(z) = |z|$. This function
is also neither injective nor surjective. Here $f^{-1}(1)$ is the unit circle,
the circle of radius 1 in the complex plane. It is clear that $f(\mathbb{C}) = \{0\} \cup \mathbb{R}^+$.
Furthermore, f induces an equivalence relation on $\mathbb{C}$: $z_1 \bowtie z_2$ if $z_1$ and $z_2$
belong to the same circle. Then $\mathbb{C}/\bowtie$ is the set of circles centered at the origin
of the complex plane and $\tilde{f} : \mathbb{C}/\bowtie \to \{0\} \cup \mathbb{R}^+$ is bijective, associating
each circle to its radius.


The domain of a map can be a Cartesian product of a set, as in $f : X \times X \to Y$.
Two specific cases are worthy of mention. The first is when $Y = \mathbb{R}$.
An example of this case is the dot product on vectors. Thus, if X is the set
of vectors in space, we can define $f(\mathbf{a}, \mathbf{b}) = \mathbf{a} \cdot \mathbf{b}$. The second case is when
$Y = X$. Then f is called a binary operation on X, whereby an element in
X is associated with two elements in X. For instance, let $X = \mathbb{Z}$, the set of
all integers; then the function $f : \mathbb{Z} \times \mathbb{Z} \to \mathbb{Z}$ defined by $f(m, n) = mn$ is
the binary operation of multiplication of integers. Similarly, $g : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$
given by $g(x, y) = x + y$ is the binary operation of addition of real numbers.


1.3 Metric Spaces


Although sets are at the root of modern mathematics, by themselves they 
are only of formal and abstract interest. To make sets useful, it is necessary 
to introduce some structures on them. There are two general procedures for 
the implementation of such structures. These are the abstractions of the two 
major branches of mathematics—algebra and analysis. 

We can turn a set into an algebraic structure by introducing a binary op- 
eration on it. For example, a vector space consists, among other things, of 
the binary operation of vector addition. A group is, among other things, a 
set together with the binary operation of “multiplication”. There are many 
other examples of algebraic systems, and they constitute the rich subject of 
algebra. 

When analysis, the other branch of mathematics, is abstracted using the 
concept of sets, it leads to topology, in which the concept of continuity plays 
a central role. This is also a rich subject with far-reaching implications and 
applications. We shall not go into any details of these two areas of math- 
ematics. Although some algebraic systems will be discussed and the ideas 
of limit and continuity will be used in the sequel, this will be done in an 
intuitive fashion, by introducing and employing the concepts when they are 
needed. On the other hand, some general concepts will be introduced when 
they require minimum prerequisites. One of these is a metric space: 


Definition 1.3.1 A metric space is a set X together with a real-valued function
$d : X \times X \to \mathbb{R}$ such that


(a) $d(x, y) \ge 0 \ \forall x, y$, and $d(x, y) = 0$ iff $x = y$.
(b) $d(x, y) = d(y, x)$ (symmetry).
(c) $d(x, y) \le d(x, z) + d(z, y)$ (the triangle inequality).


It is worthwhile to point out that X is a completely arbitrary set and 
needs no other structure. In this respect, Definition 1.3.1 is very broad and 
encompasses many different situations, as the following examples will show. 
Before examining the examples, note that the function d defined above is the 
abstraction of the notion of distance: (a) says that the distance between any 
two points is always nonnegative and is zero only if the two points coincide; 
(b) says that the distance between two points does not change if the two 
points are interchanged; (c) states the known fact that the sum of the lengths 
of two sides of a triangle is always greater than or equal to the length of the 
third side. 

The fact that the distance between two points of a set is positive and real 
is a property of a Euclidean metric space. In relativity, on the other hand, 
one has to deal with the possibility of a Minkowskian metric space for which 
distance (squared) is negative. 




Example 1.3.2 Here are some examples of metric spaces: 


1. Let $X = \mathbb{Q}$, the set of rational numbers, and define $d(x, y) = |x - y|$.

2. Let $X = \mathbb{R}$, and again define $d(x, y) = |x - y|$.

3. Let X consist of the points on the surface of a sphere. We can define
two distance functions on X. Let $d_1(P, Q)$ be the length of the chord
joining P and Q on the sphere. We can also define another metric,
$d_2(P, Q)$, as the length of the arc of the great circle passing through
points P and Q on the surface of the sphere. It is not hard to convince
oneself that $d_1$ and $d_2$ satisfy all the properties of a metric function.
Note that for $d_2$, if two of the three points are the poles of the sphere,
then the triangle inequality becomes an equality.

4. Let $C^0[a, b]$ denote the set of continuous real-valued functions on the
closed interval [a, b]. We can define $d(f, g) = \int_a^b |f(x) - g(x)|\,dx$ for
$f, g \in C^0(a, b)$.

5. Let $C_B(a, b)$ denote the set of bounded continuous real-valued functions
on the closed interval [a, b]. We then define

\[ d(f, g) = \max_{x \in [a, b]} \{|f(x) - g(x)|\} \quad \text{for } f, g \in C_B(a, b). \]


This notation says: Take the absolute value of the difference in f and 
g at all x in the interval [a, b] and then pick the maximum of all these 
values. 
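
The three defining properties can be spot-checked numerically for the two sphere metrics of item 3. The sketch below assumes a unit sphere and a handful of sample points, and passing it is evidence on those points only, not a proof; the function names and the tolerance are illustrative assumptions.

```python
import itertools
import math


def chord(p, q):
    """d_1: straight-line (chord) distance between two points of the unit sphere."""
    return math.dist(p, q)


def arc(p, q):
    """d_2: great-circle arc length between two points of the unit sphere."""
    dot = sum(a * b for a, b in zip(p, q))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against rounding


def satisfies_metric_axioms(d, points, tol=1e-12):
    """Check properties (a), (b), (c) of Definition 1.3.1 on all sampled triples."""
    for x, y, z in itertools.product(points, repeat=3):
        if d(x, y) < 0:
            return False                        # (a) nonnegativity
        if (d(x, y) <= tol) != (x == y):
            return False                        # (a) zero iff the points coincide
        if abs(d(x, y) - d(y, x)) > tol:
            return False                        # (b) symmetry
        if d(x, y) > d(x, z) + d(z, y) + tol:
            return False                        # (c) triangle inequality
    return True


north, south = (0.0, 0.0, 1.0), (0.0, 0.0, -1.0)
east, front = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
points = [north, south, east, front]

print(satisfies_metric_axioms(chord, points), satisfies_metric_axioms(arc, points))
print(arc(north, south), arc(north, east) + arc(east, south))
```

The last line illustrates the remark in item 3: with the two poles among the three points, the arc metric turns the triangle inequality into an equality.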


The metric function creates a natural setting in which to test the “closeness”
of points in a metric space. One occasion on which the idea of closeness
becomes essential is in the study of a sequence. A sequence is a mapping
$s : \mathbb{N} \to X$ from the set of natural numbers $\mathbb{N}$ into the metric space X.
Such a mapping associates with a positive integer n a point $s(n)$ of the metric
space X. It is customary to write $s_n$ (or $x_n$ to match the symbol X) instead
of $s(n)$ and to enumerate the values of the function by writing $\{x_n\}_{n=1}^{\infty}$.

Knowledge of the behavior of a sequence for large values of n is of funda- 
mental importance. In particular, it is important to know whether a sequence 
approaches a finite value as n increases. 


Definition 1.3.3 Suppose that for some x and for any positive real number
$\epsilon$, there exists a natural number N such that $d(x_n, x) < \epsilon$ whenever
$n > N$. Then we say that the sequence $\{x_n\}_{n=1}^{\infty}$ converges to x and write
$\lim_{n \to \infty} d(x_n, x) = 0$ or $d(x_n, x) \to 0$ or simply $x_n \to x$.


It may not be possible to test directly for the convergence of a given 
sequence because this requires a knowledge of the limit point x. However, 
it is possible to do the next best thing—to see whether the points of the 
sequence get closer and closer as n gets larger and larger. 


Definition 1.3.4 A Cauchy sequence is a sequence for which

\[ \lim_{m, n \to \infty} d(x_m, x_n) = 0. \]

Fig. 1.4 The distance between the elements of a Cauchy sequence gets smaller and
smaller


Figure 1.4 shows a Cauchy sequence.

We can test directly whether or not a sequence is Cauchy. However, the
fact that a sequence is Cauchy does not guarantee that it converges. For
example, let the metric space be the set of rational numbers $\mathbb{Q}$ with the
metric function $d(x, y) = |x - y|$, and consider the sequence $\{x_n\}_{n=1}^{\infty}$ where
$x_n = \sum_{k=1}^{n} (-1)^{k+1}/k$. It is clear that $x_n$ is a rational number for any n.
Problem 1.7 shows how to prove that $|x_m - x_n| \to 0$. Thus, the sequence is
Cauchy. However, it is probably known to the reader that $\lim_{n \to \infty} x_n = \ln 2$,
which is not a rational number.
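
The behavior of this sequence can be watched numerically. The sketch below keeps every term an exact rational number by using Python's `fractions` module; the function name `x` and the particular indices are illustrative choices, and a finite computation can of course only suggest, not prove, the limiting statements.

```python
from fractions import Fraction
import math


def x(n):
    """n-th partial sum of 1 - 1/2 + 1/3 - ..., computed as an exact rational."""
    return sum(Fraction((-1) ** (k + 1), k) for k in range(1, n + 1))


# Terms far out in the sequence are close to one another ...
for m, n in [(10, 20), (100, 200), (500, 1000)]:
    print(m, n, float(abs(x(m) - x(n))))

# ... and they approach ln 2, which is not a member of Q.
print(float(x(1000)), math.log(2))
```

Each $x_n$ is a genuine element of $\mathbb{Q}$, while the value the terms cluster around is not, which is exactly the failure of completeness described above.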


Definition 1.3.5 A metric space in which every Cauchy sequence con- 
verges is called a complete metric space. 


Complete metric spaces play a crucial role in modern analysis. The pre- 
ceding example shows that Q is not a complete metric space. However, if 
the limit points of all Cauchy sequences are added to Q, the resulting space 
becomes complete. This complete space is, of course, the real number sys- 
tem R. It turns out that any incomplete metric space can be “enlarged” to a 
complete metric space. 


1.4 Cardinality 


The process of counting is a one-to-one comparison of one set with another. 
If two sets are in one-to-one correspondence, they are said to have the same 
cardinality. Two sets with the same cardinality essentially have the same 
“number” of elements. The set $F_n = \{1, 2, \dots, n\}$ is finite and has cardinality
n. Any set from which there is a bijection onto $F_n$ is said to be finite with
n elements.


Historical Notes 

Although some steps had been taken before him in the direction of a definitive theory of 
sets, the creator of the theory of sets is considered to be Georg Cantor (1845-1918), who 
was born in Russia of Danish-Jewish parentage but moved to Germany with his parents. 
His father urged him to study engineering, and Cantor entered the University of Berlin in 
1863 with that intention. There he came under the influence of Weierstrass and turned to 




pure mathematics. He became Privatdozent at Halle in 1869 and professor in 1879. When 
he was twenty-nine he published his first revolutionary paper on the theory of infinite sets 
in the Journal für Mathematik. Although some of its propositions were deemed faulty
by the older mathematicians, its overall originality and brilliance attracted attention. He 
continued to publish papers on the theory of sets and on transfinite numbers until 1897. 
One of Cantor’s main concerns was to differentiate among infinite sets by “size” and, 
like Bolzano before him, he decided that one-to-one correspondence should be the basic
principle. In his correspondence with Dedekind in 1873, Cantor posed the question of
whether the set of real numbers can be put into one-to-one correspondence with the
integers, and some weeks later he answered in the negative. He gave two proofs. The first is
more complicated than the second, which is the one most often used today. In 1874
Cantor occupied himself with the equivalence of the points of a line and the points of $\mathbb{R}^n$ and
sought to prove that a one-to-one correspondence between these two sets was impossible.
Three years later he proved that there is such a correspondence. He wrote to Dedekind,
“I see it but I do not believe it.” He later showed that given any set, it is always possible
to create a new set, the set of subsets of the given set, whose cardinal number is larger
than that of the given set. For the natural numbers $\mathbb{N}$, whose cardinality is denoted by $\aleph_0$,
the cardinal number of the set of subsets is denoted by $2^{\aleph_0}$. Cantor proved that $2^{\aleph_0} = c$,
where c is the cardinal number of the continuum; i.e., the set of real numbers.

Cantor’s work, which resolved age-old problems and reversed much previous thought, 
could hardly be expected to receive immediate acceptance. His ideas on transfinite ordi- 
nal and cardinal numbers aroused the hostility of the powerful Leopold Kronecker, who 
attacked Cantor’s theory savagely over more than a decade, repeatedly preventing Can- 
tor from obtaining a more prominent appointment in Berlin. Though Kronecker died in 
1891, his attacks left mathematicians suspicious of Cantor’s work. Poincaré referred to 
set theory as an interesting “pathological case.” He also predicted that “Later generations 
will regard [Cantor’s] Mengenlehre as a disease from which one has recovered.” At one 
time Cantor suffered a nervous breakdown, but resumed work in 1887. 

Many prominent mathematicians, however, were impressed by the uses to which the new 
theory had already been put in analysis, measure theory, and topology. Hilbert spread 
Cantor’s ideas in Germany, and in 1926 said, “No one shall expel us from the paradise 
which Cantor created for us.” He praised Cantor’s transfinite arithmetic as “the most as- 
tonishing product of mathematical thought, one of the most beautiful realizations of hu- 
man activity in the domain of the purely intelligible.” Bertrand Russell described Cantor’s 
work as “probably the greatest of which the age can boast.” The subsequent utility of Can- 
tor’s work in formalizing mathematics—a movement largely led by Hilbert—seems at 
odds with Cantor’s Platonic view that the greater importance of his work was in its impli- 
cations for metaphysics and theology. That his work could be so seamlessly diverted from 
the goals intended by its creator is strong testimony to its objectivity and craftsmanship. 


Now consider the set of natural numbers N = {1, 2,3, ...}. If there exists 
a bijection between a set A and N, then A is said to be countably infinite. 
Some examples of countably infinite sets are the set of all integers, the set 
of even natural numbers, the set of odd natural numbers, the set of all prime 
numbers, and the set of energy levels of the bound states of a hydrogen atom. 

It may seem surprising that a subset (such as the set of all even numbers) 
can be put into one-to-one correspondence with the full set (the set of all 
natural numbers); however, this is a property shared by all infinite sets. In 
fact, sometimes infinite sets are defined as those sets that are in one-to-one 
correspondence with at least one of their proper subsets. It is also astonish- 
ing to discover that there are as many rational numbers as there are natural 
numbers. After all, there are infinitely many rational numbers just in the 
interval (0, 1)—or between any two distinct real numbers!$^3$
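
One way to make the counting of rational numbers concrete is to generate them one after another. The sketch below handles only the positive rationals and walks the infinite array of fractions m/n one anti-diagonal (m + n fixed) at a time, skipping values already seen; this traversal order and the generator name `positive_rationals` are assumptions of the sketch, chosen for simplicity.

```python
from fractions import Fraction
from itertools import islice


def positive_rationals():
    """Yield every positive rational exactly once, anti-diagonal by anti-diagonal."""
    seen = set()
    total = 2                        # m + n for the current anti-diagonal
    while True:
        for m in range(1, total):
            q = Fraction(m, total - m)
            if q not in seen:        # e.g. 2/2 repeats 1/1 and is skipped
                seen.add(q)
                yield q
        total += 1


first_ten = list(islice(positive_rationals(), 10))
print([str(q) for q in first_ten])
```

Because every fraction m/n sits on some anti-diagonal of finite length, each positive rational receives a finite position in the list, which is what a bijection with the natural numbers requires.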


3 The proof involves writing m/n as the mnth entry in an $\infty \times \infty$ matrix and starting
the “count” with the (1, 1) entry, going to the right to (1, 2), then diagonally to (2, 1),

Fig. 1.5 The Cantor set after one, two, three, and four “dissections”


Sets that are neither finite nor countably infinite are said to be uncount- 
able. In some sense they are “more infinite” than any countable set. Ex- 
amples of uncountable sets are the points in the interval (—1, +1), the real 
numbers, the points in a plane, and the points in space. It can be shown 
that these sets have the same cardinality: There are as many points in 
three-dimensional space—the whole universe—as there are in the interval 
(—1, +1) or in any other finite interval. 

Cardinality is a very intricate mathematical notion with many surprising
results. Consider the interval [0, 1]. Remove the open interval $(\tfrac{1}{3}, \tfrac{2}{3})$ from
its middle (leaving the points $\tfrac{1}{3}$ and $\tfrac{2}{3}$ behind). From the remaining portion,
$[0, \tfrac{1}{3}] \cup [\tfrac{2}{3}, 1]$, remove the two middle thirds; the remaining portion will then
be

\[ [0, \tfrac{1}{9}] \cup [\tfrac{2}{9}, \tfrac{1}{3}] \cup [\tfrac{2}{3}, \tfrac{7}{9}] \cup [\tfrac{8}{9}, 1] \]

(see Fig. 1.5). Do this indefinitely. What is the cardinality of the remaining
set, which is called the Cantor set? Intuitively we expect hardly anything to 
be left. We might persuade ourselves into accepting the fact that the number 
of points remaining is at most infinite but countable. The surprising fact is 
that the cardinality is that of the continuum! Thus, after removal of infinitely 
many middle thirds, the set that remains has as many points as the original 
set! 
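
The first few dissections can be carried out exactly with rational arithmetic. The sketch below is illustrative: the function name `cantor_stage` is an assumption, and it tracks only the closed intervals that survive after finitely many steps, not the limiting set itself.

```python
from fractions import Fraction


def cantor_stage(n):
    """Closed intervals left after removing middle thirds n times from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        remaining = []
        for a, b in intervals:
            third = (b - a) / 3
            remaining += [(a, a + third), (b - third, b)]  # keep the outer thirds
        intervals = remaining
    return intervals


for n in range(1, 5):
    stage = cantor_stage(n)
    total_length = sum(b - a for a, b in stage)
    print(n, len(stage), total_length)
```

After n dissections there are $2^n$ intervals of total length $(2/3)^n$, so the total length shrinks toward zero even though, as stated above, the set that remains in the limit has the cardinality of the continuum.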


1.5 Mathematical Induction 


Many a time it is desirable to make a mathematical statement that is true 
for all natural numbers. For example, we may want to establish a formula 
involving an integer parameter that will hold for all positive integers. One 
encounters this situation when, after experimenting with the first few posi- 
tive integers, one recognizes a pattern and discovers a formula, and wants to 
make sure that the formula holds for all natural numbers. For this purpose, 
one uses mathematical induction. The essence of mathematical induction 
is stated as follows: 


then down to (3, 1), then diagonally up, and so on. Obviously the set is countable and it 
exhausts all rational numbers. In fact, the process double counts some of the entries. 




Proposition 1.5.1 Suppose that there is associated with each natural number
(positive integer) n a statement $S_n$. Then $S_n$ is true for every positive
integer provided the following two conditions hold:


1. $S_1$ is true.
2. If $S_m$ is true for some given positive integer m, then $S_{m+1}$ is also true.


Example 1.5.2 We illustrate the use of mathematical induction by proving 
the binomial theorem: 


\[
\begin{aligned}
(a+b)^m &= \sum_{k=0}^{m} \binom{m}{k} a^{m-k} b^k = \sum_{k=0}^{m} \frac{m!}{k!\,(m-k)!}\, a^{m-k} b^k \\
&= a^m + m a^{m-1} b + \frac{m(m-1)}{2!}\, a^{m-2} b^2 + \cdots + m a b^{m-1} + b^m,
\end{aligned}
\tag{1.1}
\]

where we have used the notation

\[ \binom{m}{k} \equiv \frac{m!}{k!\,(m-k)!}. \tag{1.2} \]

The mathematical statement $S_m$ is Eq. (1.1). We note that $S_1$ is trivially true:
$(a + b)^1 = a^1 + b^1$. Now we assume that $S_m$ is true and show that $S_{m+1}$ is
also true. This means starting with Eq. (1.1) and showing that

\[ (a+b)^{m+1} = \sum_{k=0}^{m+1} \binom{m+1}{k} a^{m+1-k} b^k. \]

Then the induction principle ensures that the statement (equation) holds for
all positive integers. Multiply both sides of Eq. (1.1) by $a + b$ to obtain

\[ (a+b)^{m+1} = \sum_{k=0}^{m} \binom{m}{k} a^{m-k+1} b^k + \sum_{k=0}^{m} \binom{m}{k} a^{m-k} b^{k+1}. \]

Now separate the k = 0 term from the first sum and the k = m term from the
second sum:

\[
\begin{aligned}
(a+b)^{m+1} &= a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k
 + \underbrace{\sum_{k=0}^{m-1} \binom{m}{k} a^{m-k} b^{k+1}}_{\text{let } k = j - 1 \text{ in this sum}} + b^{m+1} \\
&= a^{m+1} + \sum_{k=1}^{m} \binom{m}{k} a^{m-k+1} b^k
 + \sum_{j=1}^{m} \binom{m}{j-1} a^{m-j+1} b^j + b^{m+1}.
\end{aligned}
\]

The second sum in the last line involves j. Since this is a dummy index,
we can substitute any symbol we please. The choice k is especially useful
because then we can unite the two summations. This gives

\[ (a+b)^{m+1} = a^{m+1} + \sum_{k=1}^{m} \left\{ \binom{m}{k} + \binom{m}{k-1} \right\} a^{m-k+1} b^k + b^{m+1}. \]

If we now use

\[ \binom{m+1}{k} = \binom{m}{k} + \binom{m}{k-1}, \]

which the reader can easily verify, we finally obtain

\[ (a+b)^{m+1} = a^{m+1} + \sum_{k=1}^{m} \binom{m+1}{k} a^{m-k+1} b^k + b^{m+1} = \sum_{k=0}^{m+1} \binom{m+1}{k} a^{m-k+1} b^k. \]

This completes the proof.
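
A numerical spot check is no substitute for the induction argument, but it is a useful sanity test of both the binomial theorem and the identity for binomial coefficients on which the induction step relies. The sketch below uses `math.comb` from the Python standard library; the function name `binomial_expansion` and the sample values are illustrative assumptions.

```python
from math import comb


def binomial_expansion(a, b, m):
    """Right-hand side of the binomial theorem: sum of C(m, k) a^(m-k) b^k."""
    return sum(comb(m, k) * a ** (m - k) * b ** k for k in range(m + 1))


# The identity C(m+1, k) = C(m, k) + C(m, k-1) for 1 <= k <= m.
identity_holds = all(
    comb(m + 1, k) == comb(m, k) + comb(m, k - 1)
    for m in range(1, 30)
    for k in range(1, m + 1)
)

# The theorem itself, for a few integer values of a, b and m.
theorem_holds = all(
    binomial_expansion(a, b, m) == (a + b) ** m
    for a in range(-3, 4)
    for b in range(-3, 4)
    for m in range(1, 8)
)
print(identity_holds, theorem_holds)
```

Working with integers keeps every comparison exact, so no rounding tolerance is needed.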


Mathematical induction is also used in defining quantities involving integers.
Such definitions are called inductive definitions. For example, inductive
definition is used in defining powers: $a^1 = a$ and $a^m = a^{m-1} a$.
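
An inductive definition translates almost word for word into a recursive function. The sketch below mirrors the two clauses of the definition of powers; the function name `power` is an illustrative assumption, and the exponent is taken to be a positive integer as in the text.

```python
def power(a, m):
    """a^m for a positive integer m, following a^1 = a and a^m = a^(m-1) * a."""
    if m == 1:
        return a                  # base clause: a^1 = a
    return power(a, m - 1) * a    # inductive clause: a^m = a^(m-1) a


print(power(2, 10), power(3, 4))
```

The base clause plays the role of $S_1$ and the recursive call plays the role of passing from m - 1 to m.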


1.6 Problems 


1.1 Show that the number of subsets of a set containing n elements is $2^n$.


1.2 Let A, B, and C be sets in a universal set U. Show that 


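(a) $A \subset B$ and $B \subset C$ implies $A \subset C$.
(b) $A \subset B$ iff $A \cap B = A$ iff $A \cup B = B$.
(c) $A \subset B$ and $B \subset C$ implies $(A \cup B) \subset C$.
(d) $A \cup B = (A \sim B) \cup (A \cap B) \cup (B \sim A)$.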


Hint: To show the equality of two sets, show that each set is a subset of the 
other. 


1.3 For each $n \in \mathbb{N}$, let
1
n
Find $\bigcup_n I_n$ and $\bigcap_n I_n$.

1.4 Show that $a' \in [a]$ implies that $[a'] = [a]$.

1.5 Can you define a binary operation of “multiplication” on the set of vec- 


tors in space? What about vectors in the plane? In each case, write the com- 
ponents of the product in terms of the components of the two vectors. 




1.6 Show that $(f \circ g)^{-1} = g^{-1} \circ f^{-1}$ when f and g are both bijections.


1.7 We show that the sequence $\{x_n\}_{n=1}^{\infty}$, where $x_n = \sum_{k=1}^{n} (-1)^{k+1}/k$, is
Cauchy. Without loss of generality, assume that $n > m$ and $n - m$ is even
(the case of odd $n - m$ can be handled similarly).


(a) Show that
(b) Separate the even and odd parts of the sum and show that
(c) Add the two sums to obtain a single sum, showing that
(d) Convince yourself that ii f(x)dx > Yy_> f (k) for any continuous
function f(x), and apply it to part (c) to get
Each term on the last line goes to zero independently as m and n go to
infinity. 


1.8 Find a bijection $f : \mathbb{N} \to \mathbb{Z}$. Hint: Find an f which maps even integers
onto the positive integers and odd integers onto the negative integers. 


1.9 Take any two open intervals (a, b) and (c,d), and show that there are 
as many points in the first as there are in the second, regardless of the size of 
the intervals. Hint: Find a (linear) algebraic relation between points of the 
two intervals. 

1.10 Use mathematical induction to derive the Leibniz rule for differentiating
a product:

\[ \frac{d^n}{dx^n}(f \cdot g) = \sum_{k=0}^{n} \binom{n}{k} \frac{d^k f}{dx^k} \frac{d^{n-k} g}{dx^{n-k}}. \]

1.11 Use mathematical induction to derive the following results:


\[ \sum_{k=0}^{n} r^k = \frac{r^{n+1} - 1}{r - 1}, \qquad \sum_{k=0}^{n} k = \frac{n(n+1)}{2}. \]


Part I
Finite-Dimensional Vector Spaces


2 Vectors and Linear Maps


The familiar two- and three-dimensional vectors can easily be generalized 
to higher dimensions. Representing vectors by their components, one can 
conceive of vectors having N components. This is the most immediate gen- 
eralization of vectors in the plane and in space, and such vectors are called 
N-dimensional Cartesian vectors. Cartesian vectors are limited in two
respects: Their components are real, and their dimensionality is finite. Some
applications in physics require the removal of one or both of these limi- 
tations. It is therefore convenient to study vectors stripped of any dimen- 
sionality or reality of components. Such properties become consequences of 
more fundamental definitions. Although we will be concentrating on finite- 
dimensional vector spaces in this part of the book, many of the concepts and 
examples introduced here apply to infinite-dimensional spaces as well. 


2.1 Vector Spaces 

Let us begin with the definition of an abstract (complex) vector space.$^1$


Definition 2.1.1 A vector space V over $\mathbb{C}$ is a set of objects denoted by
$|a\rangle$, $|b\rangle$, $|x\rangle$, and so on, called vectors, with the following properties:


1. To every pair of vectors $|a\rangle$ and $|b\rangle$ in V there corresponds a vector
   $|a\rangle + |b\rangle$, also in V, called the sum of $|a\rangle$ and $|b\rangle$, such that
   (a) $|a\rangle + |b\rangle = |b\rangle + |a\rangle$,
   (b) $|a\rangle + (|b\rangle + |c\rangle) = (|a\rangle + |b\rangle) + |c\rangle$,
   (c) There exists a unique vector $|0\rangle \in V$, called the zero vector, such
       that $|a\rangle + |0\rangle = |a\rangle$ for every vector $|a\rangle$,
   (d) To every vector $|a\rangle \in V$ there corresponds a unique vector $-|a\rangle \in V$
       such that $|a\rangle + (-|a\rangle) = |0\rangle$.
2. To every complex number$^2$ $\alpha$—also called a scalar—and every vector
   $|a\rangle$ there corresponds a vector $\alpha|a\rangle$ in V such that
   (a) $\alpha(\beta|a\rangle) = (\alpha\beta)|a\rangle$,
   (b) $1|a\rangle = |a\rangle$.
3. Multiplication involving vectors and scalars is distributive:
   (a) $\alpha(|a\rangle + |b\rangle) = \alpha|a\rangle + \alpha|b\rangle$.
   (b) $(\alpha + \beta)|a\rangle = \alpha|a\rangle + \beta|a\rangle$.


1 Keep in mind that $\mathbb{C}$ is the set of complex numbers and $\mathbb{R}$ the set of reals.


The bra, $\langle\,|$, and ket, $|\,\rangle$, notation for vectors, invented by Dirac, is very
useful when dealing with complex vector spaces. However, it is somewhat
clumsy for certain topics such as norm and metrics and will therefore be
abandoned in those discussions.

The vector space defined above is also called a complex vector space. It 
is possible to replace C with R—the set of real numbers—in which case the 
resulting space will be called a real vector space. 

Real and complex numbers are prototypes of a mathematical structure 
called field. A field F is a set of objects with two binary operations called ad- 
dition and multiplication. Multiplication distributes over addition, and each 
operation has an identity. The identity for addition is denoted by 0 and is 
called additive identity. The identity for multiplication is denoted by 1 and
is called multiplicative identity. Furthermore, every element $\alpha \in F$ has an
additive inverse $-\alpha$, and every element except the additive identity has a
multiplicative inverse $\alpha^{-1}$.


Example 2.1.2 (Some vector spaces) 


R is a vector space over the field of real numbers. 

C is a vector space over the field of real numbers. 

C is a vector space over the complex numbers. 

Let V = R and let the field of scalars be C. This is not a vector space, 
because property 2 of Definition 2.1.1 is not satisfied: A complex 


eee 


number times a real number is not a real number and therefore does 
not belong to V. 

5. The set of “arrows” in the plane (or in space) forms a vector space 
over R under the parallelogram law of addition of planar (or spatial) 
vectors. 

6. Let P*[t] be the set of all polynomials with complex coefficients in 
a variable t. Then P°[t] is a vector space under the ordinary addition 
of polynomials and the multiplication of a polynomial by a complex 
number. In this case the zero vector is the zero polynomial. 

7. Fora given positive integer n, let Py [t] be the set of all polynomials 
with complex coefficients of degree less than or equal to n. Again it 
is easy to verify that Py [t] is a vector space under the usual addition 


2Complex numbers, particularly when they are treated as variables, are usually denoted 
by z, and we shall adhere to this convention in Part III. However, in the discussion of 
vector spaces, we have found it more convenient to use lower case Greek letters to denote 
complex numbers as scalars. 


2.1 Vector Spaces 


of polynomials and their multiplication by complex scalars. In partic- 
ular, the sum of two polynomials of degree less than or equal to n is 
also a polynomial of degree less than or equal to n, and multiplying 
a polynomial with complex coefficients by a complex number gives 
another polynomial of the same type. Here the zero polynomial is the 
zero vector. 

8. The set P)[t] of polynomials of degree less than or equal to n with 
real coefficients is a vector space over the reals, but it is not a vector 
space over the complex numbers. 


9. Let C” consist of all complex n-tuples such as |a) = (a1, @2,..., Qn) 
and |b) = (61, B2,..-, Bn). Let n be a complex number. Then we de- 
fine 


|a) + |b) = (a1 + Bi, 2 + Bo, ...,Qn + Bn), 
nla) = (no, No2,..., An), 
|0) = (0,0, ..., 0), 


|a) = (—@1, —@2,..., —y). 


It is easy to verify that C” is a vector space over the complex numbers. 
It is called the n-dimensional complex coordinate space. 

10. The set of all real $n$-tuples $\mathbb{R}^n$ is a vector space over the real
numbers under operations similar to those of $\mathbb{C}^n$. It is called the
n-dimensional real coordinate space, or Cartesian n-space. It is not
a vector space over the complex numbers.

11. The set of all complex matrices with $m$ rows and $n$ columns, $\mathcal{M}^{m\times n}$, is
a vector space under the usual addition of matrices and multiplication
by complex numbers. The zero vector is the $m \times n$ matrix with all
entries equal to zero.

12. Let $\mathbb{C}^\infty$ be the set of all complex sequences $|a\rangle = \{\alpha_i\}_{i=1}^{\infty}$ such that
$\sum_{i=1}^{\infty} |\alpha_i|^2 < \infty$. One can show that with addition and scalar
multiplication defined componentwise, $\mathbb{C}^\infty$ is a vector space over the complex
numbers.

13. The set of all complex-valued functions of a single real variable that 
are continuous in the real interval (a,b) is a vector space over the 
complex numbers. 

14. The set $\mathcal{C}^n(a, b)$ of all real-valued functions of a single real variable
defined on $(a, b)$ that possess continuous derivatives of all orders up
to $n$ forms a vector space over the reals.

15. The set $\mathcal{C}^\infty(a, b)$ of all real-valued functions on $(a, b)$ of a single real
variable that possess derivatives of all orders forms a vector space over
the reals.


It is clear from the example above that a vector space depends as much 
on the nature of the vectors as on the nature of the scalars. 


Definition 2.1.3 The vectors $|a_1\rangle, |a_2\rangle, \ldots, |a_n\rangle$ are said to be linearly
independent if for $\alpha_i \in \mathbb{C}$, the relation $\sum_{i=1}^{n} \alpha_i|a_i\rangle = 0$ implies $\alpha_i = 0$ for
all $i$. The sum $\sum_{i=1}^{n} \alpha_i|a_i\rangle$ is called a linear combination of $\{|a_i\rangle\}_{i=1}^{n}$.

2.1.1 Subspaces 


Given a vector space V, one can consider a collection W of vectors in V, 
i.e., a subset of V. Because W is a subset, it contains vectors, but there is no 
guarantee that it contains the linear combination of those vectors. We now 
investigate conditions under which it does. 


Definition 2.1.4 A subspace $W$ of a vector space $V$ is a nonempty subset
of $V$ with the property that if $|a\rangle, |b\rangle \in W$, then $\alpha|a\rangle + \beta|b\rangle$ also belongs to
$W$ for all $\alpha, \beta \in \mathbb{C}$.


The reader may verify that a subspace is a vector space in its own right, 
and that the intersection of two subspaces is also a subspace. 


Example 2.1.5 The following are subspaces of some of the vector spaces
considered in Example 2.1.2. The reader is urged to verify the validity of
each case.

• The "space" of real numbers is a subspace of $\mathbb{C}$ over the reals.

• $\mathbb{R}$ is not a subspace of $\mathbb{C}$ over the complex numbers, because, as
explained in Example 2.1.2, $\mathbb{R}$ cannot be a vector space over the complex
numbers.

• The set of all vectors along a given line going through the origin is a
subspace of arrows in the plane (or space) over $\mathbb{R}$.

• $\mathcal{P}^c_n[t]$ is a subspace of $\mathcal{P}^c[t]$.

• $\mathbb{C}^{n-1}$ is a subspace of $\mathbb{C}^n$ when $\mathbb{C}^{n-1}$ is identified with all complex
$n$-tuples with zero last entry. In general, $\mathbb{C}^m$ is a subspace of $\mathbb{C}^n$ for $m < n$
when $\mathbb{C}^m$ is identified with all $n$-tuples whose last $n - m$ elements are
zero.

• $\mathcal{M}^{r\times s}$ is a subspace of $\mathcal{M}^{m\times n}$ for $r < m$ and $s < n$. Here, we identify
an $r \times s$ matrix with an $m \times n$ matrix whose last $m - r$ rows and $n - s$
columns are all zero.

• $\mathcal{P}^c_m[t]$ is a subspace of $\mathcal{P}^c_n[t]$ for $m < n$.

• $\mathcal{P}^r_m[t]$ is a subspace of $\mathcal{P}^r_n[t]$ for $m < n$. Note that both $\mathcal{P}^r_n[t]$ and $\mathcal{P}^r_m[t]$
are vector spaces over the reals only.

• $\mathbb{R}^m$ is a subspace of $\mathbb{R}^n$ for $m < n$. Therefore, $\mathbb{R}^2$, the plane, is a
subspace of $\mathbb{R}^3$, the Euclidean space. Also, $\mathbb{R}^1 \equiv \mathbb{R}$ is a subspace of both
the plane $\mathbb{R}^2$ and the Euclidean space $\mathbb{R}^3$.

• Let $\mathbf{a}$ be along the x-axis (a subspace of $\mathbb{R}^2$) and $\mathbf{b}$ along the y-axis (also
a subspace of $\mathbb{R}^2$). Then $\mathbf{a} + \mathbf{b}$ is neither along the x-axis nor along the
y-axis. This shows that the union of two subspaces is not generally a
subspace.


Theorem 2.1.6 If $S$ is any nonempty set of vectors in a vector space $V$, then
the set $W_S$ of all linear combinations of vectors in $S$ is a subspace of $V$. We
say that $W_S$ is the span of $S$, or that $S$ spans $W_S$, or that $W_S$ is spanned
by $S$. $W_S$ is often denoted by $\mathrm{Span}\{S\}$.


The proof of Theorem 2.1.6 is left as Problem 2.6. 




Definition 2.1.7 A basis of a vector space V is a set B of linearly inde- 
pendent vectors that spans all of V. A vector space that has a finite basis is 
called finite-dimensional; otherwise, it is infinite-dimensional. 


The definition of the dimensionality of a vector space based on a single 
basis makes sense because of the following theorem which we state without 
proof (see [Axle 96, page 31]): 


Theorem 2.1.8 All bases of a given finite-dimensional vector space have 
the same number of linearly independent vectors. 


Definition 2.1.9 The cardinality of a basis of a vector space $V$ is called the
dimension of $V$ and denoted by $\dim V$. To emphasize its dependence on the
scalars, $\dim_{\mathbb{C}} V$ and $\dim_{\mathbb{R}} V$ are also used. A vector space of dimension $N$
is sometimes denoted by $V_N$.


If $|a\rangle$ is a vector in an $N$-dimensional vector space $V$ and $B = \{|a_i\rangle\}_{i=1}^{N}$ a
basis in that space, then by the definition of a basis, there exists a unique (see
Problem 2.4) set of scalars $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$ such that $|a\rangle = \sum_{i=1}^{N} \alpha_i|a_i\rangle$.
The set $\{\alpha_i\}_{i=1}^{N}$ is called the components of $|a\rangle$ in the basis $B$.
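
The uniqueness of the components can be made concrete with a small computation. The following is only an illustrative sketch: the function name `components` and the sample basis of $\mathbb{R}^3$ are assumptions made here, not part of the text. It solves $\sum_i \alpha_i|a_i\rangle = |a\rangle$ by Gauss-Jordan elimination in exact rational arithmetic, and it reports failure when the proposed vectors are linearly dependent and therefore cannot serve as a basis.

```python
from fractions import Fraction

def components(basis, vector):
    """Return the components of `vector` in `basis` (a list of N vectors of length N).

    Raises ValueError when the vectors in `basis` are linearly dependent,
    in which case they do not form a basis.
    """
    n = len(basis)
    # Augmented matrix: column j holds basis vector j, the last column holds `vector`.
    rows = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(vector[i])]
            for i in range(n)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("vectors are linearly dependent")
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row, then clear this column in every other row.
        p = rows[col][col]
        rows[col] = [x / p for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

basis = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]   # hypothetical basis of R^3
alphas = components(basis, (3, 5, 2))
print(alphas)
```

Because the elimination either produces exactly one solution or fails, the sketch mirrors the statement that a basis determines the components uniquely.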

Example 2.1.10 The following are bases for the vector spaces given in
Example 2.1.2.

• The number 1 (or any nonzero real number) is a basis for $\mathbb{R}$, which is
therefore one-dimensional.

• The numbers 1 and $i = \sqrt{-1}$ (or any pair of nonzero complex
numbers whose ratio is not real) are basis vectors for the vector space $\mathbb{C}$
over $\mathbb{R}$. Thus, this space is two-dimensional.

• The number 1 (or any nonzero complex number) is a basis for $\mathbb{C}$ over
$\mathbb{C}$, and the space is one-dimensional. Note that although the vectors are
the same as in the preceding item, changing the nature of the scalars
changes the dimensionality of the space.

• The set $\{\hat{e}_x, \hat{e}_y, \hat{e}_z\}$ of the unit vectors in the directions of the three axes
forms a basis in space. The space is three-dimensional.

• A basis of $\mathcal{P}^c[t]$ can be formed by the monomials $1, t, t^2, \ldots$. It is clear
that this space is infinite-dimensional.

• A basis of $\mathbb{C}^n$ is given by $\hat{e}_1, \hat{e}_2, \ldots, \hat{e}_n$, where $\hat{e}_j$ is an $n$-tuple that has
a 1 at the $j$th position and zeros everywhere else. This basis is called
the standard basis of $\mathbb{C}^n$. Clearly, the space has $n$ dimensions.

• A basis of $\mathcal{M}^{m\times n}$ is given by $e_{11}, e_{12}, \ldots, e_{ij}, \ldots, e_{mn}$, where $e_{ij}$ is the
$m \times n$ matrix with zeros everywhere except at the intersection of the $i$th
row and $j$th column, where it has a one.

• A set consisting of the monomials $1, t, t^2, \ldots, t^n$ forms a basis of $\mathcal{P}^c_n[t]$.
Thus, this space is $(n + 1)$-dimensional.

• The standard basis of $\mathbb{C}^n$ is a basis of $\mathbb{R}^n$ as well. It is also called the
standard basis of $\mathbb{R}^n$. Thus, $\mathbb{R}^n$ is $n$-dimensional.

• If we assume that $a < 0 < b$, then the set of monomials $1, x, x^2, \ldots$
forms a basis for $\mathcal{C}^\infty(a, b)$, because, by Taylor's theorem, any function
belonging to $\mathcal{C}^\infty(a, b)$ can be expanded in an infinite power series about
$x = 0$. Thus, this space is infinite-dimensional.


Remark 2.1.1 Given a space $V$ with a basis $B = \{|a_i\rangle\}_{i=1}^{n}$, the span of any
$m$ vectors ($m < n$) of $B$ is an $m$-dimensional subspace of $V$.


2.1.2 Factor Space 


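Let $W$ be a subspace of the vector space $V$, and define a relation on $V$ as
follows. If $|a\rangle \in V$ and $|b\rangle \in V$, then we say that $|a\rangle$ is related to $|b\rangle$, and
write $|a\rangle \bowtie |b\rangle$, if $|a\rangle - |b\rangle$ is in $W$. It is easy to show that $\bowtie$ is an equivalence
relation. Denote the equivalence class of $|a\rangle$ by $\llbracket a \rrbracket$, and the factor set (or
quotient set) $\{\llbracket a \rrbracket \mid |a\rangle \in V\}$ by $V/W$. We turn the factor set into a factor
space by defining the combined addition of vectors and their multiplication
by scalars as follows:

$$\alpha\llbracket a \rrbracket + \beta\llbracket b \rrbracket = \llbracket \alpha a + \beta b \rrbracket, \tag{2.1}$$

where $\llbracket \alpha a + \beta b \rrbracket$ is the equivalence class of $\alpha|a\rangle + \beta|b\rangle$. For this equation
to make sense, it must be independent of the choice of the representatives of
the classes. If $\llbracket a' \rrbracket = \llbracket a \rrbracket$ and $\llbracket b' \rrbracket = \llbracket b \rrbracket$, then is it true that
$\llbracket \alpha a' + \beta b' \rrbracket = \llbracket \alpha a + \beta b \rrbracket$? For this to happen, we must have

$$(\alpha|a'\rangle + \beta|b'\rangle) - (\alpha|a\rangle + \beta|b\rangle) \in W.$$

Now, since $|a'\rangle \in \llbracket a \rrbracket$, we must have $|a'\rangle = |a\rangle + |w_1\rangle$ for some $|w_1\rangle \in W$.
Similarly, $|b'\rangle = |b\rangle + |w_2\rangle$ for some $|w_2\rangle \in W$. Therefore,

$$(\alpha|a'\rangle + \beta|b'\rangle) - (\alpha|a\rangle + \beta|b\rangle) = \alpha|w_1\rangle + \beta|w_2\rangle,$$

and the right-hand side is in $W$ because $W$ is a subspace.
Sometimes $\llbracket a \rrbracket$ is written as $|a\rangle + W$. With this notation come the
equalities

$$|w\rangle + W = W, \qquad W + W = W, \qquad \alpha W = W, \qquad \alpha W + \beta W = W,$$

which abbreviate the obvious facts that the sum of two vectors in $W$ is a
vector in $W$, the product of a scalar and a vector in $W$ is a vector in $W$, and
the linear combination of two vectors in $W$ is a vector in $W$.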

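How do we find a basis for $V/W$? Let $\{|a_i\rangle\}$ be a basis for $W$. Extend it
to a basis $\{|a_i\rangle, |b_j\rangle\}$ for $V$. Then $\{\llbracket b_j \rrbracket\}$ form a basis for $V/W$. Indeed, let
$\llbracket a \rrbracket \in V/W$. Then, since $|a\rangle$ is in $V$, we have

$$\llbracket a \rrbracket = |a\rangle + W = \underbrace{\sum_i \alpha_i|a_i\rangle}_{\in W} + \sum_j \beta_j|b_j\rangle + W = \sum_j \beta_j|b_j\rangle + W = \sum_j \beta_j\llbracket b_j \rrbracket.$$

Thus, $\{\llbracket b_j \rrbracket\}$ span $V/W$. To form a basis, they also have to be linearly
independent. So, suppose that $\sum_j \beta_j\llbracket b_j \rrbracket = \llbracket 0 \rrbracket$. This means that

$$\sum_j \beta_j|b_j\rangle + W = |0\rangle + W = W \;\Longrightarrow\; \sum_j \beta_j|b_j\rangle \in W.$$

So the last sum must be a linear combination of $\{|a_i\rangle\}$:

$$\sum_j \beta_j|b_j\rangle = \sum_i \alpha_i|a_i\rangle \quad\text{or}\quad \sum_j \beta_j|b_j\rangle - \sum_i \alpha_i|a_i\rangle = 0.$$

This is a zero linear combination of the basis vectors of $V$. Therefore, all
coefficients, including all $\beta_j$, must be zero. One consequence of the argument
above is (with obvious notation)

$$\dim(V/W) = \dim V - \dim W. \tag{2.2}$$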
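
A small numerical illustration of Eqs. (2.1) and (2.2) may help. The sketch below assumes a specific setting that is not in the text: $V = \mathbb{R}^3$ with $W$ the z-axis, so that each class $|a\rangle + W$ is a vertical line and can be labelled by its point with zero z-component. The helper names `in_W`, `rep`, `same_class`, and `combo` are hypothetical.

```python
from fractions import Fraction

# Assumed setting for illustration: V = R^3 and W = the z-axis, spanned by (0, 0, 1).

def in_W(v):
    """True when v lies in W, i.e. on the z-axis."""
    return v[0] == 0 and v[1] == 0

def rep(v):
    """Canonical representative of the class v + W: drop the W-component."""
    return (v[0], v[1], 0)

def same_class(a, b):
    """a and b are related when a - b lies in W."""
    return in_W(tuple(x - y for x, y in zip(a, b)))

def combo(alpha, a, beta, b):
    """The vector alpha*a + beta*b in V."""
    return tuple(alpha * x + beta * y for x, y in zip(a, b))

a, b = (1, 2, 3), (4, 5, 6)
a2, b2 = (1, 2, -7), (4, 5, 100)      # other representatives of the same two classes
alpha, beta = Fraction(1, 2), 3

# Eq. (2.1) is well defined: different representatives give the same class.
print(same_class(combo(alpha, a, beta, b), combo(alpha, a2, beta, b2)))

# The classes of (1,0,0) and (0,1,0) span V/W, consistent with dim(V/W) = 3 - 1 = 2.
v = (7, -4, 9)
print(rep(v) == combo(v[0], (1, 0, 0), v[1], (0, 1, 0)))
```

Here $(0, 0, 1)$ plays the role of the basis $\{|a_i\rangle\}$ of $W$, and $(1, 0, 0)$, $(0, 1, 0)$ play the role of the added vectors $\{|b_j\rangle\}$.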


2.1.3 Direct Sums 


Sometimes it is possible, and convenient, to break up a vector space into
special (disjoint) subspaces. For instance, the study of the motion of a particle
in $\mathbb{R}^3$ under the influence of a central force field is facilitated by decomposing
the motion into its projections onto the direction of angular momentum
and onto a plane perpendicular to the angular momentum. This corresponds
to decomposing a vector in space into a vector, say in the xy-plane, and a
vector along the z-axis. We can generalize this to any vector space, but first
some notation: Let $U$ and $W$ be subspaces of a vector space $V$. Denote by
$U + W$ the collection of all vectors in $V$ that can be written as a sum of two
vectors, one in $U$ and one in $W$. It is easy to show that $U + W$ is a subspace
of $V$.


Example 2.1.11 Let $U$ be the xy-plane and $W$ the yz-plane. These are both
subspaces of $\mathbb{R}^3$, and so is $U + W$. In fact, $U + W = \mathbb{R}^3$, because given any
vector $(x, y, z)$ in $\mathbb{R}^3$, we can write it as

$$(x, y, z) = \underbrace{\left(x, \tfrac{1}{2}y, 0\right)}_{\in U} + \underbrace{\left(0, \tfrac{1}{2}y, z\right)}_{\in W}.$$

This decomposition is not unique: We could also write
$(x, y, z) = (x, \tfrac{1}{3}y, 0) + (0, \tfrac{2}{3}y, z)$, and a host of other relations.
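
The non-uniqueness is easy to see by letting the share of the y-component vary. The sketch below is illustrative only; the function `split` and its parameter `t` are assumptions introduced here. Every value of `t` gives a valid decomposition of the same vector into a piece in the xy-plane and a piece in the yz-plane.

```python
from fractions import Fraction

def split(v, t):
    """Write v = u + w with u in the xy-plane U and w in the yz-plane W.

    The parameter t (an assumption of this sketch) decides how the shared
    y-component is divided between the two pieces.
    """
    x, y, z = v
    u = (x, t * y, 0)            # third entry zero: u lies in the xy-plane
    w = (0, (1 - t) * y, z)      # first entry zero: w lies in the yz-plane
    return u, w

v = (2, 6, 5)
for t in (Fraction(1, 2), Fraction(1, 3), 0):
    u, w = split(v, t)
    total = tuple(a + b for a, b in zip(u, w))
    print(t, u, w, total == v)
```

The freedom in `t` comes from the y-axis, which lies in both planes.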


Definition 2.1.12 Let $U$ and $W$ be subspaces of a vector space $V$ such that
$V = U + W$ and $U \cap W = \{|0\rangle\}$. Then we say that $V$ is the direct sum of $U$
and $W$ and write $V = U \oplus W$.

Proposition 2.1.13 Let $U$ and $W$ be subspaces of $V$ such that $V = U + W$.
Then $V = U \oplus W$ if and only if any nonzero vector in $V$ can be written
uniquely as a vector in $U$ plus a vector in $W$.


Proof Assume $V = U \oplus W$, and let $|v\rangle \in V$ be written as a sum of a vector
in $U$ and a vector in $W$ in two different ways:

$$|v\rangle = |u\rangle + |w\rangle = |u'\rangle + |w'\rangle \;\Longleftrightarrow\; |u\rangle - |u'\rangle = |w'\rangle - |w\rangle.$$

The LHS is in $U$. Since it is equal to the RHS, which is in $W$, it must be
in $W$ as well. Therefore, the LHS must equal zero, as must the RHS. Thus,
$|u\rangle = |u'\rangle$, $|w'\rangle = |w\rangle$, and there is only one way that $|v\rangle$ can be written as a
sum of a vector in $U$ and a vector in $W$.

Conversely, suppose that any vector in $V$ can be written uniquely as a
vector in $U$ plus a vector in $W$. If $|a\rangle \in U$ and also $|a\rangle \in W$, then one can
write

$$|a\rangle = \underbrace{\tfrac{1}{3}|a\rangle}_{\text{in } U} + \underbrace{\tfrac{2}{3}|a\rangle}_{\text{in } W} = \underbrace{\tfrac{1}{4}|a\rangle}_{\text{in } U} + \underbrace{\tfrac{3}{4}|a\rangle}_{\text{in } W}.$$

Hence $|a\rangle$ can be written in two different ways. By the uniqueness
assumption $|a\rangle$ cannot be nonzero. Therefore, the only vector common to both $U$
and $W$ is the zero vector. This implies that $V = U \oplus W$.


More generally, we have the following situation: 


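Definition 2.1.14 Let $\{U_i\}_{i=1}^{r}$ be subspaces of $V$ such that

$$V = U_1 + \cdots + U_r \quad\text{and}\quad U_i \cap U_j = \{|0\rangle\} \text{ for all } i, j = 1, \ldots, r \text{ with } i \neq j.$$

Then we say that $V$ is the direct sum of $\{U_i\}_{i=1}^{r}$ and write

$$V = U_1 \oplus \cdots \oplus U_r = \bigoplus_{i=1}^{r} U_i.$$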


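Let $W = U_1 \oplus \cdots \oplus U_s$ be a direct sum of $s$ subspaces (they need not
span the entire $V$). Write $W$ as $W = U_1 \oplus W'$, with $W' = U_2 \oplus \cdots \oplus U_s$.
Let $\{|u_i\rangle\}_{i=1}^{s}$ be nonzero vectors with $|u_i\rangle \in U_i$ and suppose that

$$\alpha_1|u_1\rangle + \alpha_2|u_2\rangle + \cdots + \alpha_s|u_s\rangle = |0\rangle, \tag{2.3}$$

or

$$\alpha_1|u_1\rangle + \alpha|w'\rangle = |0\rangle \;\Longrightarrow\; \alpha_1|u_1\rangle = -\alpha|w'\rangle,$$

with $|w'\rangle \in W'$. Since $\alpha_1|u_1\rangle \in U_1$ from the left-hand side, $\alpha_1|u_1\rangle \in W'$
from the right-hand side, and $U_1 \cap W' = \{|0\rangle\}$, we must have $\alpha_1|u_1\rangle = |0\rangle$.
Hence, $\alpha_1 = 0$ because $|u_1\rangle \neq |0\rangle$. Equation (2.3) now becomes

$$\alpha_2|u_2\rangle + \alpha_3|u_3\rangle + \cdots + \alpha_s|u_s\rangle = |0\rangle.$$

Write this as

$$\alpha_2|u_2\rangle + \beta|w''\rangle = |0\rangle \;\Longrightarrow\; \alpha_2|u_2\rangle = -\beta|w''\rangle,$$

where $W' = U_2 \oplus W''$ with $W'' = U_3 \oplus \cdots \oplus U_s$ and $|w''\rangle \in W''$. An
argument similar to the one above shows that $\alpha_2 = 0$. Continuing in this way, we have

Proposition 2.1.15 The vectors in different subspaces of Definition 2.1.14
are linearly independent.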


Proposition 2.1.16 Let $U$ be a subspace of $V$. Then there exists a subspace
$W$ of $V$ such that $V = U \oplus W$.

Proof Let $\{|u_i\rangle\}_{i=1}^{m}$ be a basis of $U$. Extend this basis to a basis $\{|u_i\rangle\}_{i=1}^{N}$
of $V$. Then $W = \mathrm{Span}\{|u_j\rangle\}_{j=m+1}^{N}$.


Example 2.1.17 Let $U$ be the xy-plane and $W$ the z-axis. These are both
subspaces of $\mathbb{R}^3$, and so is $U + W$. Furthermore, it is clear that $U + W = \mathbb{R}^3$,
because given any vector $(x, y, z)$ in $\mathbb{R}^3$, we can write it as

$$(x, y, z) = \underbrace{(x, y, 0)}_{\in U} + \underbrace{(0, 0, z)}_{\in W}.$$

This decomposition is obviously unique. Therefore, $\mathbb{R}^3 = U \oplus W$.

Proposition 2.1.18 If $V = U \oplus W$, then $\dim V = \dim U + \dim W$.

Proof Let $\{|u_i\rangle\}_{i=1}^{m}$ be a basis for $U$ and $\{|w_i\rangle\}_{i=1}^{k}$ a basis for $W$. Then it
is easily verified that $\{|u_1\rangle, |u_2\rangle, \ldots, |u_m\rangle, |w_1\rangle, |w_2\rangle, \ldots, |w_k\rangle\}$ is a basis
for $V$. The details are left as an exercise.


Let $U$ and $V$ be any two vector spaces over $\mathbb{R}$ or $\mathbb{C}$. Consider the Cartesian
product $W \equiv U \times V$ of their underlying sets. Define a scalar multiplication
and a sum on $W$ by

$$\alpha(|u\rangle, |v\rangle) = (\alpha|u\rangle, \alpha|v\rangle),$$
$$(|u_1\rangle, |v_1\rangle) + (|u_2\rangle, |v_2\rangle) = (|u_1\rangle + |u_2\rangle, |v_1\rangle + |v_2\rangle). \tag{2.4}$$

With $|0\rangle_W = (|0\rangle_U, |0\rangle_V)$, $W$ becomes a vector space. Furthermore, if we
identify $U$ and $V$ with vectors of the form $(|u\rangle, |0\rangle_V)$ and $(|0\rangle_U, |v\rangle)$,
respectively, then $U$ and $V$ become subspaces of $W$. If a vector $|w\rangle \in W$ belongs
to both $U$ and $V$, then it can be written as both $(|u\rangle, |0\rangle_V)$ and $(|0\rangle_U, |v\rangle)$,
i.e., $(|u\rangle, |0\rangle_V) = (|0\rangle_U, |v\rangle)$. But this can happen only if $|u\rangle = |0\rangle_U$ and
$|v\rangle = |0\rangle_V$, or $|w\rangle = |0\rangle_W$. Thus, the only common vector in $U$ and $V$ is the
zero vector. Therefore,

Proposition 2.1.19 Let $U$ and $V$ be any two vector spaces over $\mathbb{R}$ or $\mathbb{C}$.
Then their Cartesian product $W \equiv U \times V$ together with the operations
defined in Eq. (2.4) becomes a vector space. Furthermore, $W = U \oplus V$ if $U$
and $V$ are identified with vectors of the form $(|u\rangle, |0\rangle_V)$ and $(|0\rangle_U, |v\rangle)$,
respectively.


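Let $\{|a_i\rangle\}_{i=1}^{M}$ be a basis of $U$ and $\{|b_i\rangle\}_{i=1}^{N}$ a basis of $V$. Define the vectors
$\{|c_k\rangle\}_{k=1}^{M+N}$ in $W = U \oplus V$ by

$$|c_k\rangle = (|a_k\rangle, |0\rangle_V) \quad\text{if } 1 \le k \le M,$$
$$|c_k\rangle = (|0\rangle_U, |b_{k-M}\rangle) \quad\text{if } M + 1 \le k \le M + N. \tag{2.5}$$

Then $\{|c_k\rangle\}_{k=1}^{M+N}$ are linearly independent. In fact,

$$\sum_{k=1}^{M+N} \gamma_k|c_k\rangle = |0\rangle_W \quad\text{iff}$$

$$\sum_{k=1}^{M} \gamma_k(|a_k\rangle, |0\rangle_V) + \sum_{j=1}^{N} \gamma_{M+j}(|0\rangle_U, |b_j\rangle) = (|0\rangle_U, |0\rangle_V),$$

or

$$\left(\sum_{k=1}^{M} \gamma_k|a_k\rangle, |0\rangle_V\right) + \left(|0\rangle_U, \sum_{j=1}^{N} \gamma_{M+j}|b_j\rangle\right) = (|0\rangle_U, |0\rangle_V),$$

or

$$\left(\sum_{k=1}^{M} \gamma_k|a_k\rangle, \sum_{j=1}^{N} \gamma_{M+j}|b_j\rangle\right) = (|0\rangle_U, |0\rangle_V),$$

or

$$\sum_{k=1}^{M} \gamma_k|a_k\rangle = |0\rangle_U \quad\text{and}\quad \sum_{j=1}^{N} \gamma_{M+j}|b_j\rangle = |0\rangle_V.$$

Linear independence of $\{|a_i\rangle\}_{i=1}^{M}$ and $\{|b_j\rangle\}_{j=1}^{N}$ implies that $\gamma_k = 0$ for
$1 \le k \le M + N$.
It is not hard to show that $W = \mathrm{Span}\{|c_k\rangle\}_{k=1}^{M+N}$. Hence, we have the
following

Theorem 2.1.20 Let $\{|a_i\rangle\}_{i=1}^{M}$ be a basis of $U$ and $\{|b_i\rangle\}_{i=1}^{N}$ a basis of $V$.
The set of vectors $\{|c_k\rangle\}_{k=1}^{M+N}$ defined by Eq. (2.5) form a basis of the direct
sum $W = U \oplus V$. In particular, $W$ has dimension $M + N$.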
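
The construction of Eq. (2.5) can be sketched in a few lines. The following is only an illustration under stated assumptions: vectors are stored as tuples of components, a pair $(|u\rangle, |v\rangle)$ is stored as the concatenated tuple, and the function name `direct_sum_basis` and the two sample bases are hypothetical.

```python
def direct_sum_basis(basis_u, basis_v):
    """Build the vectors |c_k> of Eq. (2.5) from bases of U and V.

    Vectors are tuples of components; a pair (|u>, |v>) is stored as the
    concatenated tuple, which is an assumption of this sketch.
    """
    zero_u = (0,) * len(basis_u[0])
    zero_v = (0,) * len(basis_v[0])
    first = [tuple(a) + zero_v for a in basis_u]    # (|a_k>, |0>_V), 1 <= k <= M
    second = [zero_u + tuple(b) for b in basis_v]   # (|0>_U, |b_{k-M}>), M+1 <= k <= M+N
    return first + second

basis_u = [(1, 1), (1, -1)]                  # hypothetical basis of U = R^2
basis_v = [(1, 0, 0), (0, 2, 0), (0, 0, 3)]  # hypothetical basis of V = R^3
for c in direct_sum_basis(basis_u, basis_v):
    print(c)
```

The list has $M + N$ entries, in agreement with the dimension stated in Theorem 2.1.20.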


2.1.4 Tensor Product of Vector Spaces 


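Direct sum is one way of constructing a new vector space out of two. There
is another procedure. Let $U$ and $V$ be vector spaces. On their Cartesian
product, impose the scalar multiplication and bilinearity conditions:

$$\alpha(|u\rangle, |v\rangle) = (\alpha|u\rangle, |v\rangle) = (|u\rangle, \alpha|v\rangle),$$
$$(\alpha_1|u_1\rangle + \alpha_2|u_2\rangle, |v\rangle) = \alpha_1(|u_1\rangle, |v\rangle) + \alpha_2(|u_2\rangle, |v\rangle), \tag{2.6}$$
$$(|u\rangle, \beta_1|v_1\rangle + \beta_2|v_2\rangle) = \beta_1(|u\rangle, |v_1\rangle) + \beta_2(|u\rangle, |v_2\rangle).$$

These properties turn $U \times V$ into a vector space called the tensor product of
$U$ and $V$ and denoted by $U \otimes V$.³ The vectors in the tensor product space are
denoted by $|u\rangle \otimes |v\rangle$ (or occasionally by $|uv\rangle$). If $\{|a_i\rangle\}_{i=1}^{M}$ and $\{|b_i\rangle\}_{i=1}^{N}$
are bases in $U$ and $V$, respectively, and

$$|u\rangle = \sum_{i=1}^{M} \alpha_i|a_i\rangle \quad\text{and}\quad |v\rangle = \sum_{j=1}^{N} \beta_j|b_j\rangle,$$

then Eq. (2.6) yields

$$|u\rangle \otimes |v\rangle = \left(\sum_{i=1}^{M} \alpha_i|a_i\rangle\right) \otimes \left(\sum_{j=1}^{N} \beta_j|b_j\rangle\right) = \sum_{i=1}^{M}\sum_{j=1}^{N} \alpha_i\beta_j\,|a_i\rangle \otimes |b_j\rangle.$$

Therefore, $\{|a_i\rangle \otimes |b_j\rangle\}$ is a basis of $U \otimes V$ and $\dim(U \otimes V) = \dim U \dim V$.
From (2.6), we have

$$|0\rangle_U \otimes |v\rangle = (|u\rangle - |u\rangle) \otimes |v\rangle = |u\rangle \otimes |v\rangle - |u\rangle \otimes |v\rangle = |0\rangle_{U\otimes V}.$$

Similarly, $|u\rangle \otimes |0\rangle_V = |0\rangle_{U\otimes V}$.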
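
In components, the expansion above says that $|u\rangle \otimes |v\rangle$ has the $MN$ components $\alpha_i\beta_j$ in the basis $\{|a_i\rangle \otimes |b_j\rangle\}$. The following sketch checks the linearity condition of Eq. (2.6) for that component formula. It is illustrative only: the helper names `tensor`, `add`, and `scale`, the ordering of the components, and the sample numbers are all assumptions.

```python
def tensor(u, v):
    """Components of |u> (x) |v> in the basis {|a_i> (x) |b_j>}: alpha_i * beta_j."""
    return [a * b for a in u for b in v]

def add(x, y):
    """Componentwise sum of two vectors."""
    return [p + q for p, q in zip(x, y)]

def scale(c, x):
    """Multiply every component of x by the scalar c."""
    return [c * p for p in x]

u1, u2 = [1, 2j], [3, -1]      # components in an assumed basis of U (M = 2)
v = [2, 0, 1j]                 # components in an assumed basis of V (N = 3)
alpha1, alpha2 = 2, 1 - 1j

lhs = tensor(add(scale(alpha1, u1), scale(alpha2, u2)), v)
rhs = add(scale(alpha1, tensor(u1, v)), scale(alpha2, tensor(u2, v)))
print(lhs == rhs)              # linearity in the first factor, Eq. (2.6)
print(len(tensor(u1, v)))      # M * N components
print(tensor([0, 0], v))       # the zero vector of U gives the zero vector of the product
```

The sample numbers have integer real and imaginary parts, so the floating-point comparison `lhs == rhs` is exact here; with general inputs a tolerance would be needed.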


2.2 Inner Product 


A vector space, as given by Definition 2.1.1, is too general and structureless
to be of much physical interest. One useful structure introduced on a vector
space is a scalar product. Recall that the scalar (dot) product of vectors in
the plane or in space is a rule that associates with two vectors $\mathbf{a}$ and $\mathbf{b}$ a real
number. This association, denoted symbolically by $g: V \times V \to \mathbb{R}$, with
$g(\mathbf{a}, \mathbf{b}) = \mathbf{a} \cdot \mathbf{b}$, is symmetric: $g(\mathbf{a}, \mathbf{b}) = g(\mathbf{b}, \mathbf{a})$, is linear in the first (and by
symmetry, in the second) factor:⁴

$$g(\alpha\mathbf{a} + \beta\mathbf{b}, \mathbf{c}) = \alpha g(\mathbf{a}, \mathbf{c}) + \beta g(\mathbf{b}, \mathbf{c}) \quad\text{or}\quad (\alpha\mathbf{a} + \beta\mathbf{b}) \cdot \mathbf{c} = \alpha\mathbf{a} \cdot \mathbf{c} + \beta\mathbf{b} \cdot \mathbf{c},$$

gives the "length" of a vector: $|\mathbf{a}|^2 = g(\mathbf{a}, \mathbf{a}) = \mathbf{a} \cdot \mathbf{a} \ge 0$, and ensures that
the only vector with zero length⁵ is the zero vector: $g(\mathbf{a}, \mathbf{a}) = 0$ if and only
if $\mathbf{a} = \mathbf{0}$.

³ A detailed discussion of tensor products and tensors in general is given in Chap. 26.

⁴ A function that is linear in both of its arguments is called a bilinear function.

⁵ In our present discussion, we are avoiding situations in which a nonzero vector can have
zero "length". Such occasions arise in relativity, and we shall discuss them in Part VIII.

We want to generalize these properties to abstract vector spaces for which
the scalars are complex numbers. A verbatim generalization of the foregoing
properties, however, leads to a contradiction. Using the linearity in both
arguments and a nonzero $|a\rangle$, we obtain

$$g(i|a\rangle, i|a\rangle) = i^2 g(|a\rangle, |a\rangle) = -g(|a\rangle, |a\rangle). \tag{2.7}$$

Either the right-hand side (RHS) or left-hand side (LHS) of this equation
must be negative! But this is inconsistent with the positivity of the "length"
of a vector, which requires $g(|a\rangle, |a\rangle)$ to be positive for all nonzero vectors,
including $i|a\rangle$. The source of the problem is the linearity in both arguments.
If we can change this property in such a way that one of the $i$'s in Eq. (2.7)
comes out complex-conjugated, the problem may go away. This requires
linearity in one argument and complex-conjugate linearity in the other. Which
argument is to be complex-conjugate linear is a matter of convention. We
choose the first argument to be so.⁶ We thus have

$$g(\alpha|a\rangle + \beta|b\rangle, |c\rangle) = \alpha^* g(|a\rangle, |c\rangle) + \beta^* g(|b\rangle, |c\rangle),$$

where $\alpha^*$ denotes the complex conjugate. Consistency then requires us
to change the symmetry property as well. In fact, we must demand that
$g(|a\rangle, |b\rangle) = (g(|b\rangle, |a\rangle))^*$, from which the reality of $g(|a\rangle, |a\rangle)$, a necessary
condition for its positivity, follows immediately.

The question of the existence of an inner product on a vector space is a
deep problem in higher analysis. Generally, if an inner product exists, there
may be many ways to introduce one on a vector space. However, as we
shall see in Sect. 2.2.4, a finite-dimensional vector space always has an inner
product and this inner product is unique.⁷ So, for all practical purposes we
can speak of the inner product on a finite-dimensional vector space, and as
with the two- and three-dimensional cases, we can omit the letter $g$ and use
a notation that involves only the vectors. There are several such notations in
use, but the one that will be employed in this book is the Dirac bra(c)ket
notation, whereby $g(|a\rangle, |b\rangle)$ is denoted by $\langle a|b\rangle$. Using this notation, we
have


Definition 2.2.1 The inner product of two vectors, $|a\rangle$ and $|b\rangle$, in a vector
space $V$ is a complex number, $\langle a|b\rangle \in \mathbb{C}$, such that

1. $\langle a|b\rangle = \langle b|a\rangle^*$

2. $\langle a|(\beta|b\rangle + \gamma|c\rangle) = \beta\langle a|b\rangle + \gamma\langle a|c\rangle$

3. $\langle a|a\rangle \ge 0$, and $\langle a|a\rangle = 0$ if and only if $|a\rangle = |0\rangle$.

The last relation is called the positive definite property of the inner
product.⁸ A positive definite real inner product is also called a Euclidean inner
product, otherwise it is called pseudo-Euclidean.

⁶ In some books, particularly in the mathematical literature, the second argument is chosen
to be conjugate linear.

⁷ This uniqueness holds up to a certain equivalence of inner products that we shall not get
into here.


Note that linearity in the first argument is absent in the definition above, 
because, as explained earlier, it would be inconsistent with the first property, 
which expresses the “symmetry” of the inner product. The extra operation of 
complex conjugation renders the true linearity in the first argument impos- 
sible. Because of this complex conjugation, the inner product on a complex 
vector space is not truly bilinear; it is commonly called sesquilinear or her- 
mitian. 

A shorthand notation will be useful when dealing with the inner product 
of a linear combination of vectors. 


Box 2.2.2 We write the LHS of the second equation in the definition
above as $\langle a|\beta b + \gamma c\rangle$.

This has the advantage of treating a linear combination as a single vector.
The second property then states that if the complex scalars happen to be in
a ket, they "split out" unaffected:

$$\langle a|\beta b + \gamma c\rangle = \beta\langle a|b\rangle + \gamma\langle a|c\rangle. \tag{2.8}$$

On the other hand, if the complex scalars happen to be in the first factor (the
bra), then they should be conjugated when they are "split out":

$$\langle \beta b + \gamma c|a\rangle = \beta^*\langle b|a\rangle + \gamma^*\langle c|a\rangle. \tag{2.9}$$


A vector space V on which an inner product is defined is called an inner 
product space. As mentioned above, a finite-dimensional vector space can 
always be turned into an inner product space. 


Example 2.2.3 In this example, we introduce some of the most common
inner products. The reader is urged to verify that in all cases, we indeed
have an inner product.

• Let $|a\rangle, |b\rangle \in \mathbb{C}^n$, with $|a\rangle = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ and $|b\rangle = (\beta_1, \beta_2, \ldots, \beta_n)$,
and define an inner product on $\mathbb{C}^n$ as

$$\langle a|b\rangle \equiv \alpha_1^*\beta_1 + \alpha_2^*\beta_2 + \cdots + \alpha_n^*\beta_n = \sum_{i=1}^{n} \alpha_i^*\beta_i.$$

That this product satisfies all the required properties of an inner product
is easily checked. For example, if $|b\rangle = |a\rangle$, we obtain
$\langle a|a\rangle = |\alpha_1|^2 + |\alpha_2|^2 + \cdots + |\alpha_n|^2$, which is clearly nonnegative.

• Similarly, for $|a\rangle, |b\rangle \in \mathbb{R}^n$ the same definition (without the complex
conjugation) satisfies all the properties of an inner product.

• For $|a\rangle, |b\rangle \in \mathbb{C}^\infty$ the natural inner product is defined as
$\langle a|b\rangle = \sum_{i=1}^{\infty} \alpha_i^*\beta_i$. The question of the convergence of this infinite sum is the
subject of Problem 2.18.

• Let $x(t), y(t) \in \mathcal{P}^c[t]$, the space of all polynomials in $t$ with complex
coefficients. Define

$$\langle x|y\rangle \equiv \int_a^b w(t)\,x^*(t)\,y(t)\,dt, \tag{2.10}$$

where $a$ and $b$ are real numbers (or infinity) for which the integral
exists, and $w(t)$, called the weight function, is a real-valued, continuous
function that is always strictly positive in the interval $(a, b)$. Then
Eq. (2.10) defines an inner product. Depending on the weight function
$w(t)$, there can be many different inner products defined on the
infinite-dimensional space $\mathcal{P}^c[t]$.

• Let $f, g \in \mathcal{C}(a, b)$ and define their inner product by

$$\langle f|g\rangle \equiv \int_a^b w(x)\,f^*(x)\,g(x)\,dx.$$

It is easily shown that $\langle f|g\rangle$ satisfies all the requirements of the inner
product if, as in the previous case, the weight function $w(x)$ is always
positive in the interval $(a, b)$. This is called the standard inner product
on $\mathcal{C}(a, b)$.

⁸ The positive definiteness must be relaxed in the space-time of relativity theory, in which
nonzero vectors can have zero "length".
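
The first inner product of the example, together with Eqs. (2.8) and (2.9), can be checked numerically. The sketch below is illustrative only; the function names `inner` and `lin` and the sample vectors are assumptions. It uses the convention adopted in the text, in which the components of the first vector (the bra) are conjugated.

```python
def inner(a, b):
    """Inner product <a|b> on C^n, conjugating the components of the first vector."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def lin(p, x, q, y):
    """The vector p*x + q*y."""
    return [p * s + q * t for s, t in zip(x, y)]

a = [1 + 2j, 3j]
b = [2, 1 - 1j]
c = [-1j, 4]
beta, gamma = 2 - 1j, 1 + 3j

# Property 1: <a|b> = <b|a>*
print(inner(a, b) == inner(b, a).conjugate())

# Eq. (2.8): scalars in the ket split out unaffected.
print(inner(a, lin(beta, b, gamma, c)) == beta * inner(a, b) + gamma * inner(a, c))

# Eq. (2.9): scalars in the bra split out conjugated.
print(inner(lin(beta, b, gamma, c), a)
      == beta.conjugate() * inner(b, a) + gamma.conjugate() * inner(c, a))

# Property 3 for this vector: <a|a> = |1+2i|^2 + |3i|^2 is real and positive.
print(inner(a, a))
```

The sample numbers have integer real and imaginary parts, so the equality tests are exact; for general floating-point input the comparisons would need a tolerance.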


2.2.1 Orthogonality 


The vectors of analytic geometry and calculus are often expressed in terms 
of unit vectors along the axes, i.e., vectors that are of unit length and per- 
pendicular to one another. Such vectors are also important in abstract inner 
product spaces. 


Definition 2.2.4 Vectors $|a\rangle, |b\rangle \in V$ are orthogonal if $\langle a|b\rangle = 0$. A normal
vector, or normalized vector, $|e\rangle$ is one for which $\langle e|e\rangle = 1$. A basis
$B = \{|e_i\rangle\}_{i=1}^{N}$ in an $N$-dimensional vector space $V$ is an orthonormal basis
if

$$\langle e_i|e_j\rangle = \delta_{ij} \equiv \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j, \end{cases} \tag{2.11}$$

where $\delta_{ij}$, defined by the last equality, is called the Kronecker delta.


Example 2.2.5 Let $U$ and $V$ be inner product vector spaces. Let $W = U \oplus V$.
Then an inner product can be defined on $W$ in terms of those on
$U$ and $V$. In fact, it can be easily shown that if $|w_i\rangle = (|u_i\rangle, |v_i\rangle)$, $i = 1, 2$,
then

$$\langle w_1|w_2\rangle = \langle u_1|u_2\rangle + \langle v_1|v_2\rangle \tag{2.12}$$

defines an inner product on $W$. Moreover, with the identification

$$U = \{(|u\rangle, |0\rangle_V) \mid |u\rangle \in U\} \quad\text{and}\quad V = \{(|0\rangle_U, |v\rangle) \mid |v\rangle \in V\},$$

any vector in $U$ is orthogonal to any vector in $V$.

Fig. 2.1 The essence of the Gram-Schmidt process is neatly illustrated by the process
in two dimensions. This figure depicts the stages of the construction of two orthonormal
vectors


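Example 2.2.6 Here are examples of orthonormal bases:

• The standard basis of $\mathbb{R}^n$ (or $\mathbb{C}^n$)

$$|e_1\rangle = (1, 0, \ldots, 0), \quad |e_2\rangle = (0, 1, \ldots, 0), \quad \ldots, \quad |e_n\rangle = (0, 0, \ldots, 1)$$

is orthonormal under the usual inner product of those spaces.

• Let $|e_k\rangle = e^{ikx}/\sqrt{2\pi}$ be functions in $\mathcal{C}(0, 2\pi)$ with $w(x) = 1$. Then

$$\langle e_k|e_k\rangle = \frac{1}{2\pi}\int_0^{2\pi} e^{-ikx}e^{ikx}\,dx = 1,$$

and for $l \neq k$,

$$\langle e_l|e_k\rangle = \frac{1}{2\pi}\int_0^{2\pi} e^{-ilx}e^{ikx}\,dx = \frac{1}{2\pi}\int_0^{2\pi} e^{i(k-l)x}\,dx = 0.$$

Thus, $\langle e_l|e_k\rangle = \delta_{lk}$.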
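
The two integrals above can be checked numerically. The following sketch is illustrative only: it approximates the inner product on $(0, 2\pi)$ with weight 1 by a midpoint Riemann sum, and the function names `e` and `inner` as well as the number of steps are assumptions made here.

```python
import cmath
import math

def e(k, x):
    """The function e_k(x) = exp(i k x) / sqrt(2 pi)."""
    return cmath.exp(1j * k * x) / math.sqrt(2 * math.pi)

def inner(l, k, steps=2000):
    """Approximate <e_l|e_k> on (0, 2 pi) with weight 1 by the midpoint rule."""
    h = 2 * math.pi / steps
    total = 0j
    for n in range(steps):
        x = (n + 0.5) * h
        # The first factor (the bra) is complex-conjugated.
        total += e(l, x).conjugate() * e(k, x) * h
    return total

for l, k in [(1, 1), (2, 2), (1, 3), (-2, 5)]:
    print(l, k, abs(inner(l, k)))
```

Up to rounding error, the printed magnitudes are 1 when the two indices agree and 0 otherwise, in line with $\langle e_l|e_k\rangle = \delta_{lk}$.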


2.2.2 The Gram-Schmidt Process


It is always possible to convert, by taking appropriate linear combinations,
any basis in $V$ into an orthonormal basis. A process by which this may
be accomplished is called Gram-Schmidt orthonormalization. Consider
a basis $B = \{|a_i\rangle\}_{i=1}^{N}$. We intend to take linear combinations of $|a_i\rangle$
in such a way that the resulting vectors are orthonormal. First, we let
$|e_1\rangle = |a_1\rangle/\sqrt{\langle a_1|a_1\rangle}$ and note that $\langle e_1|e_1\rangle = 1$. If we subtract from $|a_2\rangle$
its projection along $|e_1\rangle$, we obtain a vector that is orthogonal to $|e_1\rangle$ (see
Fig. 2.1).

Fig. 2.2 Once the orthonormal vectors in the plane of two vectors are obtained, the third
orthonormal vector is easily constructed

Calling the resulting vector $|e_2'\rangle$, we have $|e_2'\rangle = |a_2\rangle - \langle e_1|a_2\rangle|e_1\rangle$, which
can be written more symmetrically as $|e_2'\rangle = |a_2\rangle - |e_1\rangle\langle e_1|a_2\rangle$. Clearly,
this vector is orthogonal to $|e_1\rangle$. In order to normalize $|e_2'\rangle$, we divide it
by $\sqrt{\langle e_2'|e_2'\rangle}$. Then $|e_2\rangle = |e_2'\rangle/\sqrt{\langle e_2'|e_2'\rangle}$ will be a normal vector orthogonal
to $|e_1\rangle$. Subtracting from $|a_3\rangle$ its projections along the first and second unit
vectors obtained so far will give the vector

$$|e_3'\rangle = |a_3\rangle - |e_1\rangle\langle e_1|a_3\rangle - |e_2\rangle\langle e_2|a_3\rangle = |a_3\rangle - \sum_{i=1}^{2} |e_i\rangle\langle e_i|a_3\rangle,$$

which is orthogonal to both $|e_1\rangle$ and $|e_2\rangle$ (see Fig. 2.2):

$$\langle e_1|e_3'\rangle = \langle e_1|a_3\rangle - \underbrace{\langle e_1|e_1\rangle}_{=1}\langle e_1|a_3\rangle - \underbrace{\langle e_1|e_2\rangle}_{=0}\langle e_2|a_3\rangle = 0.$$

Similarly, $\langle e_2|e_3'\rangle = 0$.


Historical Notes 

Erhard Schmidt (1876-1959) obtained his doctorate under the supervision of David 
Hilbert. His main interest was in integral equations and Hilbert spaces. He is the 
“Schmidt” of the Gram-Schmidt orthogonalization process, which takes a basis of 
a space and constructs an orthonormal one from it. (Laplace had presented a special case 
of this process long before Gram or Schmidt.) 

In 1908 Schmidt worked on infinitely many equations in infinitely many unknowns, in- 
troducing various geometric notations and terms that are still in use for describing spaces 
of functions. Schmidt’s ideas were to lead to the geometry of Hilbert spaces. This was 
motivated by the study of integral equations (see Chap. 18) and an attempt at their ab- 
straction. 

Earlier, Hilbert regarded a function as given by its Fourier coefficients. These satisfy
the condition that $\sum_{k=1}^{\infty} a_k^2$ is finite. He introduced sequences of real numbers $\{x_n\}$
such that $\sum_{n=1}^{\infty} x_n^2$ is finite. Riesz and Fischer showed that there is a one-to-one
correspondence between square-integrable functions and square-summable sequences of
their Fourier coefficients. In 1907 Schmidt and Fréchet showed that a consistent theory
could be obtained if the square-summable sequences were regarded as the coordinates
of points in an infinite-dimensional space that is a generalization of n-dimensional
Euclidean space. Thus functions can be regarded as points of a space, now called a Hilbert
space.

In general, if we have calculated $m$ orthonormal vectors $|e_1\rangle, \ldots, |e_m\rangle$,
with $m < N$, then we can find the next one using the following relations:

$$|e_{m+1}'\rangle = |a_{m+1}\rangle - \sum_{i=1}^{m} |e_i\rangle\langle e_i|a_{m+1}\rangle,$$
$$|e_{m+1}\rangle = \frac{|e_{m+1}'\rangle}{\sqrt{\langle e_{m+1}'|e_{m+1}'\rangle}}. \tag{2.13}$$

Even though we have been discussing finite-dimensional vector spaces, the
process of Eq. (2.13) can continue for infinite dimensions as well. The
reader is asked to pay attention to the fact that, at each stage of the
Gram-Schmidt process, one is taking linear combinations of the original vectors.
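
The recursion of Eq. (2.13) translates almost line by line into code. The following is a minimal sketch under stated assumptions: vectors live in $\mathbb{C}^n$ with the usual inner product (conjugate-linear in the first argument), the names `inner` and `gram_schmidt` and the sample basis are hypothetical, and a small tolerance is used to detect linearly dependent input.

```python
import math

def inner(a, b):
    """Inner product on C^n, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors following Eq. (2.13)."""
    es = []
    for a in vectors:
        # |e'> = |a> - sum_i |e_i><e_i|a>
        prime = list(a)
        for e in es:
            coeff = inner(e, a)
            prime = [p - coeff * x for p, x in zip(prime, e)]
        norm = math.sqrt(inner(prime, prime).real)
        if norm < 1e-12:
            raise ValueError("vectors are linearly dependent")
        # |e> = |e'> / sqrt(<e'|e'>)
        es.append([p / norm for p in prime])
    return es

basis = [[1, 1j, 0], [1, 0, 1], [0, 1, 1]]   # hypothetical basis of C^3
es = gram_schmidt(basis)
for ei in es:
    # Each printed row should be close to a row of the identity matrix.
    print([round(abs(inner(ei, ej)), 12) for ej in es])
```

Each new vector is built from the original $|a_{m+1}\rangle$ and the previously found $|e_i\rangle$, so it is a linear combination of the original vectors, as noted in the text.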


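2.2.3 The Schwarz Inequality


Let us now consider an important inequality that is valid in both finite and
infinite dimensions and whose restriction to two and three dimensions is
equivalent to the fact that the cosine of the angle between two vectors
never exceeds one in absolute value.


Theorem 2.2.7 For any pair of vectors $|a\rangle, |b\rangle$ in an inner product space $V$,
the Schwarz inequality holds: $\langle a|a\rangle\langle b|b\rangle \ge |\langle a|b\rangle|^2$. Equality holds when
$|a\rangle$ is proportional to $|b\rangle$.


Proof If $|a\rangle = |0\rangle$, both sides vanish, so assume $|a\rangle \neq |0\rangle$.
Let $|c\rangle = |b\rangle - (\langle a|b\rangle/\langle a|a\rangle)|a\rangle$, and note that $\langle a|c\rangle = 0$. Write
$|b\rangle = (\langle a|b\rangle/\langle a|a\rangle)|a\rangle + |c\rangle$ and take the inner product of $|b\rangle$ with itself:

$$\langle b|b\rangle = \left|\frac{\langle a|b\rangle}{\langle a|a\rangle}\right|^2 \langle a|a\rangle + \langle c|c\rangle = \frac{|\langle a|b\rangle|^2}{\langle a|a\rangle} + \langle c|c\rangle.$$

Since $\langle c|c\rangle \ge 0$, we have

$$\langle b|b\rangle \ge \frac{|\langle a|b\rangle|^2}{\langle a|a\rangle} \;\Longrightarrow\; \langle a|a\rangle\langle b|b\rangle \ge |\langle a|b\rangle|^2.$$

Equality holds iff $\langle c|c\rangle = 0$, i.e., iff $|c\rangle = 0$. From the definition of $|c\rangle$, we
conclude that for the equality to hold, $|a\rangle$ and $|b\rangle$ must be proportional.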
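
A quick numerical check of the theorem on $\mathbb{C}^4$ and $\mathbb{C}^3$ is sketched below. It is illustrative only: the function names `inner` and `schwarz_gap`, the random sample vectors, and the tolerances are assumptions made here.

```python
import random

def inner(a, b):
    """Inner product on C^n, conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def schwarz_gap(a, b):
    """<a|a><b|b> - |<a|b>|^2, which Theorem 2.2.7 says is never negative."""
    return (inner(a, a) * inner(b, b)).real - abs(inner(a, b)) ** 2

random.seed(0)   # fixed seed so the run is repeatable
for _ in range(5):
    a = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(4)]
    b = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(4)]
    print(schwarz_gap(a, b) >= -1e-12)

a = [1 + 1j, 2, -1j]
b = [(2 - 3j) * x for x in a]       # |b> proportional to |a>
print(abs(schwarz_gap(a, b)) < 1e-9)
```

The gap computed by `schwarz_gap` equals $\langle a|a\rangle\langle c|c\rangle$ in the notation of the proof, which is why it vanishes for proportional vectors.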


Notice the power of abstraction: We have derived the Schwarz inequality 
solely from the basic assumptions of inner product spaces independent of 
the specific nature of the inner product. Therefore, we do not have to prove 
the Schwarz inequality every time we encounter a new inner product space. 


Historical Notes 

Karl Hermann Amandus Schwarz (1843-1921), the son of an architect, was born in
what is now Sobiecin, Poland. After gymnasium, Schwarz studied chemistry in Berlin for
a time before switching to mathematics, receiving his doctorate in 1864. He was greatly
influenced by the reigning mathematicians in Germany at the time, especially Kummer
and Weierstrass. The lecture notes that Schwarz took while attending Weierstrass's
lectures on the integral calculus still exist. Schwarz received an initial appointment at Halle
and later appointments in Zurich and Göttingen before being named as Weierstrass's
successor at Berlin in 1892. These later years, filled with students and lectures, were not
Schwarz's most productive, but his early papers assure his place in mathematics history.
Schwarz’s favorite tool was geometry, which he soon turned to the study of analysis. He 
conclusively proved some of Riemann’s results that had been previously (and justifiably) 
challenged. The primary result in question was the assertion that every simply connected 
region in the plane could be conformally mapped onto a circular area. From this effort 
came several well-known results now associated with Schwarz’s name, including the prin- 
ciple of reflection and Schwarz’s lemma. He also worked on surfaces of minimal area, the 
branch of geometry beloved by all who dabble with soap bubbles. 

Schwarz’s most important work, for the occasion of Weierstrass’s seventieth birthday, 
again dealt with minimal area, specifically whether a minimal surface yields a minimal 
area. Along the way, Schwarz demonstrated second variation in a multiple integral, con- 
structed a function using successive approximation, and demonstrated the existence of a 
“least” eigenvalue for certain differential equations. This work also contained the most 
famous inequality in mathematics, which bears his name. 

Schwarz’s success obviously stemmed from a matching of his aptitude and training to the 
mathematical problems of the day. One of his traits, however, could be viewed as either 
positive or negative—his habit of treating all problems, whether trivial or monumental, 
with the same level of attention to detail. This might also at least partly explain the decline 
in productivity in Schwarz’s later years. 

Schwarz had interests outside mathematics, although his marriage was a mathematical 
one, since he married Kummer’s daughter. Outside mathematics he was the captain of the 
local voluntary fire brigade, and he assisted the stationmaster at the local railway station 
by closing the doors of the trains! 


2.2.4 Length of a Vector 


In dealing with objects such as directed line segments in the plane or in 
space, the intuitive idea of the length of a vector is used to define the dot 
product. However, sometimes it is more convenient to introduce the inner 
product first and then define the length, as we shall do now. 


Definition 2.2.8 The norm, or length, of a vector $|a\rangle$ in an inner product
space is denoted by $\|a\|$ and defined as $\|a\| = \sqrt{\langle a|a\rangle}$. We use the notation
$\|\alpha a + \beta b\|$ for the norm of the vector $\alpha|a\rangle + \beta|b\rangle$.


One can easily show that the norm has the following properties: 


1. The norm of the zero vector is zero: $\|0\| = 0$.

2. $\|a\| \ge 0$, and $\|a\| = 0$ if and only if $|a\rangle = |0\rangle$.

3. $\|\alpha a\| = |\alpha|\,\|a\|$ for any⁹ complex $\alpha$.

4. $\|a + b\| \le \|a\| + \|b\|$. This property is called the triangle inequality.


Any function on a vector space satisfying the four properties above is 
called a norm, and the vector space on which a norm is defined is called a 
normed linear space. One does not need an inner product to have a norm. 

One can introduce the idea of the “distance” between two vectors in
a normed linear space. The distance between $|a\rangle$ and $|b\rangle$, denoted by
$d(a, b)$, is simply the norm of their difference: $d(a, b) = \|a - b\|$. It can
be readily shown that this has all the properties one expects of the distance
(or metric) function introduced in Chap. 1. However, one does not need a
normed space to define distance. For example, as explained in Chap. 1, one
can define the distance between two points on the surface of a sphere, but the
addition of two points on a sphere, a necessary operation for vector space
structure, is not defined. Thus the points on a sphere form a metric space,
but not a vector space.

⁹ The first property follows from the third by letting $\alpha = 0$.

Inner product spaces are automatically normed spaces, but the converse 
is not, in general, true: There are normed spaces, i.e., spaces satisfying prop- 
erties 1-4 above that cannot be promoted to inner product spaces. However, 
if the norm satisfies the parallelogram law, 


$$\|a + b\|^2 + \|a - b\|^2 = 2\|a\|^2 + 2\|b\|^2, \tag{2.14}$$

then one can define

$$\langle a|b\rangle \equiv \frac{1}{4}\left\{\|a + b\|^2 - \|a - b\|^2 - i\left(\|a + ib\|^2 - \|a - ib\|^2\right)\right\} \tag{2.15}$$

and show that it is indeed an inner product. In fact, we have (see [Frie 82,
pp. 203-204] for a proof) the following theorem.


Theorem 2.2.9 A normed linear space is an inner product space if and only 
if the norm satisfies the parallelogram law. 


Now consider any N-dimensional vector space V. Choose a basis
$\{|a_i\rangle\}_{i=1}^N$ in V, and for any vector $|a\rangle$ whose components are
$\{\alpha_i\}_{i=1}^N$ in this basis, define

$$\|a\|^2 \equiv \sum_{i=1}^N |\alpha_i|^2.$$


The reader may check that this defines a norm, and that the norm satisfies 
the parallelogram law. From Theorem 2.2.9 we have the following: 


Theorem 2.2.10 Every finite-dimensional vector space can be turned into 
an inner product space. 


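Example 2.2.11 Let the space be $\mathbb{C}^n$. The natural inner product of $\mathbb{C}^n$ gives
rise to a norm, which, for the vector $|a\rangle = (\alpha_1, \alpha_2, \ldots, \alpha_n)$, is

$$\|a\| = \sqrt{\langle a|a\rangle} = \sqrt{\sum_{i=1}^n |\alpha_i|^2}.$$

This norm yields the following distance between $|a\rangle$ and $|b\rangle = (\beta_1, \beta_2, \ldots, \beta_n)$:

$$d(a, b) = \|a - b\| = \sqrt{\langle a - b|a - b\rangle} = \sqrt{\sum_{i=1}^n |\alpha_i - \beta_i|^2}.$$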


One can define other norms, such as $\|a\|_1 \equiv \sum_{i=1}^n |\alpha_i|$, which has all the
required properties of a norm, and leads to the distance

$$d_1(a, b) = \|a - b\|_1 = \sum_{i=1}^n |\alpha_i - \beta_i|.$$

Another norm defined on $\mathbb{C}^n$ is given by

$$\|a\|_p = \left(\sum_{i=1}^n |\alpha_i|^p\right)^{1/p},$$

where p is a positive integer. It is proved in higher mathematical analysis
that $\|\cdot\|_p$ has all the properties of a norm. (The nontrivial part of the proof
is to verify the triangle inequality.) The associated distance is

$$d_p(a, b) = \|a - b\|_p = \left(\sum_{i=1}^n |\alpha_i - \beta_i|^p\right)^{1/p}.$$

The other two norms introduced above are special cases, for p = 2 and
p = 1.
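As a numerical illustration of the p-norms of this example, the following minimal sketch evaluates the norm and distance for a pair of vectors in $\mathbb{C}^2$ and tests the parallelogram law (2.14). The function names and the sample vectors are chosen for this sketch only and do not come from the text. For this pair the law holds for p = 2 and fails for p = 1 and p = 3, which is consistent with Theorem 2.2.9: only the p = 2 norm can come from an inner product.

```python
def p_norm(vec, p):
    """Return (sum_i |v_i|^p)^(1/p) for a sequence of complex components."""
    return sum(abs(v) ** p for v in vec) ** (1.0 / p)


def p_distance(a, b, p):
    """Distance d_p(a, b) = ||a - b||_p induced by the p-norm."""
    return p_norm([x - y for x, y in zip(a, b)], p)


def parallelogram_gap(a, b, p):
    """||a+b||^2 + ||a-b||^2 - 2||a||^2 - 2||b||^2 (zero when the law holds)."""
    plus = [x + y for x, y in zip(a, b)]
    minus = [x - y for x, y in zip(a, b)]
    return (p_norm(plus, p) ** 2 + p_norm(minus, p) ** 2
            - 2 * p_norm(a, p) ** 2 - 2 * p_norm(b, p) ** 2)


# Two illustrative vectors in C^2 (an assumption of this sketch).
a = [1 + 0j, 0j]
b = [0j, 1 + 0j]

for p in (1, 2, 3):
    total = [x + y for x, y in zip(a, b)]
    triangle_ok = p_norm(total, p) <= p_norm(a, p) + p_norm(b, p) + 1e-12
    print(p, p_distance(a, b, p), triangle_ok, round(parallelogram_gap(a, b, p), 6))
```

A single pair of vectors can only refute the parallelogram law, not establish it; the p = 2 case is guaranteed by the inner product, not by this check.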


2.3 Linear Maps 


We have made progress in enriching vector spaces with structures such as 
norms and inner products. However, this enrichment, although important, 
will be of little value if it is imprisoned in a single vector space. We would 
like to give vector space properties freedom of movement, so they can go 
from one space to another. The vehicle that carries these properties is a lin- 
ear map or linear transformation which is the subject of this section. First 
it is instructive to review the concept of a map (discussed in Chap. 1) by 
considering some examples relevant to the present discussion. 


Example 2.3.1 The following are a few familiar examples of mappings. 


1. Let $f: \mathbb{R} \to \mathbb{R}$ be given by $f(x) = x^2$.

2. Let $g: \mathbb{R}^2 \to \mathbb{R}$ be given by $g(x, y) = x^2 + y^2 - 4$.

3. Let $F: \mathbb{R}^2 \to \mathbb{C}$ be given by $F(x, y) = U(x, y) + iV(x, y)$, where
$U: \mathbb{R}^2 \to \mathbb{R}$ and $V: \mathbb{R}^2 \to \mathbb{R}$.

4. Let $T: \mathbb{R} \to \mathbb{R}^2$ be given by $T(t) = (t + 3, 2t - 5)$.

5. Motion of a point particle in space can be considered as a mapping
$M: [a, b] \to \mathbb{R}^3$, where $[a, b]$ is an interval of the real line. For each
$t \in [a, b]$, we define $M(t) = (x(t), y(t), z(t))$, where $x(t)$, $y(t)$, and
$z(t)$ are real-valued functions of t. If we identify t with time, which is
assumed to have a value in the interval $[a, b]$, then $M(t)$ describes the
path of the particle as a function of time, and a and b are the beginning
and the end of the motion, respectively.
Let us consider an arbitrary mapping $F: V \to W$ from a vector space
V to another vector space W. It is assumed that the two vector spaces are
over the same scalars, say $\mathbb{C}$. Consider $|a\rangle$ and $|b\rangle$ in V and $|x\rangle$ and $|y\rangle$ in W
such that $F(|a\rangle) = |x\rangle$ and $F(|b\rangle) = |y\rangle$. In general, F does not preserve the
vector space structure. That is, the image of a linear combination of vectors
is not the same as the linear combination of the images:

$$F(\alpha|a\rangle + \beta|b\rangle) \ne \alpha F(|a\rangle) + \beta F(|b\rangle) = \alpha|x\rangle + \beta|y\rangle.$$


This is the case for all the mappings of Example 2.3.1. There are many appli- 
cations in which the preservation of the vector space structure (preservation 
of the linear combination) is desired. 
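The failure to preserve linear combinations can be checked directly for two of the maps of Example 2.3.1. The sketch below takes item 1 as $f(x) = x^2$ and item 4 as $T(t) = (t + 3, 2t - 5)$; the map `L` and the sample scalars are hypothetical additions, included only as a contrasting case that does pass the same check.

```python
def f(x):
    """Item 1 of the example, taken here as f(x) = x^2."""
    return x ** 2


def T(t):
    """Item 4 of the example: T(t) = (t + 3, 2t - 5)."""
    return (t + 3, 2 * t - 5)


def L(t):
    """Hypothetical contrast case, not from the text: L(t) = (t, 2t)."""
    return (t, 2 * t)


def combine(alpha, u, beta, v):
    """Linear combination alpha*u + beta*v of two tuples."""
    return tuple(alpha * ui + beta * vi for ui, vi in zip(u, v))


# Sample scalars and arguments (assumptions of this sketch).
alpha, beta, a, b = 2, 3, 1, 4

f_preserves = f(alpha * a + beta * b) == alpha * f(a) + beta * f(b)
T_preserves = T(alpha * a + beta * b) == combine(alpha, T(a), beta, T(b))
L_preserves = L(alpha * a + beta * b) == combine(alpha, L(a), beta, L(b))
print(f_preserves, T_preserves, L_preserves)
```

One failing combination is enough to show that a map is not linear, whereas a passing combination, as for `L`, proves nothing by itself; linearity has to hold for all vectors and scalars.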

Definition 2.3.2 A linear map (or transformation) from the complex vector
space V to the complex vector space W is a mapping $T: V \to W$ such that

$$T(\alpha|a\rangle + \beta|b\rangle) = \alpha T(|a\rangle) + \beta T(|b\rangle) \quad \forall\, |a\rangle, |b\rangle \in V \text{ and } \alpha, \beta \in \mathbb{C}.$$

A linear transformation $T: V \to V$ is called an endomorphism of V or a
linear operator on V. The action of a linear transformation on a vector is
written without the parentheses: $T(|a\rangle) \equiv T|a\rangle$.


The same definition applies to real vector spaces. Note that the defini- 
tion demands that both vector spaces have the same set of scalars: The 
same scalars must multiply vectors in V on the LHS and those in W on 
the RHS. 

The set of linear maps from V to W is denoted by $\mathcal{L}(V, W)$, and this
set happens to be a vector space. The zero transformation, 0, is defined
to take every vector in V to the zero vector of W. The sum of two linear
transformations T and U is the linear transformation T + U, whose action
on a vector $|a\rangle \in V$ is defined to be $(T + U)|a\rangle = T|a\rangle + U|a\rangle$. Similarly,
define $\alpha T$ by $(\alpha T)|a\rangle = \alpha(T|a\rangle) = \alpha T|a\rangle$. The set of endomorphisms of V
is denoted by $\mathcal{L}(V)$ or End(V) rather than $\mathcal{L}(V, V)$. We summarize these
observations in


Box 2.3.3 $\mathcal{L}(V, W)$ is a vector space. In particular, so is the set of
endomorphisms of a single vector space $\mathcal{L}(V) = \mathrm{End}(V) = \mathcal{L}(V, V)$.


Definition 2.3.4 Let V and U be inner product spaces. A linear map
$T: V \to U$ is called an isometric map if¹⁰

$$\langle Ta|Tb\rangle = \langle a|b\rangle, \quad \forall\, |a\rangle, |b\rangle \in V.$$

If U = V, then T is called a linear isometry or simply an isometry of V. It is
common to call an isometry of a complex (real) V a unitary (orthogonal)
operator.

¹⁰ It is convenient here to use the notation $|Ta\rangle$ for $T|a\rangle$. This would then allow us to write
the dual (see below) of the vector as $\langle Ta|$, emphasizing that it is indeed the bra associated
with $T|a\rangle$.




Example 2.3.5 The following are some examples of linear operators in var- 
ious vector spaces. The proofs of linearity are simple in all cases and are left 
as exercises for the reader. 


1. Let V be a one-dimensional space (e.g., $V = \mathbb{C}$). Then any linear
endomorphism T of V is of the form $T|x\rangle = \alpha|x\rangle$ with $\alpha$ a scalar. In
particular, if T is an isometry, then $|\alpha|^2 = 1$. If $V = \mathbb{R}$ and T is an isometry,
then $T|x\rangle = \pm|x\rangle$.

2. Let $\pi$ be a permutation (shuffling) of the integers $\{1, 2, \ldots, n\}$. If
$|x\rangle = (\eta_1, \eta_2, \ldots, \eta_n)$ is a vector in $\mathbb{C}^n$, we can write

$$A_\pi|x\rangle = (\eta_{\pi(1)}, \eta_{\pi(2)}, \ldots, \eta_{\pi(n)}).$$

Then $A_\pi$ is a linear operator.

3. For any $|x\rangle \in \mathcal{P}^c[t]$, with $x(t) = \sum_{k=0}^n \alpha_k t^k$, write $|y\rangle = D|x\rangle$, where
$|y\rangle$ is defined as $y(t) = \sum_{k=1}^n k\alpha_k t^{k-1}$. Then D is a linear operator, the
derivative operator.

4. For every $|x\rangle \in \mathcal{P}^c[t]$, with $x(t) = \sum_{k=0}^n \alpha_k t^k$, write $|y\rangle = S|x\rangle$, where
$|y\rangle \in \mathcal{P}^c[t]$ is defined as $y(t) = \sum_{k=0}^n [\alpha_k/(k + 1)]\,t^{k+1}$. Then S is a
linear operator, the integration operator.

5. Let $\mathcal{C}^n(a, b)$ be the set of real-valued functions defined in the interval
$[a, b]$ whose first n derivatives exist and are continuous. For any
$|f\rangle \in \mathcal{C}^n(a, b)$ define $|u\rangle = G|f\rangle$, with $u(t) = g(t)f(t)$ and $g(t)$ a
fixed function in $\mathcal{C}^n(a, b)$. Then G is linear. In particular, the operation
of multiplying by t, whose operator is denoted by T, is linear.
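The derivative and integration operators of this example can be sketched on coefficient lists, where a polynomial $\sum_k \alpha_k t^k$ is stored as `[alpha_0, alpha_1, ...]`. This representation, the helper `combine`, and the sample polynomials are assumptions of the sketch (rational coefficients are used for exact arithmetic, although the text allows complex ones). The code checks the defining property of a linear map for one linear combination.

```python
from fractions import Fraction


def D(coeffs):
    """Derivative operator: sum_k a_k t^k -> sum_{k>=1} k a_k t^(k-1)."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [Fraction(0)]


def S(coeffs):
    """Integration operator: sum_k a_k t^k -> sum_k a_k/(k+1) t^(k+1)."""
    return [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(coeffs)]


def combine(alpha, x, beta, y):
    """Coefficient list of alpha*x + beta*y, padding the shorter list with zeros."""
    n = max(len(x), len(y))
    x = list(x) + [0] * (n - len(x))
    y = list(y) + [0] * (n - len(y))
    return [alpha * xi + beta * yi for xi, yi in zip(x, y)]


# Sample polynomials (assumptions of this sketch): x(t) = 1 + 2t + 3t^2, y(t) = 5t.
x = [Fraction(1), Fraction(2), Fraction(3)]
y = [Fraction(0), Fraction(5)]
alpha, beta = Fraction(2), Fraction(-1)

D_linear = D(combine(alpha, x, beta, y)) == combine(alpha, D(x), beta, D(y))
S_linear = S(combine(alpha, x, beta, y)) == combine(alpha, S(x), beta, S(y))
print(D_linear, S_linear)
print(D(S(x)) == x, S(D(x)) == x)
```

The last line shows that differentiating after integrating returns the original polynomial, while integrating after differentiating loses the constant term.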


An immediate consequence of Definition 2.3.2 is the following: 


Box 2.3.6 Two linear transformations $T: V \to W$ and $U: V \to W$
are equal if and only if $T|a_i\rangle = U|a_i\rangle$ for all $|a_i\rangle$ in some basis of V.
Thus, a linear transformation is uniquely determined by its action on
some basis of its domain space.


The equality in this box is simply the set-theoretic equality of maps
discussed in Chap. 1.

The equality of operators can also be established by other, more convenient,
methods when an inner product is defined on the vector space. The
following two theorems contain the essence of these alternatives.


Theorem 2.3.7 An endomorphism T of an inner product space is 0 if and
only if $\langle b|T|a\rangle \equiv \langle b|Ta\rangle = 0$ for all $|a\rangle$ and $|b\rangle$.


Proof Clearly, if T = 0 then $\langle b|T|a\rangle = 0$. Conversely, if $\langle b|T|a\rangle = 0$ for all
$|a\rangle$ and $|b\rangle$, then, choosing $|b\rangle = T|a\rangle = |Ta\rangle$, we obtain

$$\langle Ta|Ta\rangle = 0 \ \ \forall\, |a\rangle \iff T|a\rangle = 0 \ \ \forall\, |a\rangle \iff T = 0$$

by positive definiteness of the inner product.


Theorem 2.3.8 A linear operator T on an inner product space is 0 if and
only if $\langle a|T|a\rangle = 0$ for all $|a\rangle$.


Proof Obviously, if T = 0, then $\langle a|T|a\rangle = 0$. Conversely, choose a vector
$\alpha|a\rangle + \beta|b\rangle$, sandwich T between this vector and its bra, and rearrange terms
to obtain what is known as the polarization identity

$$\alpha^*\beta\langle a|T|b\rangle + \alpha\beta^*\langle b|T|a\rangle = \langle \alpha a + \beta b|T|\alpha a + \beta b\rangle - |\alpha|^2\langle a|T|a\rangle - |\beta|^2\langle b|T|b\rangle.$$

According to the assumption of the theorem, the RHS is zero. Thus, if we
let $\alpha = \beta = 1$ we obtain $\langle a|T|b\rangle + \langle b|T|a\rangle = 0$. Similarly, with $\alpha = 1$ and
$\beta = i$ we get $i\langle a|T|b\rangle - i\langle b|T|a\rangle = 0$. These two equations give $\langle a|T|b\rangle = 0$
for all $|a\rangle$, $|b\rangle$. By Theorem 2.3.7, T = 0.
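The polarization identity used in this proof can be checked numerically. The sketch below works in $\mathbb{C}^2$ with the inner product taken conjugate-linear in the first slot and linear in the second; the matrix `T`, the vectors, and all function names are arbitrary choices made for this sketch. It also mirrors the two substitutions of the proof, $(\alpha, \beta) = (1, 1)$ and $(1, i)$, to recover the off-diagonal element $\langle a|T|b\rangle$ from "diagonal" quantities only.

```python
def inner(u, v):
    """<u|v> on C^n: conjugate-linear in u, linear in v."""
    return sum(ui.conjugate() * vi for ui, vi in zip(u, v))


def apply(T, v):
    """Matrix T (list of rows) acting on the vector v."""
    return [sum(T[i][j] * v[j] for j in range(len(v))) for i in range(len(T))]


def sandwich(u, T, v):
    """The matrix element <u|T|v>."""
    return inner(u, apply(T, v))


def combine(alpha, a, beta, b):
    return [alpha * x + beta * y for x, y in zip(a, b)]


# Arbitrary sample data (assumptions of this sketch).
T = [[1 + 2j, -1j], [3 + 0j, 0.5 - 1j]]
a = [1 + 1j, 2 - 1j]
b = [-1j, 0.5 + 2j]


def rhs(alpha, beta):
    """Right-hand side of the polarization identity."""
    c = combine(alpha, a, beta, b)
    return (sandwich(c, T, c)
            - abs(alpha) ** 2 * sandwich(a, T, a)
            - abs(beta) ** 2 * sandwich(b, T, b))


def lhs(alpha, beta):
    """Left-hand side of the polarization identity."""
    return (alpha.conjugate() * beta * sandwich(a, T, b)
            + alpha * beta.conjugate() * sandwich(b, T, a))


identity_gap = abs(lhs(2 - 1j, 0.5 + 3j) - rhs(2 - 1j, 0.5 + 3j))

# Combine the two special cases used in the proof to isolate <a|T|b>.
recovered = (rhs(1, 1) - 1j * rhs(1, 1j)) / 2
print(identity_gap < 1e-9, abs(recovered - sandwich(a, T, b)) < 1e-9)
```

The second substitution requires the scalar $i$, so the recovery step in this form is available only when the scalars are complex.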


To show that two operators U and T on an inner product space are equal,
one can either have them act on an arbitrary vector and show that they give
the same result, or verify that U − T is the zero operator by means of
one of the theorems above. Equivalently, one shows that $\langle a|T|b\rangle = \langle a|U|b\rangle$
or $\langle a|T|a\rangle = \langle a|U|a\rangle$ for all $|a\rangle$, $|b\rangle$.


2.3.1 Kernel of a Linear Map


It follows immediately from Definition 2.3.2 that the image of the zero vec- 
tor in V is the zero vector in W. This is not true for a general mapping, but 
it is necessarily true for a linear mapping. As the zero vector of V is mapped 
onto the zero vector of W, other vectors of V may also be dragged along. In 
fact, we have the following theorem. 


Theorem 2.3.9 The set of vectors in V that are mapped onto the zero vector 
of W under the linear transformation T: V — W form a subspace of V 
called the kernel, or null space, of T and denoted by kerT. 


Proof The proof is left as an exercise. 


The dimension of $\ker T$ is also called the nullity of T.
The proof of the following is also left as an exercise. 


Theorem 2.3.10 The range T(V) of a linear map T: V > W is a subspace 
of W. The dimension of T(V) is called the rank of T. 


Theorem 2.3.11 A linear transformation is 1-1 (injective) iff its kernel is
zero.


Proof The “only if” part is trivial. For the “if” part, suppose $T|a_1\rangle = T|a_2\rangle$;
then linearity of T implies that $T(|a_1\rangle - |a_2\rangle) = 0$. Since $\ker T = 0$,¹¹ we
must have $|a_1\rangle = |a_2\rangle$.


Theorem 2.3.12 A linear isometric map is injective. 


Proof Let $T: V \to U$ be a linear isometry. Let $|a\rangle \in \ker T$; then

$$\langle a|a\rangle = \langle Ta|Ta\rangle = \langle 0|0\rangle = 0.$$

Therefore, $|a\rangle = |0\rangle$. By Theorem 2.3.11, T is injective.


Suppose we start with a basis of $\ker T$ and add enough linearly independent
vectors to it to get a basis for V. Without loss of generality, let us
assume that the first n vectors in this basis form a basis of $\ker T$. So let
$B = \{|a_1\rangle, |a_2\rangle, \ldots, |a_N\rangle\}$ be a basis for V and $B' = \{|a_1\rangle, |a_2\rangle, \ldots, |a_n\rangle\}$ be
a basis for $\ker T$. Here $N = \dim V$ and $n = \dim \ker T$. It is straightforward
to show that $\{T|a_{n+1}\rangle, \ldots, T|a_N\rangle\}$ is a basis for T(V). We therefore have the
following result (see also the end of this subsection).


Theorem 2.3.13 Let $T: V \to W$ be a linear transformation. Then¹²

$$\dim V = \dim \ker T + \dim T(V).$$


This theorem is called the dimension theorem. One of its consequences 
is that an injective endomorphism is automatically surjective, and vice versa: 


Proposition 2.3.14 An endomorphism of a finite-dimensional vector space 
is bijective if it is either injective or surjective. 


The dimension theorem is obviously valid only for finite-dimensional 
vector spaces. In particular, neither surjectivity nor injectivity implies bijec- 
tivity for infinite-dimensional vector spaces. 


¹¹ Since $\ker T$ is a set, we should write the equality as $\ker T = \{|0\rangle\}$, or at least as
$\ker T = |0\rangle$. However, when there is no danger of confusion, we set $\{|0\rangle\} = |0\rangle = 0$.

¹² Recall that the dimension of a vector space depends on the scalars used in that space.
Although we are dealing with two different vector spaces here, since they are both over
the same set of scalars (complex or real), no confusion in the concept of dimension arises.


Example 2.3.15 Let us try to find the kernel of $T: \mathbb{R}^4 \to \mathbb{R}^3$ given by

$$T(x_1, x_2, x_3, x_4) = (2x_1 + x_2 + x_3 - x_4,\ x_1 + x_2 + 2x_3 + 2x_4,\ x_1 - x_3 - 3x_4).$$

We must look for $(x_1, x_2, x_3, x_4)$ such that $T(x_1, x_2, x_3, x_4) = (0, 0, 0)$, or

$$\begin{aligned}
2x_1 + x_2 + x_3 - x_4 &= 0,\\
x_1 + x_2 + 2x_3 + 2x_4 &= 0,\\
x_1 - x_3 - 3x_4 &= 0.
\end{aligned}$$

The “solution” to these equations is $x_1 = x_3 + 3x_4$ and $x_2 = -3x_3 - 5x_4$.
Thus, to be in $\ker T$, a vector in $\mathbb{R}^4$ must be of the form

$$(x_3 + 3x_4,\ -3x_3 - 5x_4,\ x_3,\ x_4) = x_3(1, -3, 1, 0) + x_4(3, -5, 0, 1),$$

where $x_3$ and $x_4$ are arbitrary real numbers. It follows that $\ker T$ consists
of vectors that can be written as linear combinations of the two linearly
independent vectors $(1, -3, 1, 0)$ and $(3, -5, 0, 1)$. Therefore, $\dim \ker T = 2$.
Theorem 2.3.13 then says that $\dim T(V) = 2$; that is, the range of T is
two-dimensional. This becomes clear when one notes that

$$T(x_1, x_2, x_3, x_4) = (2x_1 + x_2 + x_3 - x_4)(1, 0, 1) + (x_1 + x_2 + 2x_3 + 2x_4)(0, 1, -1),$$

and therefore $T(x_1, x_2, x_3, x_4)$, an arbitrary vector in the range of T, is a
linear combination of only two linearly independent vectors, $(1, 0, 1)$ and
$(0, 1, -1)$.
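The computation of this example can be verified with a short sketch. The function `T` is the map of the example; the helper `rank`, a plain Gaussian elimination over the rationals, is written for this sketch and is not taken from the text. The code confirms that the two kernel vectors are mapped to zero and that the rank and nullity add up to the dimension of the domain, as the dimension theorem requires.

```python
from fractions import Fraction


def T(x1, x2, x3, x4):
    """The linear map T: R^4 -> R^3 of the example."""
    return (2 * x1 + x2 + x3 - x4, x1 + x2 + 2 * x3 + 2 * x4, x1 - x3 - 3 * x4)


def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


kernel_basis = [(1, -3, 1, 0), (3, -5, 0, 1)]
images = [T(*v) for v in kernel_basis]

# Matrix of T in the standard bases: one row per component of T.
A = [[2, 1, 1, -1], [1, 1, 2, 2], [1, 0, -1, -3]]
rank_T = rank(A)
nullity_T = len(A[0]) - rank_T
print(images, rank_T, nullity_T)
```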


2.3.2 Linear Isomorphism


In many cases, two vector spaces may “look” different, while in reality they 
are very much the same. For example, the set of complex numbers C is a 
two-dimensional vector space over the reals, as is $\mathbb{R}^2$. Although we call the
vectors of these two spaces by different names, they have very similar prop- 
erties. This notion of “similarity” is made precise in the following definition. 


Definition 2.3.16 A vector space V is said to be isomorphic to another
vector space W, and written $V \cong W$, if there exists a bijective linear map
$T: V \to W$. Then T is called an isomorphism.¹³ A bijective linear map of
V onto itself is called an automorphism of V. An automorphism is also
called an invertible linear map. The set of automorphisms of V is denoted
by GL(V).


An immediate consequence of the injectivity of an isometry and Propo- 
sition 2.3.14 is the following: 


Proposition 2.3.17 An isometry of a finite-dimensional vector space is an
automorphism of that vector space.


¹³ The word “isomorphism”, as we shall see, is used in conjunction with many algebraic
structures. To distinguish them, qualifiers need to be used. In the present context, we speak
of linear isomorphism. We shall use qualifiers when necessary. However, the context
usually makes the meaning of isomorphism clear.


For all practical purposes, two isomorphic vector spaces are different
manifestations of the “same” vector space. In the example discussed above,
the correspondence $T: \mathbb{C} \to \mathbb{R}^2$, with $T(x + iy) = (x, y)$, establishes an
isomorphism between the two vector spaces. It should be emphasized that only
as vector spaces are $\mathbb{C}$ and $\mathbb{R}^2$ isomorphic. If we go beyond the vector space
structures, the two sets are quite different. For example, $\mathbb{C}$ has a natural
multiplication for its elements, but $\mathbb{R}^2$ does not. The following three theorems
give a working criterion for isomorphism. The proofs are simple and left to
the reader.


Theorem 2.3.18 A linear surjective map T: V — W is an isomorphism if 
and only if its nullity is zero. 


Theorem 2.3.19 An injective linear transformation T : V — W carries lin- 
early independent sets of vectors onto linearly independent sets of vectors. 


Theorem 2.3.20 Two finite-dimensional vector spaces are isomorphic if 
and only if they have the same dimension. 


A consequence of Theorem 2.3.20 is that all N-dimensional vector
spaces over $\mathbb{R}$ are isomorphic to $\mathbb{R}^N$ and all complex N-dimensional vector
spaces are isomorphic to $\mathbb{C}^N$. So, for all practical purposes, we have only
two N-dimensional vector spaces, $\mathbb{R}^N$ and $\mathbb{C}^N$.

Suppose that $V = V_1 \oplus V_2$ and that T is an automorphism of V which
leaves $V_1$ invariant, i.e., $T(V_1) = V_1$. Then T leaves $V_2$ invariant as well. To
see this, first note that if $V = V_1 \oplus V_2$ and $V = V_1 \oplus V_2'$, then $V_2 = V_2'$. This
can be readily established by looking at a basis of V obtained by extending
a basis of $V_1$. Now note that since $T(V) = V$ and $T(V_1) = V_1$, we must have

$$V_1 \oplus V_2 = V = T(V) = T(V_1 \oplus V_2) = T(V_1) \oplus T(V_2) = V_1 \oplus T(V_2).$$

Hence, by the argument above, $T(V_2) = V_2$. We summarize the discussion
as follows:


Proposition 2.3.21 If $V = V_1 \oplus V_2$, then an automorphism of V which
leaves one of the summands invariant leaves the other invariant as well.


Example 2.3.22 (Another proof of the dimension theorem) Let T, V, and
W be as in Theorem 2.3.13. Let $T': V/\ker T \to T(V)$ be a linear map defined
as follows. If $[a]$ is represented by $|a\rangle$, then $T'([a]) = T|a\rangle$. First, we have
to show that this map is well defined, i.e., that if $[a'] = [a]$, then $T'([a']) = T|a\rangle$.
But this is trivially true, because $[a'] = [a]$ implies that $|a'\rangle = |a\rangle + |z\rangle$
with $|z\rangle \in \ker T$. So,

$$T'([a']) = T|a'\rangle = T(|a\rangle + |z\rangle) = T(|a\rangle) + \underbrace{T(|z\rangle)}_{=|0\rangle} = T(|a\rangle).$$

One can also easily show that $T'$ is linear.

We now show that $T'$ is an isomorphism. Suppose that $|x\rangle \in T(V)$. Then
there is $|y\rangle \in V$ such that $|x\rangle = T|y\rangle = T'([y])$. This shows that $T'$ is
surjective. To show that it is injective, let $T'([y]) = T'([x])$; then $T|y\rangle = T|x\rangle$ or
$T(|y\rangle - |x\rangle) = 0$. This shows that $|y\rangle - |x\rangle \in \ker T$, i.e., $[y] = [x]$. This
isomorphism implies that $\dim(V/\ker T) = \dim T(V)$. Equation (2.2) now yields
the result of the dimension theorem.


The result of the preceding example can be generalized as follows:


Theorem 2.3.23 Let V and W be vector spaces and $T: V \to W$ a linear
map. Let U be a subspace of V. Define $T': V/U \to T(V)$ by $T'([a]) = T|a\rangle$,
where $|a\rangle$ is assumed to represent $[a]$. Then $T'$ is a well defined
isomorphism.


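Let U, V, and W be complex vector spaces. Consider the linear map

$$T: (U \oplus V) \otimes W \to (U \otimes W) \oplus (V \otimes W)$$

given by

$$T\big((|u\rangle + |v\rangle) \otimes |w\rangle\big) = |u\rangle \otimes |w\rangle + |v\rangle \otimes |w\rangle.$$

It is trivial to show that T is an isomorphism. We thus have

$$(U \oplus V) \otimes W \cong (U \otimes W) \oplus (V \otimes W). \tag{2.16}$$

From the fact that $\dim(U \otimes V) = \dim U \dim V$, we have

$$U \otimes V \cong V \otimes U. \tag{2.17}$$

Moreover, since $\dim \mathbb{C} = 1$ we have $\dim(\mathbb{C} \otimes V) = \dim V$. Hence,

$$\mathbb{C} \otimes V \cong V \otimes \mathbb{C} \cong V. \tag{2.18}$$

Similarly

$$\mathbb{R} \otimes V \cong V \otimes \mathbb{R} \cong V \tag{2.19}$$

for a real vector space V.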


2.4 Complex Structures 


Thus far in our treatment of vector spaces, we have avoided changing the 
nature of scalars. When we declared that a vector space was complex, we 
kept the scalars of that vector space complex, and if we used real numbers 
in that vector space, they were treated as a subset of complex numbers. 

In this section, we explore the possibility of changing the scalars, and the 
corresponding changes in the other structures of the vector space that may 
ensue. The interesting case is changing the reals to complex numbers. 
In the discussion of changing the scalars, as well as other formal treatments
of other topics, it is convenient to generalize the concept of inner
products. While the notion of positive definiteness is crucial for the physical 
applications of an inner product, for certain other considerations, it is too 
restrictive. So, we relax that requirement and define our inner product anew. 
However, except in this subsection, 


Box 2.4.1 Unless otherwise indicated, all complex inner products 
are assumed to be sesquilinear as in Definition 2.2.1. 


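Definition 2.4.2 Let $\mathbb{F}$ be either $\mathbb{C}$ or $\mathbb{R}$. An inner product on an $\mathbb{F}$-linear
space V is a map $g: V \times V \to \mathbb{F}$ with the following properties:

(a) symmetry: $g(|a\rangle, |b\rangle) = g(|b\rangle, |a\rangle)$;

(b) bilinearity: $g(|x\rangle, \alpha|a\rangle + \beta|b\rangle) = \alpha g(|x\rangle, |a\rangle) + \beta g(|x\rangle, |b\rangle)$,
$g(\alpha|a\rangle + \beta|b\rangle, |x\rangle) = \alpha g(|a\rangle, |x\rangle) + \beta g(|b\rangle, |x\rangle)$;

(c) nondegeneracy: $g(|x\rangle, |a\rangle) = 0 \ \ \forall\, |x\rangle \in V \ \Rightarrow\ |a\rangle = |0\rangle$;

with $\alpha, \beta \in \mathbb{F}$ and $|a\rangle, |b\rangle, |x\rangle \in V$.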

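Non-degeneracy can be restated by saying that for any nonzero $|a\rangle \in V$,
there is at least one vector $|x\rangle \in V$ such that $g(|x\rangle, |a\rangle) \ne 0$. It is the
statement of the fact that the only vector orthogonal to all vectors of an inner
product space is the zero vector.

Once again we use the Dirac bra and ket notation for the inner product.
However, to distinguish it from the previous inner product, we subscript the
notation with $\mathbb{F}$. Thus the three properties in the definition above are denoted
by

$$\begin{aligned}
&\text{(a) symmetry:} && \langle a|b\rangle_{\mathbb{F}} = \langle b|a\rangle_{\mathbb{F}};\\
&\text{(b) bilinearity:} && \langle x|\alpha a + \beta b\rangle_{\mathbb{F}} = \alpha\langle x|a\rangle_{\mathbb{F}} + \beta\langle x|b\rangle_{\mathbb{F}},\\
& && \langle \alpha a + \beta b|x\rangle_{\mathbb{F}} = \alpha\langle a|x\rangle_{\mathbb{F}} + \beta\langle b|x\rangle_{\mathbb{F}};\\
&\text{(c) non-degeneracy:} && \langle x|a\rangle_{\mathbb{F}} = 0 \ \ \forall\, |x\rangle \in V \ \Rightarrow\ |a\rangle = |0\rangle.
\end{aligned} \tag{2.20}$$

Note that $\langle\,|\,\rangle_{\mathbb{F}} = \langle\,|\,\rangle$ when $\mathbb{F} = \mathbb{R}$.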


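Definition 2.4.3 The adjoint of an operator $A \in \mathrm{End}(V)$, denoted by $A'$, is
defined by

$$\langle Aa|b\rangle_{\mathbb{F}} = \langle a|A'b\rangle_{\mathbb{F}} \quad\text{or}\quad \langle a|A'|b\rangle_{\mathbb{F}} = \langle b|A|a\rangle_{\mathbb{F}}.$$

An operator A is called self-adjoint if $A' = A$, and skew if $A' = -A$.
From this definition and the non-degeneracy of $\langle\,|\,\rangle_{\mathbb{F}}$ it follows that

$$(A')' = A. \tag{2.21}$$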
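Proposition 2.4.4 An operator $A \in \mathrm{End}(V)$ is skew iff
$\langle x|Ax\rangle_{\mathbb{F}} \equiv \langle x|A|x\rangle_{\mathbb{F}} = 0$ for all $|x\rangle \in V$.


Proof If A is skew, then

$$\langle x|A|x\rangle_{\mathbb{F}} = \langle x|A'|x\rangle_{\mathbb{F}} = -\langle x|A|x\rangle_{\mathbb{F}} \ \Rightarrow\ \langle x|A|x\rangle_{\mathbb{F}} = 0.$$

Conversely, suppose that $\langle x|A|x\rangle_{\mathbb{F}} = 0$ for all $|x\rangle \in V$; then for nonzero
$\alpha, \beta \in \mathbb{F}$ and nonzero $|a\rangle, |b\rangle \in V$,

$$\begin{aligned}
0 &= \langle \alpha a + \beta b|A|\alpha a + \beta b\rangle_{\mathbb{F}}\\
&= \alpha^2\underbrace{\langle a|A|a\rangle_{\mathbb{F}}}_{=0} + \alpha\beta\langle a|A|b\rangle_{\mathbb{F}} + \alpha\beta\langle b|A|a\rangle_{\mathbb{F}} + \beta^2\underbrace{\langle b|A|b\rangle_{\mathbb{F}}}_{=0}\\
&= \alpha\beta\big(\langle b|A|a\rangle_{\mathbb{F}} + \langle b|A'|a\rangle_{\mathbb{F}}\big).
\end{aligned}$$

Since $\alpha\beta \ne 0$, we must have $\langle b|(A + A')|a\rangle_{\mathbb{F}} = 0$ for all nonzero $|a\rangle, |b\rangle \in V$.
By non-degeneracy of the inner product, $(A + A')|a\rangle = |0\rangle$. Since this is true
for all $|a\rangle \in V$, we must have $A' = -A$.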


Comparing this proposition with Theorem 2.3.8 shows how strong a re- 
striction the positive definiteness imposes on the inner product. 
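A concrete instance may help with this comparison. The sketch below takes $V = \mathbb{R}^2$ with the ordinary dot product as the symmetric bilinear form, for which the adjoint of a matrix is its transpose; these choices, the matrix `J`, and the sample vectors are assumptions of the sketch. The operator is skew and all its "diagonal" elements $\langle x|J|x\rangle$ vanish, as Proposition 2.4.4 states, yet it is not the zero operator.

```python
def dot(u, v):
    """Symmetric bilinear form on R^n: the ordinary dot product."""
    return sum(x * y for x, y in zip(u, v))


def apply(A, v):
    """Matrix A (list of rows) acting on the vector v."""
    return tuple(sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A)))


def transpose(A):
    """Adjoint with respect to the dot product."""
    return [list(col) for col in zip(*A)]


# Rotation by 90 degrees in the plane (sample operator for this sketch).
J = [[0, -1], [1, 0]]
minus_J = [[-x for x in row] for row in J]

is_skew = transpose(J) == minus_J
samples = [(1, 0), (2, 3), (-1.5, 4.0)]
diagonal_elements = [dot(x, apply(J, x)) for x in samples]
is_zero_operator = all(apply(J, x) == (0, 0) for x in samples)
print(is_skew, diagonal_elements, is_zero_operator)
```

With real scalars, vanishing diagonal elements therefore do not force the operator to vanish; the argument of Theorem 2.3.8 relies on complex scalars and a sesquilinear inner product.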


Definition 2.4.5 A complex structure J on a real vector space V is a linear
operator which satisfies $J^2 = -1$ and $\langle Ja|Jb\rangle = \langle a|b\rangle$ for all $|a\rangle, |b\rangle \in V$.


Proposition 2.4.6 The complex structure J is skew. 


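Proof Let $|a\rangle \in V$ and $|b\rangle = J|a\rangle$. Then recalling that $\langle\,|\,\rangle_{\mathbb{R}} = \langle\,|\,\rangle$, on the one
hand,

$$\langle a|Ja\rangle = \langle a|b\rangle = \langle Ja|Jb\rangle = \langle Ja|J^2a\rangle = -\langle Ja|a\rangle.$$

On the other hand,

$$\langle a|Ja\rangle = \langle a|b\rangle = \langle b|a\rangle = \langle Ja|a\rangle.$$

These two equations show that $\langle a|Ja\rangle = 0$ for all $|a\rangle \in V$. Hence, by
Proposition 2.4.4, J is skew.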


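Let $|a\rangle$ be any vector in the N-dimensional real inner product space.
Normalize $|a\rangle$ to get the unit vector $|e_1\rangle$. By Propositions 2.4.4 and 2.4.6,
$J|e_1\rangle$ is orthogonal to $|e_1\rangle$. Normalize $J|e_1\rangle$ to get $|e_2\rangle$. If N > 2, let $|e_3\rangle$ be
any unit vector orthogonal to $|e_1\rangle$ and $|e_2\rangle$. Then $|a_3\rangle = J|e_3\rangle$ is obviously
orthogonal to $|e_3\rangle$. We claim that it is also orthogonal to both $|e_1\rangle$ and $|e_2\rangle$:

$$\begin{aligned}
\langle e_1|a_3\rangle &= \langle Je_1|Ja_3\rangle = \langle Je_1|J^2e_3\rangle = -\langle Je_1|e_3\rangle = -\langle e_2|e_3\rangle = 0,\\
\langle e_2|a_3\rangle &= \langle Je_1|Je_3\rangle = \langle e_1|e_3\rangle = 0.
\end{aligned}$$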


Continuing this process, we can prove the following: 
Theorem 2.4.7 The vectors $\{|e_i\rangle, J|e_i\rangle\}_{i=1}^m$ with N = 2m form an orthonormal
basis for the real vector space V with inner product $\langle\,|\,\rangle_{\mathbb{R}} = \langle\,|\,\rangle$. In
particular, V must be even-dimensional for it to have a complex structure J.


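Definition 2.4.8 If V is a real vector space, then $\mathbb{C} \otimes V$, together with the
complex multiplication rule

$$\alpha(\beta \otimes |a\rangle) = (\alpha\beta) \otimes |a\rangle, \quad \alpha, \beta \in \mathbb{C},$$

is a complex vector space called the complexification of V and denoted
by $V^{\mathbb{C}}$. In particular, $(\mathbb{R}^n)^{\mathbb{C}} \equiv \mathbb{C} \otimes \mathbb{R}^n \cong \mathbb{C}^n$.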


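Note that $\dim_{\mathbb{C}} V^{\mathbb{C}} = \dim_{\mathbb{R}} V$ and $\dim_{\mathbb{R}} V^{\mathbb{C}} = 2\dim_{\mathbb{R}} V$. In fact, if
$\{|a_k\rangle\}_{k=1}^N$ is a basis of V, then it is also a basis of $V^{\mathbb{C}}$ as a complex vector
space, while $\{|a_k\rangle, i|a_k\rangle\}_{k=1}^N$ is a basis of $V^{\mathbb{C}}$ as a real vector space.

After complexifying a real vector space V with inner product $\langle\,|\,\rangle_{\mathbb{R}} = \langle\,|\,\rangle$,
we can define an inner product on it which is sesquilinear (or hermitian) as
follows:

$$\langle \alpha \otimes a|\beta \otimes b\rangle \equiv \alpha^*\beta\langle a|b\rangle.$$

It is left to the reader to show that this inner product satisfies all the
properties given in Definition 2.2.1.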

To complexify a real vector space V, we have to “multiply” it by the set
of complex numbers: $V^{\mathbb{C}} = \mathbb{C} \otimes V$. As a result, we get a real vector space
of twice the original dimension. Is there a reverse process, a “division” of a
(necessarily even-dimensional) real vector space? That is, is there a way of
getting a complex vector space of half complex dimension, starting with an
even-dimensional real vector space?

Let V be a 2m-dimensional real vector space. Let J be a complex structure
on V, and $\{|e_i\rangle, J|e_i\rangle\}_{i=1}^m$ a basis of V. On the subspace $V_1 = \mathrm{Span}\{|e_i\rangle\}_{i=1}^m$,
define the multiplication by a complex number by

$$(\alpha + i\beta) \otimes |v_1\rangle = (\alpha 1 + \beta J)|v_1\rangle, \quad \alpha, \beta \in \mathbb{R},\ |v_1\rangle \in V_1. \tag{2.22}$$

It is straightforward to show that this process turns the 2m-dimensional real
vector space V into the m-dimensional complex vector space $V_1^{\mathbb{C}}$.
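The multiplication rule (2.22) can be tried out in the smallest case, m = 1. The sketch below assumes $V = \mathbb{R}^2$ with $J(x, y) = (-y, x)$ and identifies the pair $(x, y)$ with the complex number $x + iy$; the function name `scalar_mult` and the sample values are hypothetical. It checks that acting with $\alpha 1 + \beta J$ reproduces ordinary complex multiplication, and that two successive multiplications compose correctly, which rests on $J^2 = -1$.

```python
# Complex structure on R^2 (an assumption of this sketch): J(x, y) = (-y, x).
J = ((0, -1), (1, 0))


def scalar_mult(z, v):
    """Act with the complex number z = alpha + i*beta on v as (alpha*1 + beta*J) v."""
    alpha, beta = z.real, z.imag
    Jv = (J[0][0] * v[0] + J[0][1] * v[1], J[1][0] * v[0] + J[1][1] * v[1])
    return (alpha * v[0] + beta * Jv[0], alpha * v[1] + beta * Jv[1])


def as_pair(w):
    """Identify the complex number w with the pair (Re w, Im w)."""
    return (w.real, w.imag)


z1, z2 = 2 + 3j, 1 - 1j
v = (1, -4)  # corresponds to the complex number 1 - 4i

matches_complex_product = scalar_mult(z1, v) == as_pair(z1 * complex(*v))
composes = scalar_mult(z1, scalar_mult(z2, v)) == scalar_mult(z1 * z2, v)
print(scalar_mult(z1, v), matches_complex_product, composes)
```

Here the single vector $|e_1\rangle = (1, 0)$ spans $V_1$, and every vector of $\mathbb{R}^2$ is a complex multiple of it, so the two-dimensional real space becomes a one-dimensional complex space.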


2.5 Linear Functionals 


An important example of a linear transformation occurs when the second
vector space, W, happens to be the set of scalars, $\mathbb{C}$ or $\mathbb{R}$, in which case the
linear transformation is called a linear functional. The set of linear
functionals $\mathcal{L}(V, \mathbb{C})$, or $\mathcal{L}(V, \mathbb{R})$ if V is a real vector space, is denoted by $V^*$
and is called the dual space of V.


Example 2.5.1 Here are some examples of linear functionals:

(a) Let $|a\rangle = (\alpha_1, \alpha_2, \ldots, \alpha_n)$ be in $\mathbb{C}^n$. Define $\phi: \mathbb{C}^n \to \mathbb{C}$ by


Then it is easy to show that $\phi$ is a linear functional.

(b) Let $\mu_{ij}$ denote the elements of an $m \times n$ matrix M. Define
$\phi: \mathcal{M}^{m\times n} \to \mathbb{C}$ by

$$\phi(M) = \sum_{i=1}^m \sum_{j=1}^n \mu_{ij}.$$

Then it is easy to show that $\phi$ is a linear functional.

(c) Let $\mu_{ij}$ denote the elements of an $n \times n$ matrix M. Define
$\theta: \mathcal{M}^{n\times n} \to \mathbb{C}$ by

$$\theta(M) = \sum_{j=1}^n \mu_{jj},$$

the sum of the diagonal elements of M. Then it is routine to show that
$\theta$ is a linear functional.

(d) Define the operator $\mathrm{int}: \mathcal{C}^0(a, b) \to \mathbb{R}$ by

$$\mathrm{int}(f) = \int_a^b f(t)\,dt.$$

Then int is a linear functional on the vector space $\mathcal{C}^0(a, b)$.

(e) Let V be a complex inner product space. Fix $|a\rangle \in V$, and let
$\gamma_a: V \to \mathbb{C}$ be defined by

$$\gamma_a(|b\rangle) = \langle a|b\rangle.$$

Then one can show that $\gamma_a$ is a linear functional.

(f) Let $\{|a_1\rangle, |a_2\rangle, \ldots, |a_m\rangle\}$ be an arbitrary finite set of vectors in V, and
$\{\phi_1, \phi_2, \ldots, \phi_m\}$ an arbitrary set of linear functionals on V. Let

$$A \equiv \sum_{k=1}^m |a_k\rangle\phi_k \in \mathrm{End}(V)$$

be defined by

$$A|x\rangle = \sum_{k=1}^m |a_k\rangle\phi_k(|x\rangle) = \sum_{k=1}^m \phi_k(|x\rangle)|a_k\rangle.$$

Then A is a linear operator on V.
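Three of these functionals lend themselves to a quick numerical check of linearity: the sum of all matrix elements (item b), the sum of the diagonal elements (item c), and the functional $\gamma_a(|b\rangle) = \langle a|b\rangle$ (item e). In the sketch below the function names, the sample matrices and vectors, and the scalars are all chosen for illustration; the inner product on $\mathbb{C}^n$ is taken conjugate-linear in its first slot.

```python
def total(M):
    """Item (b): sum of all entries of a matrix given as a list of rows."""
    return sum(sum(row) for row in M)


def trace(M):
    """Item (c): sum of the diagonal entries of a square matrix."""
    return sum(M[j][j] for j in range(len(M)))


def bra(a):
    """Item (e): the functional gamma_a(b) = <a|b> for a fixed vector a."""
    return lambda b: sum(ai.conjugate() * bi for ai, bi in zip(a, b))


def combine_matrices(alpha, M, beta, N):
    return [[alpha * m + beta * n for m, n in zip(rm, rn)] for rm, rn in zip(M, N)]


def combine_vectors(alpha, u, beta, v):
    return [alpha * x + beta * y for x, y in zip(u, v)]


# Sample data (assumptions of this sketch).
M = [[1, 2], [3, 4]]
N = [[0, 1j], [2, -1]]
alpha, beta = 2, 1 - 1j

results = {}
for functional in (total, trace):
    lhs = functional(combine_matrices(alpha, M, beta, N))
    rhs = alpha * functional(M) + beta * functional(N)
    results[functional.__name__] = abs(lhs - rhs) < 1e-12

gamma = bra([1j, 2])
u, v = [1, 1j], [3 - 1j, 0.5]
lhs = gamma(combine_vectors(alpha, u, beta, v))
rhs = alpha * gamma(u) + beta * gamma(v)
results["bra"] = abs(lhs - rhs) < 1e-12
print(results)
```

Note that $\gamma_a$ is linear in its argument $|b\rangle$, but its dependence on the fixed vector $|a\rangle$ is conjugate-linear in this convention.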


An example of linear isomorphism is that between a vector space and
its dual space, which we discuss now. Consider an N-dimensional vector
space with a basis $B = \{|a_1\rangle, |a_2\rangle, \ldots, |a_N\rangle\}$. For any given set of N scalars,
$\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$, define the linear functional $\phi_a$ by $\phi_a|a_i\rangle = \alpha_i$. When $\phi_a$
acts on any arbitrary vector $|b\rangle = \sum_{i=1}^N \beta_i|a_i\rangle$ in V, the result is

$$\phi_a|b\rangle = \phi_a\left(\sum_{i=1}^N \beta_i|a_i\rangle\right) = \sum_{i=1}^N \beta_i\,\phi_a|a_i\rangle = \sum_{i=1}^N \beta_i\alpha_i. \tag{2.23}$$

This expression suggests that $|b\rangle$ can be represented as a column vector with
entries $\beta_1, \beta_2, \ldots, \beta_N$ and $\phi_a$ as a row vector with entries $\alpha_1, \alpha_2, \ldots, \alpha_N$.
Then $\phi_a|b\rangle$ is merely the matrix product¹⁴ of the row vector (on the left)
and the column vector (on the right).

$\phi_a$ is uniquely determined by the set $\{\alpha_1, \alpha_2, \ldots, \alpha_N\}$. In other words,
corresponding to every set of N scalars there exists a unique linear functional.
This leads us to a particular set of functionals, $\phi_1, \phi_2, \ldots, \phi_N$,
corresponding, respectively, to the sets of scalars $\{1, 0, 0, \ldots, 0\}$, $\{0, 1, 0, \ldots, 0\}$,
$\ldots$, $\{0, 0, 0, \ldots, 1\}$. This means that

$$\begin{aligned}
\phi_1|a_1\rangle &= 1 \quad\text{and}\quad \phi_1|a_j\rangle = 0 \ \text{ for } j \ne 1,\\
\phi_2|a_2\rangle &= 1 \quad\text{and}\quad \phi_2|a_j\rangle = 0 \ \text{ for } j \ne 2,\\
&\ \ \vdots\\
\phi_N|a_N\rangle &= 1 \quad\text{and}\quad \phi_N|a_j\rangle = 0 \ \text{ for } j \ne N,
\end{aligned}$$

or that

$$\phi_i|a_j\rangle = \delta_{ij}, \tag{2.24}$$

where $\delta_{ij}$ is the Kronecker delta.
The functionals of Eq. (2.24) form a basis of the dual space V*. To show 
this, consider an arbitrary y € V*, which is uniquely determined by its action 


on the vectors in a basis B = {|a;), |a2),..., |ay)}. Let yla;) = yj, € C. 
Then we claim that y = ee yi@;. In fact, consider an arbitrary vector |a) 
in V with components (a1, @2,...,@y) with respect to B. Then, on the one 
hand, 


N N 
yla)= r(Setd )=Yeriad= San 
i=l i=l 


On the other hand, 


N N N 
(>: vie = ( vs) (> cle) 
i=1 i=] j=l 
1 


N N 
iv do a9 ila;) = So Yast = Dove 
i= j=l 


i=l j=l 


'4Matrices will be taken up in Chap. 5. Here, we assume only a nodding familiarity with 
elementary matrix operations. 


2.5 Linear Functionals 


Since the actions of $\gamma$ and $\sum_{i=1}^N \gamma_i\phi_i$ yield equal results for arbitrary $|a\rangle$, we conclude that $\gamma = \sum_{i=1}^N \gamma_i\phi_i$, i.e., $\{\phi_i\}_{i=1}^N$ span V*. Thus, we have the following result.

Theorem 2.5.2 For every basis $B = \{|a_j\rangle\}_{j=1}^N$ in V, there corresponds a unique basis $B^* = \{\phi_i\}_{i=1}^N$ in V* with the property that $\phi_i|a_j\rangle = \delta_{ij}$.
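As a concrete illustration of Theorem 2.5.2, the following sketch computes a dual basis numerically. It assumes that vectors are represented by their components in the standard basis of R^3 and functionals by row vectors, as suggested by the remarks after Eq. (2.23); with that convention the dual basis consists of the rows of the inverse of the matrix whose columns are the basis vectors. The helper names (`invert`, `dual_basis`, `apply`) are ours and purely illustrative.

```python
from fractions import Fraction

def invert(matrix):
    """Gauss-Jordan inverse of a square matrix, using exact Fraction arithmetic."""
    n = len(matrix)
    # Augment with the identity: [M | I]
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Pick a row with a nonzero pivot (StopIteration is raised if M is singular)
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def dual_basis(basis):
    """Return row vectors phi_i satisfying phi_i(a_j) = delta_ij."""
    n = len(basis)
    # The columns of M are the basis vectors a_j, so the rows of M^{-1} are the phi_i.
    M = [[basis[j][i] for j in range(n)] for i in range(n)]
    return invert(M)

def apply(phi, vec):
    """Row vector times column vector, as in Eq. (2.23)."""
    return sum(p * v for p, v in zip(phi, vec))

basis = [(1, 1, 1), (1, 1, 0), (1, 0, 0)]   # the N = 3 case of Problem 2.5
phis = dual_basis(basis)
table = [[apply(phi, a) for a in basis] for phi in phis]   # table[i][j] = phi_i(a_j)
```

The list `table` holds the values $\phi_i|a_j\rangle$ and should reproduce the Kronecker delta of Eq. (2.24).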


By this theorem the dual space of an N-dimensional vector space is also N-dimensional, and thus isomorphic to it. The basis B* is called the dual basis of B. A corollary to Theorem 2.5.2 is that to every vector in V there corresponds a unique linear functional in V*. This can be seen by noting that every vector $|a\rangle$ is uniquely determined by its components $(\alpha_1, \alpha_2, \dots, \alpha_N)$ in a basis B. The unique linear functional $\phi_a$ corresponding to $|a\rangle$, also called the dual of $|a\rangle$, is simply $\sum_{i=1}^N \alpha_i\phi_i$, with $\phi_i \in B^*$.


Definition 2.5.3 An annihilator of $|a\rangle \in V$ is a linear functional $\phi \in V^*$ such that $\phi|a\rangle = 0$. Let W be a subspace of V. The set of linear functionals in V* that annihilate all vectors in W is denoted by $W^0$.

The reader may check that $W^0$ is a subspace of V*. Moreover, if we extend a basis $\{|a_i\rangle\}_{i=1}^k$ of W to a basis $B = \{|a_i\rangle\}_{i=1}^N$ of V, then we can show that the functionals $\{\phi_j\}_{j=k+1}^N$, chosen from the basis $B^* = \{\phi_j\}_{j=1}^N$ dual to B, span $W^0$. It then follows that

$$\dim V = \dim W + \dim W^0. \tag{2.25}$$


We shall have occasions to use annihilators later on when we discuss sym- 
plectic geometry. 

We have “dualed” a vector, a basis, and a complete vector space. The 
only object remaining is a linear transformation. 


Definition 2.5.4 Let $\mathbf{T}: V \to U$ be a linear map. Define $\mathbf{T}^*: U^* \to V^*$ by¹⁵

$$[\mathbf{T}^*(\gamma)]|a\rangle = \gamma(\mathbf{T}|a\rangle) \quad \forall |a\rangle \in V,\ \gamma \in U^*.$$

$\mathbf{T}^*$ is called the dual, or pullback, of $\mathbf{T}$.

One can readily verify that $\mathbf{T}^* \in \mathcal{L}(U^*, V^*)$, i.e., that $\mathbf{T}^*$ is a linear map from U* to V*. Some of the mapping properties of $\mathbf{T}^*$ are tied to those of $\mathbf{T}$. To see this we first consider the kernel of $\mathbf{T}^*$. Clearly, $\gamma$ is in the kernel of $\mathbf{T}^*$ if and only if $\gamma$ annihilates all vectors of the form $\mathbf{T}|a\rangle$, i.e., all vectors in $\mathbf{T}(V)$. It follows that $\gamma$ is in $\mathbf{T}(V)^0$. In particular, if $\mathbf{T}$ is surjective, $\mathbf{T}(V) = U$, and $\gamma$ annihilates all vectors in U, i.e., it is the zero linear functional. We conclude that $\ker\mathbf{T}^* = 0$, and therefore, $\mathbf{T}^*$ is injective. Similarly, one can show that if $\mathbf{T}$ is injective, then $\mathbf{T}^*$ is surjective. We summarize the discussion above:


¹⁵ Do not confuse this with complex conjugation.


Proposition 2.5.5 Let $\mathbf{T}$ be a linear transformation and $\mathbf{T}^*$ its pullback. Then $\ker\mathbf{T}^* = \mathbf{T}(V)^0$. If $\mathbf{T}$ is surjective (injective), then $\mathbf{T}^*$ is injective (surjective). In particular, $\mathbf{T}^*$ is an isomorphism if $\mathbf{T}$ is.


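It is useful to make a connection between the inner product and linear functionals. To do this, consider a basis $\{|a_1\rangle, |a_2\rangle, \dots, |a_N\rangle\}$ and let $\alpha_i = \langle a|a_i\rangle$. As noted earlier, the set of scalars $\{\alpha_i\}_{i=1}^N$ defines a unique linear functional $\gamma_a$ (see Example 2.5.1) such that $\gamma_a|a_i\rangle = \alpha_i$. Since $\langle a|a_i\rangle$ is also equal to $\alpha_i$, it is natural to identify $\gamma_a$ with the symbol $\langle a|$, and write $\gamma_a \mapsto \langle a|$.

It is also convenient to introduce the notation¹⁶

$$(|a\rangle)^\dagger \equiv \langle a|, \tag{2.26}$$

where the symbol $\dagger$ means "dual, or dagger of". Now we ask: How does this dagger operation act on a linear combination of vectors? Let $|c\rangle = \alpha|a\rangle + \beta|b\rangle$ and take the inner product of $|c\rangle$ with an arbitrary vector $|x\rangle$ using linearity in the second factor: $\langle x|c\rangle = \alpha\langle x|a\rangle + \beta\langle x|b\rangle$. Now complex conjugate both sides and use the (sesqui)symmetry of the inner product:

$$\begin{aligned}
(\text{LHS})^* &= \langle x|c\rangle^* = \langle c|x\rangle,\\
(\text{RHS})^* &= \alpha^*\langle x|a\rangle^* + \beta^*\langle x|b\rangle^* = \alpha^*\langle a|x\rangle + \beta^*\langle b|x\rangle\\
&= \bigl(\alpha^*\langle a| + \beta^*\langle b|\bigr)|x\rangle.
\end{aligned}$$

Since this is true for all $|x\rangle$, we must have $(|c\rangle)^\dagger \equiv \langle c| = \alpha^*\langle a| + \beta^*\langle b|$. Therefore, in a duality "operation" the complex scalars must be conjugated. So, we have

$$\bigl(\alpha|a\rangle + \beta|b\rangle\bigr)^\dagger = \alpha^*\langle a| + \beta^*\langle b|. \tag{2.27}$$

Thus, unlike the association $|a\rangle \mapsto \gamma_a$, which is linear, the association $\gamma_a \mapsto \langle a|$ is not linear, but sesquilinear:

$$\gamma_{\alpha a + \beta b} \mapsto \alpha^*\langle a| + \beta^*\langle b|.$$

It is convenient to represent $|a\rangle \in \mathbb{C}^n$ as a column vector

$$|a\rangle = \begin{pmatrix}\alpha_1\\ \alpha_2\\ \vdots\\ \alpha_n\end{pmatrix}.$$

Then the definition of the complex inner product suggests that the dual of $|a\rangle$ must be represented as a row vector with complex conjugate entries:

$$\langle a| = \begin{pmatrix}\alpha_1^* & \alpha_2^* & \cdots & \alpha_n^*\end{pmatrix}, \tag{2.28}$$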


¹⁶ The significance of this notation will become clear in Sect. 4.3.


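and the inner product can be written as the (matrix) product

$$\langle a|b\rangle = \begin{pmatrix}\alpha_1^* & \alpha_2^* & \cdots & \alpha_n^*\end{pmatrix}\begin{pmatrix}\beta_1\\ \beta_2\\ \vdots\\ \beta_n\end{pmatrix} = \sum_{i=1}^n \alpha_i^*\beta_i.$$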
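The row-vector representation of Eq. (2.28) and the conjugation rule of Eq. (2.27) can be checked on a small numerical example. The sketch below is only an illustration: kets in C^2 are plain Python lists of complex numbers, and the function names (`bra`, `inner`, `combo`) are hypothetical helpers, not notation used elsewhere in the text.

```python
def bra(ket):
    """Dual (dagger) of a column vector: the row of complex-conjugated entries, Eq. (2.28)."""
    return [z.conjugate() for z in ket]

def inner(a, b):
    """<a|b> as the row-times-column product sum_i conj(alpha_i) * beta_i."""
    return sum(x * y for x, y in zip(bra(a), b))

def combo(alpha, a, beta, b):
    """The linear combination alpha*a + beta*b, taken entry by entry."""
    return [alpha * x + beta * y for x, y in zip(a, b)]

a = [1 + 2j, 3j]
b = [2 - 1j, 1 + 1j]
alpha, beta = 2 + 1j, -1j

# Eq. (2.27): the dagger of alpha|a> + beta|b> is conj(alpha)<a| + conj(beta)<b|
lhs = bra(combo(alpha, a, beta, b))
rhs = combo(alpha.conjugate(), bra(a), beta.conjugate(), bra(b))
```

Here `lhs` and `rhs` should agree entry by entry, and `inner(b, a)` should be the complex conjugate of `inner(a, b)`, which is the (sesqui)symmetry used in the derivation of Eq. (2.27).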


2.6 Multilinear Maps 


There is a very useful generalization of the linear functionals that becomes 
essential in the treatment of tensors later in the book. However, a limited 
version of its application is used in the discussion of determinants, which 
we shall start here. 


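Definition 2.6.1 Let V and U be vector spaces. Let $V^p$ denote the p-fold Cartesian product of V. A p-linear map from V to U is a map $\theta: V^p \to U$ which is linear with respect to each of its arguments:

$$\theta(|a_1\rangle, \dots, \alpha|a_j\rangle + \beta|b_j\rangle, \dots, |a_p\rangle) = \alpha\,\theta(|a_1\rangle, \dots, |a_j\rangle, \dots, |a_p\rangle) + \beta\,\theta(|a_1\rangle, \dots, |b_j\rangle, \dots, |a_p\rangle).$$

A p-linear map from V to $\mathbb{C}$ or $\mathbb{R}$ is called a p-linear function in V.

As an example, let $\{\phi_i\}_{i=1}^p$ be linear functionals on V. Define $\theta$ by

$$\theta(|a_1\rangle, \dots, |a_p\rangle) = \phi_1(|a_1\rangle)\cdots\phi_p(|a_p\rangle), \qquad |a_i\rangle \in V.$$

Clearly $\theta$ is p-linear.

Let $\sigma$ denote a permutation of 1, 2, ..., p. Define the p-linear map $\sigma\omega$ by

$$\sigma\omega(|a_1\rangle, \dots, |a_p\rangle) = \omega(|a_{\sigma(1)}\rangle, \dots, |a_{\sigma(p)}\rangle).$$

Definition 2.6.2 A p-linear map $\omega$ from V to U is skew-symmetric if $\sigma\omega = \epsilon_\sigma\cdot\omega$, i.e., if

$$\omega(|a_{\sigma(1)}\rangle, \dots, |a_{\sigma(p)}\rangle) = \epsilon_\sigma\,\omega(|a_1\rangle, \dots, |a_p\rangle),$$

where $\epsilon_\sigma$ is the sign of $\sigma$, which is +1 if $\sigma$ is even and −1 if it is odd. The set of p-linear skew-symmetric maps from V to U is denoted by $\Lambda^p(V, U)$. The set of p-linear skew-symmetric functions in V is denoted by $\Lambda^p(V)$.

The permutation sign $\epsilon_\sigma$ is sometimes written as

$$\epsilon_\sigma = \epsilon_{\sigma(1)\sigma(2)\dots\sigma(p)} \equiv \epsilon_{i_1 i_2 \dots i_p}, \tag{2.29}$$

where $i_k \equiv \sigma(k)$.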
Compare (2.28) with the comments after (2.23). The complex conjugation in (2.28) is the result of the sesquilinearity of the association $|a\rangle \leftrightarrow \langle a|$.


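Any p-linear map can be turned into a skew-symmetric p-linear map. In fact, if $\theta$ is a p-linear map, then

$$\omega \equiv \sum_\pi \epsilon_\pi\cdot\pi\theta \tag{2.30}$$

is skew-symmetric:

$$\begin{aligned}
\sigma\omega &= \sigma\sum_\pi \epsilon_\pi\cdot\pi\theta = \sum_\pi \epsilon_\pi\cdot(\sigma\pi)\theta = (\epsilon_\sigma)^2\sum_\pi \epsilon_\pi\cdot(\sigma\pi)\theta\\
&= \epsilon_\sigma\sum_\pi (\epsilon_\sigma\epsilon_\pi)\cdot(\sigma\pi)\theta = \epsilon_\sigma\sum_{\sigma\pi}\epsilon_{\sigma\pi}\cdot(\sigma\pi)\theta = \epsilon_\sigma\cdot\omega,
\end{aligned}$$

where we have used the fact that the sign of the product is the product of the signs of two permutations, and if $\sum_\pi$ sums over all permutations, then so does $\sum_{\sigma\pi}$.

The following theorem can be proved using properties of permutations: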


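Theorem 2.6.3 Let $\omega \in \Lambda^p(V, U)$. Then the following statements are equivalent:

1. $\omega(|a_1\rangle, \dots, |a_p\rangle) = 0$ whenever $|a_i\rangle = |a_j\rangle$ for some pair $i \ne j$.

2. $\omega(|a_{\sigma(1)}\rangle, \dots, |a_{\sigma(p)}\rangle) = \epsilon_\sigma\,\omega(|a_1\rangle, \dots, |a_p\rangle)$, for any permutation $\sigma$ of 1, 2, ..., p, and any $|a_1\rangle, \dots, |a_p\rangle$ in V.

3. $\omega(|a_1\rangle, \dots, |a_p\rangle) = 0$ whenever $\{|a_k\rangle\}_{k=1}^p$ are linearly dependent.

Proposition 2.6.4 Let $N = \dim V$ and $\omega \in \Lambda^N(V, U)$. Then $\omega$ is determined uniquely by its value on a basis of V. In particular, if $\omega$ vanishes on a basis, then $\omega = 0$.

Proof Let $\{|e_k\rangle\}_{k=1}^N$ be a basis of V. Let $\{|a_j\rangle\}_{j=1}^N$ be any set of vectors in V and write $|a_j\rangle = \sum_{k=1}^N \alpha_{jk}|e_k\rangle$ for j = 1, ..., N. Then

$$\begin{aligned}
\omega(|a_1\rangle, \dots, |a_N\rangle) &= \sum_{k_1 \dots k_N = 1}^N \alpha_{1k_1}\cdots\alpha_{Nk_N}\,\omega(|e_{k_1}\rangle, \dots, |e_{k_N}\rangle)\\
&= \sum_\pi \alpha_{1\pi(1)}\cdots\alpha_{N\pi(N)}\,\omega(|e_{\pi(1)}\rangle, \dots, |e_{\pi(N)}\rangle)\\
&= \Bigl(\sum_\pi \epsilon_\pi\,\alpha_{1\pi(1)}\cdots\alpha_{N\pi(N)}\Bigr)\,\omega(|e_1\rangle, \dots, |e_N\rangle).
\end{aligned}$$

Since the term in parentheses is a constant, we are done.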


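Definition 2.6.5 A skew-symmetric N-linear function in V, i.e., a member of $\Lambda^N(V)$, is called a determinant function in V.

Let $B = \{|e_k\rangle\}_{k=1}^N$ be a basis of V and $B^* = \{\boldsymbol{\epsilon}_j\}_{j=1}^N$ a basis of V*, dual to B. For any set of N vectors $\{|a_k\rangle\}_{k=1}^N$ in V, define the N-linear function $\theta$ by

$$\theta(|a_1\rangle, \dots, |a_N\rangle) = \boldsymbol{\epsilon}_1(|a_1\rangle)\cdots\boldsymbol{\epsilon}_N(|a_N\rangle),$$

and note that

$$\pi\theta(|e_1\rangle, \dots, |e_N\rangle) \equiv \theta(|e_{\pi(1)}\rangle, \dots, |e_{\pi(N)}\rangle) = \delta_{\iota\pi},$$

where $\iota$ is the identity permutation and $\delta_{\sigma\pi} = 1$ if $\sigma = \pi$ and $\delta_{\sigma\pi} = 0$ if $\sigma \ne \pi$. Now let $\Delta$ be defined by $\Delta \equiv \sum_\pi \epsilon_\pi\cdot\pi\theta$. Then, by Eq. (2.30), $\Delta \in \Lambda^N(V)$, i.e., $\Delta$ is a determinant function. Furthermore,

$$\Delta(|e_1\rangle, \dots, |e_N\rangle) = \sum_\pi \epsilon_\pi\cdot\pi\theta(|e_1\rangle, \dots, |e_N\rangle) = \sum_\pi \epsilon_\pi\delta_{\iota\pi} = \epsilon_\iota = 1.$$

Therefore, we have the following:

Box 2.6.6 In every finite-dimensional vector space, there are determinant functions which are not identically zero.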
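The construction of $\Delta$ by skew-symmetrizing $\theta$ can be carried out explicitly for small N. The following sketch assumes V = R^3 with the standard basis, so that the dual-basis functional $\boldsymbol{\epsilon}_j$ simply picks out the j-th component of its argument; the names `sign`, `theta`, and `delta` are illustrative choices of ours.

```python
from itertools import permutations

def sign(perm):
    """Sign of a permutation of (0, ..., n-1), found by counting inversions."""
    inversions = sum(1 for i in range(len(perm))
                     for j in range(i + 1, len(perm)) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def theta(vectors):
    """theta(a_1, ..., a_N) = eps_1(a_1) ... eps_N(a_N): product of the j-th component of a_j."""
    product = 1
    for j, a in enumerate(vectors):
        product *= a[j]
    return product

def delta(vectors):
    """Delta = sum_pi sign(pi) * (pi theta), with (pi theta)(a_1..a_N) = theta(a_pi(1)..a_pi(N))."""
    n = len(vectors)
    return sum(sign(p) * theta([vectors[p[k]] for k in range(n)])
               for p in permutations(range(n)))

e = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]   # standard basis
a = [(2, 0, 1), (1, 3, 0), (0, 1, 4)]   # three arbitrary vectors
```

With these definitions, `delta(e)` reproduces the normalization $\Delta(|e_1\rangle, \dots, |e_N\rangle) = 1$, interchanging two arguments of `delta` flips its sign, and repeating an argument gives zero, in line with Theorem 2.6.3.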


Proposition 2.6.7 Let $\omega \in \Lambda^N(V, U)$. Let $\Delta$ be a fixed nonzero determinant function in V. Then $\omega$ determines a unique $|u_\Delta\rangle \in U$ such that

$$\omega(|v_1\rangle, \dots, |v_N\rangle) = \Delta(|v_1\rangle, \dots, |v_N\rangle)\cdot|u_\Delta\rangle.$$

Proof Let $\{|v_k\rangle\}_{k=1}^N$ be a basis of V such that $\Delta(|v_1\rangle, \dots, |v_N\rangle) \ne 0$. By dividing one of the vectors (or $\Delta$) by a constant, we can assume that $\Delta(|v_1\rangle, \dots, |v_N\rangle) = 1$. Denote $\omega(|v_1\rangle, \dots, |v_N\rangle)$ by $|u_\Delta\rangle$. Now note that $\omega - \Delta\cdot|u_\Delta\rangle$ yields zero on the basis $\{|v_k\rangle\}_{k=1}^N$. By Proposition 2.6.4, it must be identically zero.

Corollary 2.6.8 Let $\Delta$ be a fixed nonzero determinant function in V. Then every determinant function is a scalar multiple of $\Delta$.

Proof Let U be $\mathbb{C}$ or $\mathbb{R}$ in Proposition 2.6.7.

Proposition 2.6.9 Let $\Delta$ be a determinant function in the N-dimensional vector space V. Let $|v\rangle$ and $\{|v_k\rangle\}_{k=1}^N$ be vectors in V. Then

$$\sum_{j=1}^N (-1)^{j-1}\,\Delta(|v\rangle, |v_1\rangle, \dots, \widehat{|v_j\rangle}, \dots, |v_N\rangle)\cdot|v_j\rangle = \Delta(|v_1\rangle, \dots, |v_N\rangle)\cdot|v\rangle,$$

where a hat on a vector means that particular vector is missing.

Proof See Problem 2.37.


2.6.1 Determinant of a Linear Operator 


Let $\mathbf{A}$ be a linear operator on an N-dimensional vector space V. Choose a nonzero determinant function $\Delta$. For a basis $\{|v_i\rangle\}_{i=1}^N$ define the function $\Delta_{\mathbf{A}}$ by

$$\Delta_{\mathbf{A}}(|v_1\rangle, \dots, |v_N\rangle) \equiv \Delta(\mathbf{A}|v_1\rangle, \dots, \mathbf{A}|v_N\rangle). \tag{2.31}$$

Clearly, $\Delta_{\mathbf{A}}$ is also a determinant function. By Corollary 2.6.8, it is a multiple of $\Delta$. So, $\Delta_{\mathbf{A}} = \alpha\Delta$. Furthermore, it is independent of the nonzero determinant function chosen, because if $\Delta'$ is another nonzero determinant function, then again by Corollary 2.6.8, $\Delta' = \lambda\Delta$, and

$$\Delta'_{\mathbf{A}} = \lambda\Delta_{\mathbf{A}} = \lambda\alpha\Delta = \alpha\Delta'.$$

This means that $\alpha$ is determined only by $\mathbf{A}$, independent of the nonzero determinant function and the basis chosen.

Definition 2.6.10 Let $\mathbf{A} \in \mathrm{End}(V)$. Let $\Delta$ be a nonzero determinant function in V, and let $\Delta_{\mathbf{A}}$ be as in Eq. (2.31). Then

$$\Delta_{\mathbf{A}} = \det\mathbf{A}\cdot\Delta \tag{2.32}$$

defines the determinant of $\mathbf{A}$.

Using Eq. (2.32), we have the following theorem whose proof is left as Problem 2.38:

Theorem 2.6.11 The determinant of a linear operator $\mathbf{A}$ has the following properties:

1. If $\mathbf{A} = \lambda\mathbf{1}$, then $\det\mathbf{A} = \lambda^N$.
2. $\mathbf{A}$ is invertible iff $\det\mathbf{A} \ne 0$.
3. $\det(\mathbf{A}\circ\mathbf{B}) = \det\mathbf{A}\,\det\mathbf{B}$.
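The three properties of Theorem 2.6.11 can be spot-checked numerically. The sketch below assumes operators on R^3 represented by 3 × 3 integer matrices in the standard basis and evaluates the determinant by the permutation sum that appears in the proof of Proposition 2.6.4; it illustrates the statements and is not a proof. The function names are ours.

```python
from itertools import permutations

def det(M):
    """Sum over permutations of sign(pi) * M[0][pi(0)] * ... * M[n-1][pi(n-1)]."""
    n = len(M)
    total = 0
    for p in permutations(range(n)):
        inversions = sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
        term = -1 if inversions % 2 else 1
        for row in range(n):
            term *= M[row][p[row]]
        total += term
    return total

def matmul(A, B):
    """Matrix of the composition A o B."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 1, 0], [0, 1, 3], [1, 0, 4]]
B = [[1, 2, 0], [0, 1, 1], [2, 0, 1]]
S = [[1, 2, 3], [2, 4, 6], [0, 1, 5]]   # second row is twice the first: not invertible
scalar = [[5 if i == j else 0 for j in range(3)] for i in range(3)]   # 5 times the identity
```

One expects `det(scalar)` to equal 5 cubed (property 1), `det(S)` to vanish because the rows of `S` are linearly dependent (property 2), and `det(matmul(A, B))` to equal `det(A) * det(B)` (property 3).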


2.6.2 Classical Adjoint 


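Let V be an N-dimensional vector space, $\Delta$ a determinant function in V, and $\mathbf{A} \in \mathrm{End}(V)$. For $|v\rangle, |v_i\rangle \in V$, define $\Phi: V^N \to \mathrm{End}(V)$ by

$$\Phi(|v_1\rangle, \dots, |v_N\rangle)|v\rangle = \sum_{j=1}^N (-1)^{j-1}\,\Delta(|v\rangle, \mathbf{A}|v_1\rangle, \dots, \widehat{\mathbf{A}|v_j\rangle}, \dots, \mathbf{A}|v_N\rangle)\cdot|v_j\rangle.$$

Clearly $\Phi$ is skew-symmetric. Therefore, by Proposition 2.6.7, there is a unique linear operator, call it $\mathrm{ad}(\mathbf{A})$, such that

$$\Phi(|v_1\rangle, \dots, |v_N\rangle) = \Delta(|v_1\rangle, \dots, |v_N\rangle)\cdot\mathrm{ad}(\mathbf{A}),$$

i.e.,

$$\sum_{j=1}^N (-1)^{j-1}\,\Delta(|v\rangle, \mathbf{A}|v_1\rangle, \dots, \widehat{\mathbf{A}|v_j\rangle}, \dots, \mathbf{A}|v_N\rangle)\cdot|v_j\rangle = \Delta(|v_1\rangle, \dots, |v_N\rangle)\cdot\mathrm{ad}(\mathbf{A})|v\rangle. \tag{2.33}$$

This equation shows that $\mathrm{ad}(\mathbf{A})$ is independent of the determinant function chosen, and is called the classical adjoint of $\mathbf{A}$.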


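Proposition 2.6.12 The classical adjoint satisfies the following relations:

$$\mathrm{ad}(\mathbf{A})\circ\mathbf{A} = \det\mathbf{A}\cdot\mathbf{1} = \mathbf{A}\circ\mathrm{ad}(\mathbf{A}), \tag{2.34}$$

where $\mathbf{1}$ is the unit operator.

Proof Replace $|v\rangle$ with $\mathbf{A}|v\rangle$ in Eq. (2.33) to obtain

$$\sum_{j=1}^N (-1)^{j-1}\,\Delta(\mathbf{A}|v\rangle, \mathbf{A}|v_1\rangle, \dots, \widehat{\mathbf{A}|v_j\rangle}, \dots, \mathbf{A}|v_N\rangle)\cdot|v_j\rangle = \Delta(|v_1\rangle, \dots, |v_N\rangle)\,\mathrm{ad}(\mathbf{A})\circ\mathbf{A}|v\rangle.$$

Then, the left-hand side can be written as

$$\text{LHS} = \det\mathbf{A}\cdot\sum_{j=1}^N (-1)^{j-1}\,\Delta(|v\rangle, |v_1\rangle, \dots, \widehat{|v_j\rangle}, \dots, |v_N\rangle)\cdot|v_j\rangle = \det\mathbf{A}\cdot\Delta(|v_1\rangle, \dots, |v_N\rangle)\cdot|v\rangle,$$

where the last equality follows from Proposition 2.6.9. Noting that $|v\rangle$ is arbitrary, the first equality of the proposition follows.

To obtain the second equality, apply $\mathbf{A}$ to (2.33). Then by Proposition 2.6.9, the left-hand side becomes

$$\begin{aligned}
\text{LHS} &= \sum_{j=1}^N (-1)^{j-1}\,\Delta(|v\rangle, \mathbf{A}|v_1\rangle, \dots, \widehat{\mathbf{A}|v_j\rangle}, \dots, \mathbf{A}|v_N\rangle)\cdot\mathbf{A}|v_j\rangle\\
&= \Delta(\mathbf{A}|v_1\rangle, \dots, \mathbf{A}|v_N\rangle)\cdot|v\rangle = \det\mathbf{A}\cdot\Delta(|v_1\rangle, \dots, |v_N\rangle)\cdot|v\rangle,
\end{aligned}$$

and the right-hand side becomes

$$\text{RHS} = \Delta(|v_1\rangle, \dots, |v_N\rangle)\cdot\mathbf{A}\circ\mathrm{ad}(\mathbf{A})|v\rangle.$$

Since the two sides hold for arbitrary $|v\rangle$, the second equality of the proposition follows.

Corollary 2.6.13 If $\det\mathbf{A} \ne 0$, then $\mathbf{A}$ is invertible and

$$\mathbf{A}^{-1} = \frac{1}{\det\mathbf{A}}\cdot\mathrm{ad}(\mathbf{A}).$$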
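Equation (2.34) and Corollary 2.6.13 can be verified on a small matrix. The sketch below rests on an assumption not derived in this section: that, in matrix form, the classical adjoint is the transpose of the matrix of cofactors (the usual "adjugate"). The helper names are ours, and exact rational arithmetic is used so that the checks are not blurred by rounding.

```python
from fractions import Fraction

def minor(M, i, j):
    """Matrix M with row i and column j removed."""
    return [[x for c, x in enumerate(row) if c != j]
            for r, row in enumerate(M) if r != i]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def adjugate(M):
    """Assumed matrix form of ad(A): the transpose of the cofactor matrix."""
    n = len(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(n)]
            for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 1, 0], [0, 1, 3], [1, 0, 4]]
d = det(A)
det_times_identity = [[d if i == j else 0 for j in range(3)] for i in range(3)]
A_inv = [[Fraction(x, d) for x in row] for row in adjugate(A)]   # Corollary 2.6.13
```

Both `matmul(adjugate(A), A)` and `matmul(A, adjugate(A))` should equal `det_times_identity`, and `matmul(A, A_inv)` should be the identity matrix.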


2.7 Problems 


2.1 Let $\mathbb{R}^+$ denote the set of positive real numbers. Define the "sum" of two elements of $\mathbb{R}^+$ to be their usual product, and define scalar multiplication by elements of $\mathbb{R}$ as being given by $r\cdot p = p^r$ where $r \in \mathbb{R}$ and $p \in \mathbb{R}^+$. With these operations, show that $\mathbb{R}^+$ is a vector space over $\mathbb{R}$.

2.2 Show that the intersection of two subspaces is also a subspace.

2.3 For each of the following subsets of $\mathbb{R}^3$ determine whether it is a subspace of $\mathbb{R}^3$:

(a) $\{(x, y, z) \in \mathbb{R}^3 \mid x + y - 2z = 0\}$;
(b) $\{(x, y, z) \in \mathbb{R}^3 \mid x + y - 2z = 3\}$;
(c) $\{(x, y, z) \in \mathbb{R}^3 \mid xyz = 0\}$.


2.4 Prove that the components of a vector in a given basis are unique. 


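2.5 Show that the following vectors form a basis for $\mathbb{C}^n$ (or $\mathbb{R}^n$).

$$|a_1\rangle = \begin{pmatrix}1\\ 1\\ \vdots\\ 1\\ 1\end{pmatrix},\quad |a_2\rangle = \begin{pmatrix}1\\ 1\\ \vdots\\ 1\\ 0\end{pmatrix},\quad \dots,\quad |a_n\rangle = \begin{pmatrix}1\\ 0\\ \vdots\\ 0\\ 0\end{pmatrix}.$$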

2.6 Prove Theorem 2.1.6. 


2.7 Let W be a subspace of $\mathbb{R}^5$ defined by

$$W = \{(x_1, \dots, x_5) \in \mathbb{R}^5 \mid x_1 = 3x_2 + x_3,\ x_2 = x_5,\ \text{and } x_4 = 2x_3\}.$$

Find a basis for W.

2.8 Let $U_1$ and $U_2$ be subspaces of V. Show that

(a) $\dim(U_1 + U_2) = \dim U_1 + \dim U_2 - \dim(U_1 \cap U_2)$. Hint: Let $\{|a_i\rangle\}_{i=1}^m$ be a basis of $U_1 \cap U_2$. Extend this to $\{\{|a_i\rangle\}_{i=1}^m, \{|b_i\rangle\}_{i=1}^k\}$, a basis for $U_1$, and to $\{\{|a_i\rangle\}_{i=1}^m, \{|c_i\rangle\}_{i=1}^l\}$, a basis for $U_2$. Now show that $\{\{|a_i\rangle\}_{i=1}^m, \{|b_i\rangle\}_{i=1}^k, \{|c_i\rangle\}_{i=1}^l\}$ is a basis for $U_1 + U_2$.

(b) If $U_1 + U_2 = V$ and $\dim U_1 + \dim U_2 = \dim V$, then $V = U_1 \oplus U_2$.

(c) If $\dim U_1 + \dim U_2 > \dim V$, then $U_1 \cap U_2 \ne \{0\}$.


2.9 Show that the vectors defined in Eq. (2.5) span $W = U \oplus V$.
2.10 Show that the inner product of any vector with |0) is zero. 


2.11 Find $a_0$, $b_0$, $b_1$, $c_0$, $c_1$, and $c_2$ such that the polynomials $a_0$, $b_0 + b_1 t$, and $c_0 + c_1 t + c_2 t^2$ are mutually orthonormal in the interval [0, 1]. The inner product is as defined for polynomials in Example 2.2.3 with w(t) = 1.

2.12 Given the linearly independent vectors $x_n(t) = t^n$, for n = 0, 1, 2, ... in $\mathcal{P}^c[t]$, use the Gram-Schmidt process to find the orthonormal polynomials $e_0(t)$, $e_1(t)$, and $e_2(t)$

(a) when the inner product is defined as $\langle x|y\rangle = \int_{-1}^{1} x^*(t)\,y(t)\,dt$.

(b) when the inner product is defined with a nontrivial weight function:

$$\langle x|y\rangle = \int_{-\infty}^{\infty} e^{-t^2} x^*(t)\,y(t)\,dt.$$

Hint: Use the following result:

$$\int_{-\infty}^{\infty} e^{-t^2} t^n\,dt = \begin{cases}\sqrt{\pi} & \text{if } n = 0,\\ 0 & \text{if } n \text{ is odd},\\ \sqrt{\pi}\,\dfrac{1\cdot 3\cdot 5\cdots(n-1)}{2^{n/2}} & \text{if } n \text{ is even}.\end{cases}$$


2.13 (a) Use the Gram-Schmidt process to find an orthonormal set of vectors out of (1, −1, 1), (−1, 0, 1), and (2, −1, 2).

(b) Are these three vectors linearly independent? If not, find a zero linear 
combination of them by using part (a). 


2.14 (a) Use the Gram-Schmidt process to find an orthonormal set of vectors out of (1, −1, 2), (−2, 1, −1), and (−1, −1, 4).

(b) Are these three vectors linearly independent? If not, find a zero linear 
combination of them by using part (a). 


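2.15 Show that

$$\Bigl(\int_{-\infty}^{\infty} (t^{10} - t^6 + 5t^4 - 5)\,e^{-t^4}\,dt\Bigr)^2 \le \Bigl(\int_{-\infty}^{\infty} (t^4 - 1)^2 e^{-t^4}\,dt\Bigr)\Bigl(\int_{-\infty}^{\infty} (t^6 + 5)^2 e^{-t^4}\,dt\Bigr).$$

Hint: Define an appropriate inner product and use the Schwarz inequality.

2.16 Show that

$$\int_{-\infty}^{\infty} dx\int_{-\infty}^{\infty} dy\,(x^5 - x^3 + 2x^2 - 2)(y^5 - y^3 + 2y^2 - 2)\,e^{-(x^4 + y^4)} \le \int_{-\infty}^{\infty} dx\int_{-\infty}^{\infty} dy\,(x^4 - 2x^2 + 1)(y^6 + 4y^3 + 4)\,e^{-(x^4 + y^4)}.$$

Hint: Define an appropriate inner product and use the Schwarz inequality.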


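2.17 Show that for any set of n complex numbers $\alpha_1, \alpha_2, \dots, \alpha_n$, we have

$$|\alpha_1 + \alpha_2 + \cdots + \alpha_n|^2 \le n\bigl(|\alpha_1|^2 + |\alpha_2|^2 + \cdots + |\alpha_n|^2\bigr).$$

Hint: Apply the Schwarz inequality to (1, 1, ..., 1) and $(\alpha_1, \alpha_2, \dots, \alpha_n)$.

2.18 Using the Schwarz inequality show that if $\{\alpha_i\}_{i=1}^\infty$ and $\{\beta_i\}_{i=1}^\infty$ are in $\mathbb{C}^\infty$, then $\sum_{i=1}^\infty \alpha_i^*\beta_i$ is convergent.

2.19 Show that $\mathbf{T}: \mathbb{R}^2 \to \mathbb{R}^3$ given by $\mathbf{T}(x, y) = (x^2 + y^2, x + y, 2x - y)$ is not a linear mapping.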
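
2.20 Verify that all the transformations of Example 2.3.5 are linear.

2.21 Let $\pi$ be the permutation that takes (1, 2, 3) to (3, 1, 2). Find

$$\mathbf{A}_\pi|e_i\rangle, \qquad i = 1, 2, 3,$$

where $\{|e_i\rangle\}_{i=1}^3$ is the standard basis of $\mathbb{R}^3$ (or $\mathbb{C}^3$), and $\mathbf{A}_\pi$ is as defined in Example 2.3.5.

2.22 Show that if $\mathbf{T} \in \mathcal{L}(\mathbb{C}, \mathbb{C})$, then there exists $\alpha \in \mathbb{C}$ such that $\mathbf{T}|a\rangle = \alpha|a\rangle$ for all $|a\rangle \in \mathbb{C}$.

2.23 Show that if $\{|a_i\rangle\}_{i=1}^n$ spans V and $\mathbf{T} \in \mathcal{L}(V, W)$, then $\{\mathbf{T}|a_i\rangle\}_{i=1}^n$ spans $\mathbf{T}(V)$. In particular, if $\mathbf{T}$ is surjective, then $\{\mathbf{T}|a_i\rangle\}_{i=1}^n$ spans W.

2.24 Give an example of a function $f: \mathbb{R}^2 \to \mathbb{R}$ such that

$$f(\alpha|a\rangle) = \alpha f(|a\rangle) \quad \forall \alpha \in \mathbb{R} \text{ and } |a\rangle \in \mathbb{R}^2$$

but f is not linear. Hint: Consider a homogeneous function of degree 1.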


2.25 Show that the following transformations are linear: 


(a) V is $\mathbb{C}$ over the reals and $\mathbf{C}|z\rangle = |z^*\rangle$. Is $\mathbf{C}$ linear if instead of real numbers, complex numbers are used as scalars?
(b) V is $\mathcal{P}^c[t]$ and $\mathbf{T}|x(t)\rangle = |x(t+1)\rangle - |x(t)\rangle$.


2.26 Verify that the kernel of a transformation $\mathbf{T}: V \to W$ is a subspace of V, and that $\mathbf{T}(V)$ is a subspace of W.

2.27 Let V and W be finite dimensional vector spaces. Show that if $\mathbf{T} \in \mathcal{L}(V, W)$ is surjective, then $\dim W \le \dim V$.

2.28 Suppose that V is finite dimensional and $\mathbf{T} \in \mathcal{L}(V, W)$ is not zero. Prove that there exists a subspace U of V such that $\ker\mathbf{T} \cap U = \{0\}$ and $\mathbf{T}(V) = \mathbf{T}(U)$.

2.29 Using Theorem 2.3.11, prove Theorem 2.3.18. 

2.30 Using Theorem 2.3.11, prove Theorem 2.3.19. 


2.31 Let $B_V = \{|a_i\rangle\}_{i=1}^N$ be a basis for V and $B_W = \{|b_i\rangle\}_{i=1}^N$ a basis for W. Define the linear transformation $\mathbf{T}|a_i\rangle = |b_i\rangle$, i = 1, 2, ..., N. Now prove Theorem 2.3.20 by showing that $\mathbf{T}$ is an isomorphism.

2.32 Show that $(\mathbf{A}^\dagger)^\dagger = \mathbf{A}$ for the adjoint given in Definition 2.4.3.

2.33 Show that $W^0$ is a subspace of V* and

$$\dim V = \dim W + \dim W^0.$$

2.34 Show that every vector in the N-dimensional vector space V* has N − 1 linearly independent annihilators. Stated differently, show that a linear functional maps N − 1 linearly independent vectors to zero.


2.35 Show that T and T* have the same rank. In particular, show that if T is 
injective, then T* is surjective. Hint: Use the dimension theorem for T and 
T* and Eq. (2.25). 


2.36 Prove Theorem 2.6.3. 


2.37 Prove Proposition 2.6.9. Hint: First show that you get zero on both sides if $\{|v_k\rangle\}_{k=1}^N$ are linearly dependent. Next assume their linear independence and choose them as a basis, write $|v\rangle$ in terms of them, and note that

$$\Delta(|v_i\rangle, |v_1\rangle, \dots, \widehat{|v_j\rangle}, \dots, |v_N\rangle) = 0$$

unless i = j.


2.38 Prove Theorem 2.6.11. Hint: For the second part of the theorem, use 
the fact that an invertible A maps linearly independent sets of vectors onto 
linearly independent sets. 

Algebras


In many physical applications, a vector space V has a natural "product", i.e., a binary operation $V \times V \to V$, which we call multiplication. The prime example of such a vector space is the vector space of matrices. It is therefore useful to consider vector spaces for which such a product exists.


3.1 From Vector Space to Algebra 


In this section, we define an algebra, give some familiar examples of alge- 
bras, and discuss some of their basic properties. 


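Definition 3.1.1 An algebra $\mathcal{A}$ over $\mathbb{C}$ (or $\mathbb{R}$) is a vector space over $\mathbb{C}$ (or $\mathbb{R}$), together with a binary operation $\mathcal{A} \times \mathcal{A} \to \mathcal{A}$, called multiplication. The image of $(\mathbf{a}, \mathbf{b}) \in \mathcal{A} \times \mathcal{A}$ under this mapping¹ is denoted by $\mathbf{ab}$, and it satisfies the following two relations

$$\begin{aligned}
\mathbf{a}(\beta\mathbf{b} + \gamma\mathbf{c}) &= \beta\mathbf{ab} + \gamma\mathbf{ac},\\
(\beta\mathbf{b} + \gamma\mathbf{c})\mathbf{a} &= \beta\mathbf{ba} + \gamma\mathbf{ca}
\end{aligned}$$

for all $\mathbf{a}, \mathbf{b}, \mathbf{c} \in \mathcal{A}$ and $\beta, \gamma \in \mathbb{C}$ (or $\mathbb{R}$). The dimension of the vector space is called the dimension of the algebra. The algebra is called associative if the product satisfies $\mathbf{a}(\mathbf{bc}) = (\mathbf{ab})\mathbf{c}$ and commutative if it satisfies $\mathbf{ab} = \mathbf{ba}$. An algebra with identity is an algebra that has an element $\mathbf{1}$ satisfying $\mathbf{a1} = \mathbf{1a} = \mathbf{a}$. An element $\mathbf{b}$ of an algebra with identity is said to be a left inverse of $\mathbf{a}$ if $\mathbf{ba} = \mathbf{1}$. Right inverse is defined similarly. The identity is also called unit, and an algebra with identity is also called a unital algebra.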


It is sometimes necessary to use a different notation for the identity of an 
algebra. This happens especially when we are discussing several algebras at 
the same time. A common notation other than 1 is e. 


¹ We shall, for the most part, abandon the Dirac bra-and-ket notation in this chapter due to its clumsiness; instead we use boldface roman letters to denote vectors.

3.1.1 General Properties


Taking $\beta = 1 = -\gamma$ and $\mathbf{b} = \mathbf{c}$ in the definition above leads immediately to

$$\mathbf{a0} = \mathbf{0a} = \mathbf{0} \quad \forall \mathbf{a} \in \mathcal{A}.$$

The identity of an algebra is unique. If there were two identities $\mathbf{1}$ and $\mathbf{e}$, then $\mathbf{1e} = \mathbf{e}$, because $\mathbf{1}$ is the identity, and $\mathbf{1e} = \mathbf{1}$, because $\mathbf{e}$ is the identity.

If $\mathcal{A}$ is an associative algebra and $\mathbf{a} \in \mathcal{A}$ has both a left inverse $\mathbf{b}$ and a right inverse $\mathbf{c}$, then the two are equal:

$$\mathbf{bac} = (\mathbf{ba})\mathbf{c} = \mathbf{1c} = \mathbf{c},$$

$$\mathbf{bac} = \mathbf{b}(\mathbf{ac}) = \mathbf{b1} = \mathbf{b}.$$

Therefore, in an associative algebra, we talk of an inverse without specifying right or left. Furthermore, it is trivial to show that the (two-sided) inverse is unique. Hence, we have

Theorem 3.1.2 Let $\mathcal{A}$ be an associative algebra with identity. If $\mathbf{a} \in \mathcal{A}$ has a right and a left inverse, then they are equal and this single inverse is unique. We denote it by $\mathbf{a}^{-1}$. If $\mathbf{a}$ and $\mathbf{b}$ are invertible, then $\mathbf{ab}$ is also invertible, and

$$(\mathbf{ab})^{-1} = \mathbf{b}^{-1}\mathbf{a}^{-1}.$$

The proof of the last statement is straightforward.


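Definition 3.1.3 Let $\mathcal{A}$ be an algebra and $\mathcal{A}'$ a linear subspace of $\mathcal{A}$. If $\mathcal{A}'$ is closed under multiplication, i.e., if $\mathbf{ab} \in \mathcal{A}'$ whenever $\mathbf{a} \in \mathcal{A}'$ and $\mathbf{b} \in \mathcal{A}'$, then $\mathcal{A}'$ is called a subalgebra of $\mathcal{A}$.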


Clearly, a subalgebra of an associative (commutative) algebra is also as- 
sociative (commutative). 

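Let $\mathcal{A}$ be an associative algebra and S a subset of $\mathcal{A}$. The subalgebra generated by S is the collection of all linear combinations of

$$\mathbf{s}_1\mathbf{s}_2\cdots\mathbf{s}_k, \qquad \mathbf{s}_i \in S.$$

If S consists of a single element $\mathbf{s}$, then the subalgebra generated by $\mathbf{s}$ is the set of polynomials in $\mathbf{s}$.

Example 3.1.4 Let $\mathcal{A}$ be a unital algebra, then the vector space

$$\mathrm{Span}\{\mathbf{1}\} = \{\alpha\mathbf{1} \mid \alpha \in \mathbb{C}\}$$

is a subalgebra of $\mathcal{A}$. Since $\mathrm{Span}\{\mathbf{1}\}$ is indistinguishable from $\mathbb{C}$, we sometimes say that $\mathbb{C}$ is a subalgebra of $\mathcal{A}$.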


Definition 3.1.5 Let A be an algebra. The set of elements of A which com- 
mute with all elements of A is called the center of A and denoted by Z(A). 


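Table 3.1 The multiplication table for $\mathcal{S}$ (the entry in row e_i and column e_j is the product e_i e_j)

        e0     e1     e2     e3
e0      e0     e1     e2     e3
e1      e1     e0     e3     e2
e2      e2    −e3    −e0     e1
e3      e3    −e2    −e1     e0


$Z(\mathcal{A})$ is easily shown to be a subspace of $\mathcal{A}$, and if $\mathcal{A}$ is associative, then $Z(\mathcal{A})$ is a subalgebra of $\mathcal{A}$.

Definition 3.1.6 A unital algebra $\mathcal{A}$ is called central if $Z(\mathcal{A}) = \mathrm{Span}\{\mathbf{1}\}$.

Example 3.1.7 Consider the algebra $\mathcal{S}$ with basis $\{\mathbf{e}_i\}_{i=0}^3$ and multiplication table given in Table 3.1, where for purely aesthetic reasons the identity has been denoted by $\mathbf{e}_0$.

We want to see which elements belong to the center. Let $\mathbf{a} \in Z(\mathcal{S})$. Then for any arbitrary element $\mathbf{b} \in \mathcal{S}$, we must have $\mathbf{ab} = \mathbf{ba}$. Let

$$\mathbf{a} = \sum_{i=0}^3 \alpha_i\mathbf{e}_i \quad\text{and}\quad \mathbf{b} = \sum_{i=0}^3 \beta_i\mathbf{e}_i.$$

Then a straightforward calculation shows that

$$\begin{aligned}
\mathbf{ab} ={}& (\alpha_0\beta_0 + \alpha_1\beta_1 - \alpha_2\beta_2 + \alpha_3\beta_3)\mathbf{e}_0\\
&+ (\alpha_0\beta_1 + \alpha_1\beta_0 + \alpha_2\beta_3 - \alpha_3\beta_2)\mathbf{e}_1\\
&+ (\alpha_0\beta_2 + \alpha_1\beta_3 + \alpha_2\beta_0 - \alpha_3\beta_1)\mathbf{e}_2\\
&+ (\alpha_0\beta_3 + \alpha_1\beta_2 - \alpha_2\beta_1 + \alpha_3\beta_0)\mathbf{e}_3,
\end{aligned}$$

with a similar expression for $\mathbf{ba}$, in which the $\alpha$s and $\beta$s are switched. Comparing the coefficients of $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$, it is easy to show that the two expressions are equal if and only if

$$\alpha_2\beta_3 = \alpha_3\beta_2, \qquad \alpha_1\beta_3 = \alpha_3\beta_1, \qquad\text{and}\qquad \alpha_1\beta_2 = \alpha_2\beta_1.$$

This can hold for arbitrary $\mathbf{b}$ only if $\alpha_1 = \alpha_2 = \alpha_3 = 0$, with $\alpha_0$ arbitrary. Therefore, $\mathbf{a} \in Z(\mathcal{S})$ if and only if $\mathbf{a}$ is a multiple of $\mathbf{e}_0$, i.e., if and only if $\mathbf{a} \in \mathrm{Span}\{\mathbf{e}_0\}$. Therefore, $\mathcal{S}$ is central.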
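The computation of Example 3.1.7 is easy to mechanize. The sketch below encodes the multiplication table of Table 3.1 as pairs (sign, index), extends it bilinearly, and tests whether a given element commutes with every basis vector, which by bilinearity is enough for it to commute with everything. The encoding and the names `TABLE`, `multiply`, `basis`, and `is_central` are our own illustrative choices.

```python
# TABLE[i][j] = (sign, k) encodes e_i e_j = sign * e_k, read off from Table 3.1.
TABLE = [
    [(1, 0), (1, 1), (1, 2), (1, 3)],
    [(1, 1), (1, 0), (1, 3), (1, 2)],
    [(1, 2), (-1, 3), (-1, 0), (1, 1)],
    [(1, 3), (-1, 2), (-1, 1), (1, 0)],
]

def multiply(a, b):
    """Product of a = sum_i a[i] e_i and b = sum_j b[j] e_j, extended bilinearly."""
    result = [0, 0, 0, 0]
    for i in range(4):
        for j in range(4):
            s, k = TABLE[i][j]
            result[k] += s * a[i] * b[j]
    return result

def basis(i):
    """Components of the basis vector e_i."""
    return [int(i == k) for k in range(4)]

def is_central(a):
    """True if a commutes with every basis vector (hence with every element)."""
    return all(multiply(a, basis(j)) == multiply(basis(j), a) for j in range(4))
```

For instance, `is_central([7, 0, 0, 0])` should hold, while `is_central(basis(1))` should fail because $\mathbf{e}_1\mathbf{e}_2 = \mathbf{e}_3$ but $\mathbf{e}_2\mathbf{e}_1 = -\mathbf{e}_3$.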


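Let A and B be subsets of an algebra $\mathcal{A}$. We denote by AB the set of elements in $\mathcal{A}$ which can be written as the sum of products of an element in A by an element in B:

$$AB \equiv \Bigl\{\mathbf{x} \in \mathcal{A} \Bigm| \mathbf{x} = \sum_k \mathbf{a}_k\mathbf{b}_k,\ \mathbf{a}_k \in A,\ \mathbf{b}_k \in B\Bigr\}. \tag{3.1}$$

In particular,

$$\mathcal{A}^2 \equiv \Bigl\{\mathbf{x} \in \mathcal{A} \Bigm| \mathbf{x} = \sum_k \mathbf{a}_k\mathbf{b}_k,\ \mathbf{a}_k, \mathbf{b}_k \in \mathcal{A}\Bigr\} \tag{3.2}$$

is called the derived algebra of $\mathcal{A}$.

Definition 3.1.8 Given any algebra $\mathcal{A}$, in which $(\mathbf{a}, \mathbf{b}) \mapsto \mathbf{ab}$, we can obtain a second algebra $\mathcal{A}^{\mathrm{op}}$ in which $(\mathbf{a}, \mathbf{b}) \mapsto \mathbf{ba}$. We write

$$(\mathbf{ab})^{\mathrm{op}} \equiv \mathbf{ba}$$

and call $\mathcal{A}^{\mathrm{op}}$ the algebra opposite to $\mathcal{A}$.

It is obvious that if $\mathcal{A}$ is associative, so is $\mathcal{A}^{\mathrm{op}}$, and if $\mathcal{A}$ is commutative, then $\mathcal{A}^{\mathrm{op}} = \mathcal{A}$.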

Example 3.1.9 Here are some examples of algebras:

• Define the following product on $\mathbb{R}^2$:

$$(x_1, x_2)(y_1, y_2) = (x_1y_1 - x_2y_2,\ x_1y_2 + x_2y_1).$$

The reader is urged to verify that this product turns $\mathbb{R}^2$ into a commutative algebra.

• Similarly, the vector (cross) product on $\mathbb{R}^3$ turns it into a nonassociative, noncommutative algebra.

• The paradigm of all algebras is the matrix algebra whose binary operation is ordinary multiplication of n × n matrices. This algebra is associative but not commutative.

• Let $\mathcal{A}$ be the set of n × n matrices. Define the binary operation, denoted by •, as

$$\mathsf{A} \bullet \mathsf{B} \equiv \mathsf{AB} - \mathsf{BA}, \tag{3.3}$$

where the RHS is ordinary matrix multiplication. The reader may check that $\mathcal{A}$ together with this operation becomes a nonassociative, noncommutative algebra.

• Let $\mathcal{A}$ be the set of n × n upper triangular matrices, i.e., matrices all of whose elements below the diagonal are zero. With ordinary matrix multiplication, this set turns into an associative, noncommutative algebra, as the reader can verify.

• Let $\mathcal{A}$ be the set of n × n upper triangular matrices. Define the binary operation as in Eq. (3.3). The reader may check that $\mathcal{A}$ together with this operation becomes a nonassociative, noncommutative algebra. The derived algebra $\mathcal{A}^2$ of $\mathcal{A}$ is the set of n × n strictly upper triangular matrices, i.e., upper triangular matrices whose diagonal elements are all zero.

• We have already established that the set of linear transformations $\mathcal{L}(V, W)$ from V to W is a vector space. Let us attempt to define a multiplication as well. The best candidate is the composition of linear transformations. If $\mathbf{T}: V \to U$ and $\mathbf{S}: U \to W$ are linear operators, then the composition $\mathbf{S}\circ\mathbf{T}: V \to W$ is also a linear operator, as can easily be verified. This product, however, is not defined on a single vector space, but is such that it takes an element in $\mathcal{L}(V, U)$ and another element in a second vector space $\mathcal{L}(U, W)$ to give an element in yet another vector space $\mathcal{L}(V, W)$. An algebra requires a single vector space. We can accomplish this by letting V = U = W. Then the three spaces of linear transformations collapse to the single space $\mathcal{L}(V, V)$, the set of endomorphisms of V, which we have abbreviated as $\mathcal{L}(V)$ or End(V) and to which $\mathbf{T}$, $\mathbf{S}$, $\mathbf{ST} \equiv \mathbf{S}\circ\mathbf{T}$, and $\mathbf{TS} \equiv \mathbf{T}\circ\mathbf{S}$ belong.

• All the examples above are finite-dimensional algebras. An example of an infinite-dimensional algebra is $\mathcal{C}^r(a, b)$, the vector space of real-valued functions defined on a real interval (a, b), which have derivatives up to order r. The multiplication is defined pointwise: If $f \in \mathcal{C}^r(a, b)$ and $g \in \mathcal{C}^r(a, b)$, then

$$(fg)(t) \equiv f(t)g(t) \quad \forall t \in (a, b).$$

This algebra is commutative and associative, and has the identity element f(t) = 1.

• Another example of an infinite dimensional algebra is the algebra of polynomials.² This algebra is a commutative and associative algebra with identity.


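Definition 3.1.10 Let $\mathcal{A}$ and $\mathcal{B}$ be algebras. Then the vector direct sum $\mathcal{A} \oplus \mathcal{B}$ becomes an algebra direct sum if we define the following product

$$(\mathbf{a}_1 \oplus \mathbf{b}_1)(\mathbf{a}_2 \oplus \mathbf{b}_2) = (\mathbf{a}_1\mathbf{a}_2 \oplus \mathbf{b}_1\mathbf{b}_2)$$

on $\mathcal{A} \oplus \mathcal{B}$.

Note that if an element $\mathbf{a}$ is in $\mathcal{A}$, then it can be represented by $\mathbf{a} \oplus \mathbf{0}$ as an element of $\mathcal{A} \oplus \mathcal{B}$. Similarly, an element $\mathbf{b}$ in $\mathcal{B}$ can be represented by $\mathbf{0} \oplus \mathbf{b}$. Thus the product of any element in $\mathcal{A}$ with any element in $\mathcal{B}$ is zero, i.e., $\mathcal{AB} = \mathcal{BA} = \{\mathbf{0}\}$. As we shall see later, this condition becomes necessary if a given algebra is to be the direct sum of its subalgebras.

In order for $\mathbf{a} \oplus \mathbf{b}$ to be in the center of $\mathcal{A} \oplus \mathcal{B}$, we must have

$$(\mathbf{a} \oplus \mathbf{b})(\mathbf{x} \oplus \mathbf{y}) = (\mathbf{x} \oplus \mathbf{y})(\mathbf{a} \oplus \mathbf{b}),$$

or

$$\mathbf{ax} \oplus \mathbf{by} = \mathbf{xa} \oplus \mathbf{yb} \quad\text{or}\quad (\mathbf{ax} - \mathbf{xa}) \oplus (\mathbf{by} - \mathbf{yb}) = \mathbf{0},$$

for all $\mathbf{x} \in \mathcal{A}$ and $\mathbf{y} \in \mathcal{B}$. For this to hold, we must have

$$\mathbf{ax} - \mathbf{xa} = \mathbf{0} \quad\text{and}\quad \mathbf{by} - \mathbf{yb} = \mathbf{0},$$

i.e., that $\mathbf{a} \in Z(\mathcal{A})$ and $\mathbf{b} \in Z(\mathcal{B})$. Hence,

$$Z(\mathcal{A} \oplus \mathcal{B}) = Z(\mathcal{A}) \oplus Z(\mathcal{B}). \tag{3.4}$$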
² It should be clear that the algebra of polynomials cannot be finite dimensional.


Definition 3.1.11 Let $\mathcal{A}$ and $\mathcal{B}$ be algebras. Then the vector space tensor product $\mathcal{A} \otimes \mathcal{B}$ becomes an algebra tensor product if we define the product

$$(\mathbf{a}_1 \otimes \mathbf{b}_1)(\mathbf{a}_2 \otimes \mathbf{b}_2) = \mathbf{a}_1\mathbf{a}_2 \otimes \mathbf{b}_1\mathbf{b}_2$$

on $\mathcal{A} \otimes \mathcal{B}$. Because of the isomorphism $\mathcal{A} \otimes \mathcal{B} \cong \mathcal{B} \otimes \mathcal{A}$, we demand that $\mathbf{a} \otimes \mathbf{b} = \mathbf{b} \otimes \mathbf{a}$ for all $\mathbf{a} \in \mathcal{A}$ and $\mathbf{b} \in \mathcal{B}$.

The last condition of the definition becomes an important requirement when we write a given algebra $\mathcal{A}$ as the tensor product of two of its subalgebras $\mathcal{B}$ and $\mathcal{C}$. In such a case, $\otimes$ coincides with the multiplication in $\mathcal{A}$, and the condition becomes the requirement that all elements of $\mathcal{B}$ commute with all elements of $\mathcal{C}$, i.e., $\mathcal{BC} = \mathcal{CB}$.


Definition 3.1.12 Given an algebra $\mathcal{A}$ and a basis $B = \{\mathbf{e}_i\}_{i=1}^N$ for the underlying vector space, one can write

$$\mathbf{e}_i\mathbf{e}_j = \sum_{k=1}^N c_{ij}^k\,\mathbf{e}_k, \qquad c_{ij}^k \in \mathbb{C}. \tag{3.5}$$

The complex numbers $c_{ij}^k$, the components of the vector $\mathbf{e}_i\mathbf{e}_j$ in the basis B, are called the structure constants of $\mathcal{A}$.

The structure constants determine the product of any two vectors once they are expressed in terms of the basis vectors of B. Conversely, given any N-dimensional vector space V, one can turn it into an algebra by choosing a basis and a set of $N^3$ numbers $\{c_{ij}^k\}$ and defining the product of basis vectors by Eq. (3.5).


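Example 3.1.13 Let the structure constants in algebras $\mathcal{A}$ and $\mathcal{B}$ be $\{a_{ij}^k\}_{i,j,k=1}^M$ and $\{b_{mn}^l\}_{m,n,l=1}^N$ in their bases $\{\mathbf{e}_i\}_{i=1}^M$ and $\{\mathbf{f}_n\}_{n=1}^N$, respectively, so that

$$\mathbf{e}_i\mathbf{e}_j = \sum_{k=1}^M a_{ij}^k\,\mathbf{e}_k \quad\text{and}\quad \mathbf{f}_m\mathbf{f}_n = \sum_{l=1}^N b_{mn}^l\,\mathbf{f}_l.$$

Construct the MN-dimensional algebra $\mathcal{C}$ by defining its structure constants as $c_{im,jn}^{kl} = a_{ij}^k b_{mn}^l$ in a basis $\{\mathbf{v}_{kl}\}_{k,l=1}^{M,N}$, so that

$$\mathbf{v}_{im}\mathbf{v}_{jn} = \sum_{k=1}^M\sum_{l=1}^N c_{im,jn}^{kl}\,\mathbf{v}_{kl} = \sum_{k=1}^M\sum_{l=1}^N a_{ij}^k b_{mn}^l\,\mathbf{v}_{kl}.$$

This algebra is isomorphic to the algebra $\mathcal{A} \otimes \mathcal{B}$. In fact, if we identify $\mathbf{v}_{kl}$ on the right-hand side as $\mathbf{e}_k \otimes \mathbf{f}_l$, then

$$\begin{aligned}
\mathbf{v}_{im}\mathbf{v}_{jn} &= \sum_{k=1}^M\sum_{l=1}^N c_{im,jn}^{kl}\,\mathbf{e}_k \otimes \mathbf{f}_l = \sum_{k=1}^M\sum_{l=1}^N a_{ij}^k b_{mn}^l\,\mathbf{e}_k \otimes \mathbf{f}_l\\
&= \Bigl(\sum_{k=1}^M a_{ij}^k\,\mathbf{e}_k\Bigr) \otimes \Bigl(\sum_{l=1}^N b_{mn}^l\,\mathbf{f}_l\Bigr) = (\mathbf{e}_i\mathbf{e}_j) \otimes (\mathbf{f}_m\mathbf{f}_n),
\end{aligned}$$

which is consistent with $\mathbf{v}_{im} \equiv \mathbf{e}_i \otimes \mathbf{f}_m$ and $\mathbf{v}_{jn} \equiv \mathbf{e}_j \otimes \mathbf{f}_n$, and the rule of multiplication of the tensor product of two algebras.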


Definition 3.1.14 A unital algebra all of whose nonzero elements have inverses is called a division algebra.

Example 3.1.15 Let $\{\mathbf{e}_1, \mathbf{e}_2\}$ be a basis of $\mathbb{R}^2$. Let the structure constants be

$$\begin{aligned}
c_{11}^1 &= -c_{22}^1 = c_{12}^2 = c_{21}^2 = 1,\\
c_{12}^1 &= c_{21}^1 = c_{11}^2 = c_{22}^2 = 0,
\end{aligned}$$

i.e., let

$$\mathbf{e}_1^2 = -\mathbf{e}_2^2 = \mathbf{e}_1, \qquad \mathbf{e}_1\mathbf{e}_2 = \mathbf{e}_2\mathbf{e}_1 = \mathbf{e}_2.$$

Then, it is easy to prove that the algebra so constructed is just $\mathbb{C}$. All that needs to be done is to identify $\mathbf{e}_1$ with 1 and $\mathbf{e}_2$ with $\sqrt{-1}$. Clearly, $\mathbb{C}$ is a division algebra.
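Equation (3.5) translates directly into a multiplication routine: once the structure constants are stored in a three-index array, the product of any two vectors follows by bilinearity. The sketch below does this for the structure constants of Example 3.1.15 and compares the result with ordinary complex multiplication. The array layout `c[i][j][k]` for $c_{ij}^k$ (with indices starting at 0) and the function name `multiply` are conventions we adopt here for illustration.

```python
def multiply(c, x, y):
    """Product of x = sum_i x[i] e_i and y = sum_j y[j] e_j via e_i e_j = sum_k c[i][j][k] e_k."""
    n = len(x)
    return [sum(c[i][j][k] * x[i] * y[j] for i in range(n) for j in range(n))
            for k in range(n)]

# Structure constants of Example 3.1.15, with 0-based indices: c[i][j][k] = c^k_{ij}.
c = [
    [[1, 0], [0, 1]],     # e_1 e_1 = e_1,   e_1 e_2 = e_2
    [[0, 1], [-1, 0]],    # e_2 e_1 = e_2,   e_2 e_2 = -e_1
]

x, y = [1, 2], [3, 4]     # components of 1 + 2i and 3 + 4i
product = multiply(c, x, y)
```

The list `product` should hold the real and imaginary parts of (1 + 2i)(3 + 4i) = −5 + 10i, in agreement with the identification of $\mathbf{e}_1$ with 1 and $\mathbf{e}_2$ with $\sqrt{-1}$.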


Example 3.1.16 In the standard basis {ei}*_o of R*, choose the structure 
constants as follows: 


2 2 2 2 


ene; =ej;en9 =e; fori =1,2,3, 


3 
ee; = > €ijnek fori, 7=1,2,3, i4j, 
k=1 


where €; jx is completely antisymmetric in all its indices (therefore vanishing 
if any two of its indices are equal) and €123 = 1. The reader may verify that 
these relations turn R* into an associative, but noncommutative, algebra. 
This algebra is called the algebra of quaternions and denoted by H. In 
this context, eg is usually denoted by 1, and ej, eo, and e3 by i, j, and k, 
respectively, and one writes g =x +iy + jz+kw for an element of H. It 
then becomes evident that H is a generalization of C. In analogy with C, x 
is called the real part of g, and (y, z, w) the pure part of qg. Similarly, the 
conjugate of g is g* =x —iy— jz—kw. 

It is convenient to write $q = x_0 + \mathbf{x}$, where $\mathbf{x}$ is a three-dimensional vector. Then $q^* = x_0 - \mathbf{x}$. Furthermore, one can show that, with $q = x_0 + \mathbf{x}$ and $p = y_0 + \mathbf{y}$,

$$qp = \underbrace{x_0y_0 - \mathbf{x}\cdot\mathbf{y}}_{\text{real part of } qp} + \underbrace{x_0\mathbf{y} + y_0\mathbf{x} + \mathbf{x}\times\mathbf{y}}_{\text{pure part of } qp}. \qquad (3.6)$$

Changing $\mathbf{x}$ to $-\mathbf{x}$ and $\mathbf{y}$ to $-\mathbf{y}$ in the expression above, one gets

$$q^*p^* = x_0y_0 - \mathbf{x}\cdot\mathbf{y} - x_0\mathbf{y} - y_0\mathbf{x} + \mathbf{x}\times\mathbf{y},$$

which is not equal to $(qp)^*$. However, it is easy to show that $(qp)^* = p^*q^*$.

Substituting $q^*$ for $p$ in (3.6), we get $qq^* = x_0^2 + |\mathbf{x}|^2$. The absolute value of $q$, denoted by $|q|$, is (similar to the absolute value of a complex number) given by $|q| = \sqrt{qq^*}$. If $q \neq 0$, then $q^*/(x_0^2 + |\mathbf{x}|^2)$ is the inverse of $q$. Thus, the algebra of quaternions is a division algebra.
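The product rule (3.6), the conjugate, and the inverse can be checked numerically. The following is only a minimal sketch: a quaternion is stored as a pair `(x0, x)` with `x` a 3-tuple, and the helper names `qmul`, `conj`, and `inverse` are illustrative choices, not notation from the text. Exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction

def dot(x, y):
    """Euclidean dot product of two 3-tuples."""
    return sum(a * b for a, b in zip(x, y))

def cross(x, y):
    """Cross product of two 3-tuples."""
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def qmul(q, p):
    """Product of q = (x0, x) and p = (y0, y) according to Eq. (3.6)."""
    x0, x = q
    y0, y = p
    real = x0 * y0 - dot(x, y)
    c = cross(x, y)
    pure = tuple(x0 * yi + y0 * xi + ci for xi, yi, ci in zip(x, y, c))
    return (real, pure)

def conj(q):
    """Conjugate q* = x0 - x."""
    x0, x = q
    return (x0, tuple(-xi for xi in x))

def inverse(q):
    """Inverse q*/(x0^2 + |x|^2) of a nonzero quaternion."""
    x0, x = q
    norm2 = x0 * x0 + dot(x, x)
    if norm2 == 0:
        raise ZeroDivisionError("the zero quaternion has no inverse")
    c0, c = conj(q)
    return (Fraction(c0, norm2), tuple(Fraction(ci, norm2) for ci in c))

q = (1, (2, 3, 4))
p = (5, (6, 7, 8))
print(conj(qmul(q, p)) == qmul(conj(p), conj(q)))   # (qp)* = p* q*
print(qmul(q, inverse(q)) == (1, (0, 0, 0)))        # q q^{-1} = 1
```

The same functions reproduce the defining relations of the basis, for instance $ij = k$ when $i$, $j$, $k$ are stored as `(0, (1, 0, 0))`, `(0, (0, 1, 0))`, and `(0, (0, 0, 1))`.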

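It is not hard to show that

$$q^n = \sum_{k=0}^{[n]/2} (-1)^k \binom{n}{2k}\, x_0^{\,n-2k}\,|\mathbf{x}|^{2k} + \sum_{k=0}^{[n-1]/2} (-1)^k \binom{n}{2k+1}\, x_0^{\,n-2k-1}\,|\mathbf{x}|^{2k}\,\mathbf{x}, \qquad (3.7)$$

where $[n] = n$ if $n$ is even and $[n] = n - 1$ if $n$ is odd.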


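In order for $\mathbf{a} \otimes \mathbf{b}$ to be in the center of $\mathcal{A} \otimes \mathcal{B}$, we must have

$$(\mathbf{a} \otimes \mathbf{b})(\mathbf{x} \otimes \mathbf{y}) = (\mathbf{x} \otimes \mathbf{y})(\mathbf{a} \otimes \mathbf{b}),$$

or

$$\mathbf{a}\mathbf{x} \otimes \mathbf{b}\mathbf{y} = \mathbf{x}\mathbf{a} \otimes \mathbf{y}\mathbf{b}$$

for all $\mathbf{x} \in \mathcal{A}$ and $\mathbf{y} \in \mathcal{B}$. For this to hold, we must have

$$\mathbf{a}\mathbf{x} = \mathbf{x}\mathbf{a} \quad \text{and} \quad \mathbf{b}\mathbf{y} = \mathbf{y}\mathbf{b},$$

i.e., that $\mathbf{a} \in Z(\mathcal{A})$ and $\mathbf{b} \in Z(\mathcal{B})$. Hence,

$$Z(\mathcal{A} \otimes \mathcal{B}) = Z(\mathcal{A}) \otimes Z(\mathcal{B}). \qquad (3.8)$$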


Let $\mathcal{A}$ be an associative algebra. A subset $S \subset \mathcal{A}$ is called a generator of $\mathcal{A}$ if every element of $\mathcal{A}$ can be expressed as a linear combination of products of elements in $S$. A basis of the vector space $\mathcal{A}$ is clearly a generator of $\mathcal{A}$. However, it need not be the smallest generator, because it may be possible to obtain all the basis vectors by multiplying a subset of them. For example, $(\mathbb{R}^3, \times)$, the algebra of vectors under the cross product, has the basis $\{\hat{\mathbf{e}}_x, \hat{\mathbf{e}}_y, \hat{\mathbf{e}}_z\}$, but $\{\hat{\mathbf{e}}_x, \hat{\mathbf{e}}_y\}$, or any other pair of these unit vectors, is a generator because $\hat{\mathbf{e}}_z = \hat{\mathbf{e}}_x \times \hat{\mathbf{e}}_y$.


3.1.2. Homomorphisms 


The linear transformations connecting vector spaces can be modified 
slightly to accommodate the binary operation of multiplication of the corre- 
sponding algebras: 


Definition 3.1.17 Let $\mathcal{A}$ and $\mathcal{B}$ be algebras. A linear map³ $\phi: \mathcal{A} \to \mathcal{B}$ is called an algebra homomorphism if $\phi(\mathbf{a}\mathbf{b}) = \phi(\mathbf{a})\phi(\mathbf{b})$. An injective, surjective, or bijective algebra homomorphism is called, respectively, a monomorphism, an epimorphism, or an isomorphism. An isomorphism of an algebra onto itself is called an automorphism.


³ It is more common to use $\phi$, $\psi$, etc. instead of $\mathbf{T}$, $\mathbf{U}$, etc. for linear maps of algebras.
Example 3.1.18 Let $\mathcal{A}$ be $\mathbb{R}^3$, and $\mathcal{B}$ the set of $3 \times 3$ matrices of the form

$$\mathsf{A} = \begin{pmatrix} 0 & a_1 & -a_2 \\ -a_1 & 0 & a_3 \\ a_2 & -a_3 & 0 \end{pmatrix}.$$

Then the map $\phi: \mathcal{A} \to \mathcal{B}$ defined by

$$\phi(\mathbf{a}) = \phi(a_1, a_2, a_3) = \begin{pmatrix} 0 & a_1 & -a_2 \\ -a_1 & 0 & a_3 \\ a_2 & -a_3 & 0 \end{pmatrix}$$

can be shown to be a linear isomorphism. Let the cross product be the binary operation on $\mathcal{A}$, turning it into an algebra. For $\mathcal{B}$, define the binary operation of Eq. (3.3). The reader may check that, with these operations, $\phi$ is extended to an algebra isomorphism.
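The check left to the reader can be spot-tested numerically. The sketch below assumes that the product of Eq. (3.3) is the commutator $\mathsf{A} \bullet \mathsf{B} = \mathsf{A}\mathsf{B} - \mathsf{B}\mathsf{A}$, which is how the bullet product is expanded in Example 3.4.3; the function names are illustrative only.

```python
def phi(a):
    """Map a vector (a1, a2, a3) to the antisymmetric matrix of Example 3.1.18."""
    a1, a2, a3 = a
    return [[0, a1, -a2],
            [-a1, 0, a3],
            [a2, -a3, 0]]

def cross(x, y):
    """Cross product of two 3-vectors."""
    return (x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0])

def matmul(A, B):
    """Ordinary product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator(A, B):
    """Assumed form of the product (3.3): AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    n = len(A)
    return [[AB[i][j] - BA[i][j] for j in range(n)] for i in range(n)]

a, b = (1, 2, 3), (4, 5, 6)
# Homomorphism property: phi(a x b) equals phi(a) . phi(b) in the commutator algebra.
print(phi(cross(a, b)) == commutator(phi(a), phi(b)))
```

A single pair of vectors is of course not a proof; by Proposition 3.1.19 it is enough to run the same comparison on the three standard basis vectors.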


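Proposition 3.1.19 Let $\mathcal{A}$ and $\mathcal{B}$ be algebras. Let $\{\mathbf{e}_i\}$ be a basis of $\mathcal{A}$ and $\phi: \mathcal{A} \to \mathcal{B}$ a linear transformation. Then $\phi$ is an algebra homomorphism if and only if

$$\phi(\mathbf{e}_i\mathbf{e}_j) = \phi(\mathbf{e}_i)\phi(\mathbf{e}_j).$$


Proof If $\mathbf{a} = \sum_i \alpha_i\mathbf{e}_i$ and $\mathbf{b} = \sum_j \beta_j\mathbf{e}_j$, then

$$\begin{aligned}
\phi(\mathbf{a}\mathbf{b}) &= \phi\Bigl(\Bigl(\sum_i \alpha_i\mathbf{e}_i\Bigr)\Bigl(\sum_j \beta_j\mathbf{e}_j\Bigr)\Bigr) = \phi\Bigl(\sum_i \alpha_i \sum_j \beta_j\,\mathbf{e}_i\mathbf{e}_j\Bigr) \\
&= \sum_i \alpha_i \sum_j \beta_j\,\phi(\mathbf{e}_i\mathbf{e}_j) = \sum_i \alpha_i \sum_j \beta_j\,\phi(\mathbf{e}_i)\phi(\mathbf{e}_j) \\
&= \Bigl(\sum_i \alpha_i\phi(\mathbf{e}_i)\Bigr)\Bigl(\sum_j \beta_j\phi(\mathbf{e}_j)\Bigr) = \phi\Bigl(\sum_i \alpha_i\mathbf{e}_i\Bigr)\,\phi\Bigl(\sum_j \beta_j\mathbf{e}_j\Bigr) \\
&= \phi(\mathbf{a})\phi(\mathbf{b}).
\end{aligned}$$

The converse is trivial.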


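Example 3.1.20 Let $\mathcal{A}$ and $\mathcal{B}$ be algebras and $\phi: \mathcal{A} \to \mathcal{B}$ a homomorphism. Theorem 2.3.10 ensures that $\phi(\mathcal{A})$ is a subspace of $\mathcal{B}$. Now let $\mathbf{b}_1, \mathbf{b}_2 \in \phi(\mathcal{A})$. Then there exist $\mathbf{a}_1, \mathbf{a}_2 \in \mathcal{A}$ such that $\mathbf{b}_1 = \phi(\mathbf{a}_1)$ and $\mathbf{b}_2 = \phi(\mathbf{a}_2)$. Furthermore,

$$\mathbf{b}_1\mathbf{b}_2 = \phi(\mathbf{a}_1)\phi(\mathbf{a}_2) = \phi(\mathbf{a}_1\mathbf{a}_2) \quad \Rightarrow \quad \mathbf{b}_1\mathbf{b}_2 \in \phi(\mathcal{A}).$$

Hence, $\phi(\mathcal{A})$ is a subalgebra of $\mathcal{B}$.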


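Example 3.1.21 Let $\mathcal{A}$ be a real algebra with identity $\mathbf{1}$. Let $\phi: \mathbb{R} \to \mathcal{A}$ be the linear mapping given by $\phi(\alpha) = \alpha\mathbf{1}$. Considering $\mathbb{R}$ as an algebra over itself, we have

$$\phi(\alpha\beta) = \alpha\beta\mathbf{1} = (\alpha\mathbf{1})(\beta\mathbf{1}) = \phi(\alpha)\phi(\beta).$$

This shows that $\phi$ is an algebra homomorphism. Furthermore,

$$\phi(\alpha_1) = \phi(\alpha_2) \ \Rightarrow\ \alpha_1\mathbf{1} = \alpha_2\mathbf{1} \ \Rightarrow\ (\alpha_1 - \alpha_2)\mathbf{1} = \mathbf{0} \ \Rightarrow\ \alpha_1 = \alpha_2.$$

Hence, $\phi$ is a monomorphism. Therefore, we can identify $\mathbb{R}$ with $\phi(\mathbb{R})$, and consider $\mathbb{R}$ as a subalgebra of $\mathcal{A}$. This is the same conclusion we arrived at in Example 3.1.4: $\mathbb{R}$ (or $\mathbb{C}$) is a subalgebra of any unital algebra.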


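Definition 3.1.22 Let $\mathcal{A}$ and $\mathcal{B}$ be unital algebras. A homomorphism $\phi: \mathcal{A} \to \mathcal{B}$ is called unital if $\phi(\mathbf{1}_{\mathcal{A}}) = \mathbf{1}_{\mathcal{B}}$.


One can show the following:


Proposition 3.1.23 Let $\mathcal{A}$ and $\mathcal{B}$ be unital algebras. If $\phi: \mathcal{A} \to \mathcal{B}$ is an epimorphism, then $\phi$ is unital.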


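Example 3.1.9 introduced the algebra $\mathcal{L}(\mathcal{V})$ of endomorphisms (operators) on $\mathcal{V}$. This algebra has an identity $\mathbf{1}$ which maps every vector to itself.


Definition 3.1.24 An endomorphism $\omega$ of $\mathcal{V}$ whose square is $\mathbf{1}$ is called an involution.


In particular, $\mathbf{1} \in \operatorname{End}(\mathcal{V})$ is an involution. If $\omega_1$ and $\omega_2$ are involutions such that $\omega_1 \circ \omega_2 = \omega_2 \circ \omega_1$, then $\omega_1 \circ \omega_2$ is also an involution.

For an algebra, we require that an involution be a homomorphism, not just a linear map. Let $\mathcal{A}$ be an algebra and let $\mathcal{H}(\mathcal{A})$ denote the set of homomorphisms of $\mathcal{A}$. An involution $\omega \in \mathcal{H}(\mathcal{A})$ satisfies $\omega \circ \omega = \iota \in \mathcal{H}(\mathcal{A})$, of course.⁴ Now, if $\mathcal{A}$ has an identity $\mathbf{e}$, then $\omega(\mathbf{e})$ must be equal to $\mathbf{e}$. Indeed, let $\omega(\mathbf{e}) = \mathbf{a}$. Then, since $\omega \circ \omega = \iota$, we must have $\omega(\mathbf{a}) = \mathbf{e}$ and

$$\omega(\mathbf{e}\mathbf{a}) = \omega(\mathbf{e})\omega(\mathbf{a}) = \omega(\mathbf{e})\mathbf{e} = \omega(\mathbf{e}).$$

Applying $\omega$ to both sides, we get $\mathbf{e}\mathbf{a} = \mathbf{e}$. This can happen only if $\mathbf{a} = \mathbf{e}$.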


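Theorem 3.1.25 Let $\mathcal{U}$ and $\mathcal{V}$ be two isomorphic vector spaces. Then the algebras $\mathcal{L}(\mathcal{U})$ and $\mathcal{L}(\mathcal{V})$ are isomorphic as algebras.


Proof Let $\phi: \mathcal{U} \to \mathcal{V}$ be a vector-space isomorphism. Define $\Phi: \mathcal{L}(\mathcal{U}) \to \mathcal{L}(\mathcal{V})$ by

$$\Phi(\mathbf{T}) = \phi \circ \mathbf{T} \circ \phi^{-1}.$$

It is easy to show that $\Phi$ is an algebra isomorphism.


A consequence of this theorem and Theorem 2.3.20 is that $\mathcal{L}(\mathcal{V})$, the algebra of the linear transformations of any real vector space $\mathcal{V}$, is isomorphic to $\mathcal{L}(\mathbb{R}^N)$, where $N$ is the dimension of $\mathcal{V}$. Similarly, $\mathcal{L}(\mathcal{V})$ is isomorphic to $\mathcal{L}(\mathbb{C}^N)$ if $\mathcal{V}$ is an $N$-dimensional complex vector space.


⁴ In keeping with our notation, we use $\iota$ for the identity homomorphism of the algebra $\mathcal{A}$.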


3.2 Ideals 


Subalgebras are subspaces which are stable under multiplication of their elements; i.e., products of elements of a subalgebra do not leave the subalgebra. Of more importance in algebra theory are those subspaces which are stable under multiplication of their elements by the entire algebra.


Definition 3.2.1 Let $\mathcal{A}$ be an algebra. A subspace $\mathcal{B}$ of $\mathcal{A}$ is called a left ideal of $\mathcal{A}$ if it contains $\mathbf{a}\mathbf{b}$ for all $\mathbf{a} \in \mathcal{A}$ and $\mathbf{b} \in \mathcal{B}$. Using Eq. (3.1), we write this as $\mathcal{A}\mathcal{B} \subset \mathcal{B}$. A right ideal is defined similarly with $\mathcal{B}\mathcal{A} \subset \mathcal{B}$. A two-sided ideal, or simply an ideal, is a subspace that is both a left ideal and a right ideal.


It is clear from the definition that an ideal is automatically a subalgebra, and that the only ideal of a unital algebra containing the identity, or an invertible element, is the algebra itself.


Example 3.2.2 Let $\mathcal{A}$ be an associative algebra and $\mathbf{a} \in \mathcal{A}$. Let $\mathcal{L}(\mathbf{a})$ be the set of elements $\mathbf{x} \in \mathcal{A}$ such that $\mathbf{x}\mathbf{a} = \mathbf{0}$. For any $\mathbf{x} \in \mathcal{L}(\mathbf{a})$ and any $\mathbf{y} \in \mathcal{A}$, we have

$$(\mathbf{y}\mathbf{x})\mathbf{a} = \mathbf{y}(\mathbf{x}\mathbf{a}) = \mathbf{0},$$

i.e., $\mathbf{y}\mathbf{x} \in \mathcal{L}(\mathbf{a})$. So, $\mathcal{L}(\mathbf{a})$ is a left ideal in $\mathcal{A}$. It is called the left annihilator of $\mathbf{a}$. Similarly, one can construct $\mathcal{R}(\mathbf{a})$, the right annihilator of $\mathbf{a}$.


Example 3.2.3 Let $C^r(a, b)$ be the algebra of all $r$ times differentiable real-valued functions on an interval $(a, b)$ (see Example 3.1.9). The set of functions that vanish at a given fixed point $c \in (a, b)$ constitutes an ideal in $C^r(a, b)$. Since the algebra is commutative, the ideal is two-sided.

More generally, let $\mathcal{M}_n$ be the (noncommutative) algebra of matrices with entries $f_{ij} \in C^r(a, b)$. Then the set of matrices whose entries vanish at a given fixed point $c \in (a, b)$ constitutes a two-sided ideal in $\mathcal{M}_n$.


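Let $\mathcal{A}$ and $\mathcal{B}$ be algebras and $\phi: \mathcal{A} \to \mathcal{B}$ a homomorphism. By Theorem 2.3.9, $\ker\phi$ is a subspace of $\mathcal{A}$. Now let $\mathbf{x} \in \ker\phi$ and $\mathbf{a} \in \mathcal{A}$. Then

$$\phi(\mathbf{x}\mathbf{a}) = \phi(\mathbf{x})\phi(\mathbf{a}) = \mathbf{0}\,\phi(\mathbf{a}) = \mathbf{0},$$

i.e., $\mathbf{x}\mathbf{a} \in \ker\phi$. This shows that $\ker\phi$ is a right ideal in $\mathcal{A}$. Similarly, one can show that $\ker\phi$ is a left ideal in $\mathcal{A}$.


Theorem 3.2.4 Let $\phi: \mathcal{A} \to \mathcal{B}$ be a homomorphism of algebras. Then $\ker\phi$ is a (two-sided) ideal of $\mathcal{A}$.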


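One can easily construct left ideals for an associative algebra $\mathcal{A}$: Take any element $\mathbf{x} \in \mathcal{A}$ and consider the set

$$\mathcal{A}\mathbf{x} = \{\mathbf{a}\mathbf{x} \mid \mathbf{a} \in \mathcal{A}\}.$$

The reader may check that $\mathcal{A}\mathbf{x}$ is a left ideal. Similarly, $\mathbf{x}\mathcal{A}$ is a right ideal, and the set

$$\mathcal{A}\mathbf{x}\mathcal{A} = \{\mathbf{a}\mathbf{x}\mathbf{b} \mid \mathbf{a}, \mathbf{b} \in \mathcal{A}\}$$

is a two-sided ideal. These are called, respectively, the left, right, and two-sided ideals generated by $\mathbf{x}$.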


Definition 3.2.5 A left (right, two-sided) ideal $\mathcal{M}$ of an algebra $\mathcal{A}$ is called minimal if every left (right, two-sided) ideal of $\mathcal{A}$ contained in $\mathcal{M}$ coincides with $\mathcal{M}$.


Theorem 3.2.6 Let $\mathcal{L}$ be a left ideal of $\mathcal{A}$. Then the following statements are equivalent:

(a) $\mathcal{L}$ is a minimal left ideal.
(b) $\mathcal{A}\mathbf{x} = \mathcal{L}$ for all $\mathbf{x} \in \mathcal{L}$.
(c) $\mathcal{L}\mathbf{x} = \mathcal{L}$ for all $\mathbf{x} \in \mathcal{L}$.

Similar conditions hold for a minimal right ideal.


Proof The proof follows directly from the definition of ideals and minimal ideals.


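Theorem 3.2.7 Let $\mathcal{A}$ and $\mathcal{B}$ be algebras, $\phi: \mathcal{A} \to \mathcal{B}$ an epimorphism, and $\mathcal{L}$ a (minimal) left ideal of $\mathcal{A}$. Then $\phi(\mathcal{L})$ is a (minimal) left ideal of $\mathcal{B}$. In particular, any automorphism of an algebra is an isomorphism among its minimal ideals.


Proof Let $\mathbf{b}$ be any element of $\mathcal{B}$ and $\mathbf{y}$ any element of $\phi(\mathcal{L})$. Then there exist elements $\mathbf{a}$ and $\mathbf{x}$ of $\mathcal{A}$ and $\mathcal{L}$, respectively, such that $\mathbf{b} = \phi(\mathbf{a})$ and $\mathbf{y} = \phi(\mathbf{x})$. Furthermore,

$$\mathbf{b}\mathbf{y} = \phi(\mathbf{a})\phi(\mathbf{x}) = \phi(\mathbf{a}\mathbf{x}) \in \phi(\mathcal{L})$$

because $\mathbf{a}\mathbf{x} \in \mathcal{L}$. Hence, $\phi(\mathcal{L})$ is a left ideal in $\mathcal{B}$.

Now suppose $\mathcal{L}$ is minimal. To show that $\phi(\mathcal{L})$ is minimal, we use (b) of Theorem 3.2.6. Since $\phi$ is an epimorphism, we have $\mathcal{B} = \phi(\mathcal{A})$. Therefore, let $\mathbf{u} \in \phi(\mathcal{L})$. Then there exists $\mathbf{t} \in \mathcal{L}$ such that $\mathbf{u} = \phi(\mathbf{t})$ and

$$\mathcal{B}\mathbf{u} = \phi(\mathcal{A})\phi(\mathbf{t}) = \phi(\mathcal{A}\mathbf{t}) = \phi(\mathcal{L}).$$

The last statement of the theorem follows from the fact that $\ker\phi$ is an ideal of $\mathcal{A}$.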


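Definition 3.2.8 $\mathcal{A}$ is the direct sum of its subalgebras $\mathcal{B}$ and $\mathcal{C}$ if $\mathcal{A} = \mathcal{B} \oplus \mathcal{C}$ as a vector space and $\mathcal{B}\mathcal{C} = \mathcal{C}\mathcal{B} = \{\mathbf{0}\}$. $\mathcal{B}$ and $\mathcal{C}$ are called components of $\mathcal{A}$. Obviously, an algebra can have several components. An algebra is called reducible if it is the direct sum of subalgebras.


Table 3.2 The multiplication table for $\mathcal{S}$. The entry in row $\mathbf{f}_i$ and column $\mathbf{f}_j$ is the product $\mathbf{f}_i\mathbf{f}_j$.

$$\begin{array}{c|cccc}
 & \mathbf{f}_1 & \mathbf{f}_2 & \mathbf{f}_3 & \mathbf{f}_4 \\ \hline
\mathbf{f}_1 & \mathbf{f}_1 & \mathbf{f}_2 & \mathbf{0} & \mathbf{0} \\
\mathbf{f}_2 & \mathbf{0} & \mathbf{0} & \mathbf{f}_2 & \mathbf{f}_1 \\
\mathbf{f}_3 & \mathbf{0} & \mathbf{0} & \mathbf{f}_3 & \mathbf{f}_4 \\
\mathbf{f}_4 & \mathbf{f}_4 & \mathbf{f}_3 & \mathbf{0} & \mathbf{0}
\end{array}$$


As we saw in Definition 3.1.10, the condition $\mathcal{B}\mathcal{C} = \mathcal{C}\mathcal{B} = \{\mathbf{0}\}$ is necessary if $\mathcal{B}$ and $\mathcal{C}$ are to be naturally identified as $\mathcal{B} \oplus \{\mathbf{0}\}$ and $\{\mathbf{0}\} \oplus \mathcal{C}$, respectively.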


Proposition 3.2.9 A central algebra is not reducible. 


Proof Suppose that the (necessarily unital) central algebra A is reducible. 
Then the identity has components in each of the subalgebras of which A 
is composed. Clearly, these components are linearly independent and all 
belong to the center. This is a contradiction. 


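Example 3.2.10 Consider $\mathcal{S}$, the algebra introduced in Example 3.1.7. Construct a new basis⁵ $\{\mathbf{f}_i\}_{i=1}^{4}$ as follows:

$$\begin{aligned}
\mathbf{f}_1 &= \tfrac{1}{2}(\mathbf{e}_0 + \mathbf{e}_3), & \mathbf{f}_2 &= \tfrac{1}{2}(\mathbf{e}_1 - \mathbf{e}_2), \\
\mathbf{f}_3 &= \tfrac{1}{2}(\mathbf{e}_0 - \mathbf{e}_3), & \mathbf{f}_4 &= \tfrac{1}{2}(\mathbf{e}_1 + \mathbf{e}_2).
\end{aligned} \qquad (3.9)$$

The multiplication table for $\mathcal{S}$ in terms of the new basis vectors is given in Table 3.2, as the reader may verify.

Multiplying both sides of the identity $\mathbf{e}_0 = \mathbf{f}_1 + \mathbf{f}_3$ by an arbitrary element of $\mathcal{S}$, we see that any such element can be written as a vector in the left ideal $\mathcal{L}_1 \equiv \mathcal{S}\mathbf{f}_1$ plus a vector in the left ideal $\mathcal{L}_3 \equiv \mathcal{S}\mathbf{f}_3$. Any vector in $\mathcal{L}_1$ can be written as a product of some vector in $\mathcal{S}$ and $\mathbf{f}_1$. Let $\mathbf{a} = \sum_{i=1}^{4} \alpha_i\mathbf{f}_i$ be an arbitrary element of $\mathcal{S}$. Then any vector in $\mathcal{L}_1$ is of the form

$$\mathbf{a}\mathbf{f}_1 = (\alpha_1\mathbf{f}_1 + \alpha_2\mathbf{f}_2 + \alpha_3\mathbf{f}_3 + \alpha_4\mathbf{f}_4)\mathbf{f}_1 = \alpha_1\mathbf{f}_1 + \alpha_4\mathbf{f}_4,$$

i.e., $\mathbf{f}_1$ and $\mathbf{f}_4$ span $\mathcal{L}_1$. Similarly, $\mathbf{f}_2$ and $\mathbf{f}_3$ span $\mathcal{L}_3$. It follows that $\mathcal{L}_1 \cap \mathcal{L}_3 = \{\mathbf{0}\}$. Therefore, we have

$$\mathcal{S} = \mathcal{L}_1 \oplus_V \mathcal{L}_3, \qquad \mathcal{L}_1 = \operatorname{Span}\{\mathbf{f}_1, \mathbf{f}_4\}, \qquad \mathcal{L}_3 = \operatorname{Span}\{\mathbf{f}_2, \mathbf{f}_3\},$$

where $\oplus_V$ indicates a vector space direct sum. Note that there is no contradiction between this direct sum decomposition and the fact that $\mathcal{S}$ is central, because the direct sum above is not an algebra direct sum since $\mathcal{L}_1\mathcal{L}_3 \neq \{\mathbf{0}\}$.


⁵ The reader is advised to show that $\{\mathbf{f}_i\}_{i=1}^{4}$ is a linearly independent set of vectors.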
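Let $\mathbf{x} = \gamma_1\mathbf{f}_1 + \gamma_4\mathbf{f}_4$ be an arbitrary nonzero element of $\mathcal{L}_1$. Then clearly, $\mathcal{S}\mathbf{x} \subset \mathcal{L}_1$. To show that $\mathcal{L}_1 \subset \mathcal{S}\mathbf{x}$, let $\mathbf{y} = \beta_1\mathbf{f}_1 + \beta_4\mathbf{f}_4$ be in $\mathcal{L}_1$. Can we find $\mathbf{z} \in \mathcal{S}$ such that $\mathbf{y} = \mathbf{z}\mathbf{x}$? Let $\mathbf{z} = \sum_{i=1}^{4} \eta_i\mathbf{f}_i$ and note that

$$\begin{aligned}
\mathbf{z}\mathbf{x} &= (\eta_1\mathbf{f}_1 + \eta_2\mathbf{f}_2 + \eta_3\mathbf{f}_3 + \eta_4\mathbf{f}_4)(\gamma_1\mathbf{f}_1 + \gamma_4\mathbf{f}_4) \\
&= (\eta_1\gamma_1 + \eta_2\gamma_4)\mathbf{f}_1 + (\eta_3\gamma_4 + \eta_4\gamma_1)\mathbf{f}_4.
\end{aligned}$$

We are looking for a set of $\eta$'s satisfying

$$\eta_1\gamma_1 + \eta_2\gamma_4 = \beta_1 \quad \text{and} \quad \eta_3\gamma_4 + \eta_4\gamma_1 = \beta_4.$$

If $\gamma_1 \neq 0$, then $\eta_1 = \beta_1/\gamma_1$, $\eta_2 = 0 = \eta_3$, $\eta_4 = \beta_4/\gamma_1$ yields a solution for $\mathbf{z}$. If $\gamma_4 \neq 0$, then $\eta_2 = \beta_1/\gamma_4$, $\eta_1 = 0 = \eta_4$, $\eta_3 = \beta_4/\gamma_4$ yields a solution for $\mathbf{z}$. Therefore, $\mathcal{L}_1 = \mathcal{S}\mathbf{x}$, and by Theorem 3.2.6, $\mathcal{L}_1$ is minimal. Similarly, $\mathcal{L}_3$ is also minimal.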
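The computations of this example are easy to mechanize. The sketch below encodes the nonzero products of the multiplication table as read off from the products used in the text ($\mathbf{f}_1\mathbf{f}_1 = \mathbf{f}_1$, $\mathbf{f}_4\mathbf{f}_1 = \mathbf{f}_4$, $\mathbf{f}_2\mathbf{f}_4 = \mathbf{f}_1$, and so on); elements of $\mathcal{S}$ are stored as lists of four coefficients, and the names `TABLE`, `mul`, and `solve_left_factor` are hypothetical helpers introduced only for illustration.

```python
from fractions import Fraction

# Nonzero products f_i f_j = f_k of Table 3.2, stored as (i, j) -> k.
TABLE = {(1, 1): 1, (1, 2): 2, (2, 3): 2, (2, 4): 1,
         (3, 3): 3, (3, 4): 4, (4, 1): 4, (4, 2): 3}

def mul(a, b):
    """Product of a = sum_i a_i f_i and b = sum_j b_j f_j (coefficient lists)."""
    out = [0, 0, 0, 0]
    for (i, j), k in TABLE.items():
        out[k - 1] += a[i - 1] * b[j - 1]
    return out

def solve_left_factor(x, y):
    """Return z with z x = y, for x, y in L_1 = Span{f_1, f_4} and x nonzero.

    Follows the two cases of the text: gamma_1 != 0, otherwise gamma_4 != 0.
    """
    g1, g4 = x[0], x[3]
    b1, b4 = y[0], y[3]
    if g1 != 0:
        return [Fraction(b1, g1), 0, 0, Fraction(b4, g1)]
    return [0, Fraction(b1, g4), Fraction(b4, g4), 0]

# F[i - 1] is the basis vector f_i.
F = [[1 if r == c else 0 for c in range(4)] for r in range(4)]

a = [2, 3, 5, 7]
print(mul(a, F[0]))                 # coefficients of a f_1: only f_1 and f_4 survive
x, y = [0, 0, 0, 5], [3, 0, 0, 7]
print(mul(solve_left_factor(x, y), x) == y)
```

Because the table is data rather than code, the same `mul` can be reused to confirm associativity on the basis vectors or to test the products quoted in Example 3.2.13.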


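If $\mathcal{A} = \mathcal{B} \oplus \mathcal{C}$, then multiplying both sides on the right by $\mathcal{B}$, we get

$$\mathcal{A}\mathcal{B} = \mathcal{B}\mathcal{B} \oplus \mathcal{C}\mathcal{B} = \mathcal{B}\mathcal{B} \oplus \{\mathbf{0}\} = \mathcal{B}\mathcal{B} \subset \mathcal{B},$$

showing that $\mathcal{B}$ is a left ideal of $\mathcal{A}$. Likewise, multiplying on the left leads to the fact that $\mathcal{B}$ is a right ideal of $\mathcal{A}$. Thus it is an ideal of $\mathcal{A}$. Similarly, $\mathcal{C}$ is an ideal of $\mathcal{A}$. Moreover, since the subalgebras do not share any nonzero elements, any other ideal of $\mathcal{A}$ must be contained in the subalgebras. We thus have


Proposition 3.2.11 If $\mathcal{A}$ is the direct sum of algebras, then each component (or the direct sum of several components) is an ideal of $\mathcal{A}$. Furthermore, any other ideal of $\mathcal{A}$ is contained entirely in one of the components.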


Algebras which have no proper ideals are important in the classification 
of all algebras. 


Definition 3.2.12 An algebra A is called simple if its only ideals are 
A and {0}. 


Recall that by ideal we mean two-sided ideal. Therefore, a simple algebra 
can have proper left ideals and proper right ideals. In fact, the following 
example illustrates this point. 


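Example 3.2.13 Let's go back to the algebra $\mathcal{S}$ of Example 3.2.10, where we saw that $\mathcal{S} = \mathcal{L}_1 \oplus_V \mathcal{L}_3$, in which $\mathcal{L}_1$ and $\mathcal{L}_3$ are minimal left ideals and $\oplus_V$ indicates direct sum of vector spaces. Can $\mathcal{S}$ have a proper two-sided ideal? Let $\mathcal{J}$ be such an ideal and let $\mathbf{a} \in \mathcal{J}$ be nonzero. By the decomposition of $\mathcal{S}$, $\mathbf{a} = \mathbf{a}_1 + \mathbf{a}_3$ with $\mathbf{a}_1 \in \mathcal{L}_1$ and $\mathbf{a}_3 \in \mathcal{L}_3$, at least one of which must be nonzero. Suppose $\mathbf{a}_1 \neq \mathbf{0}$. Then $\mathcal{S}\mathbf{a}_1$ is a nonzero left ideal which is contained in $\mathcal{L}_1$. Since $\mathcal{L}_1$ is minimal, $\mathcal{S}\mathbf{a}_1 = \mathcal{L}_1$. Since $\mathbf{f}_1 \in \mathcal{L}_1$, there must exist $\mathbf{b} \in \mathcal{S}$ such that $\mathbf{b}\mathbf{a}_1 = \mathbf{f}_1$, and hence,

$$\mathbf{b}\mathbf{a} = \mathbf{b}\mathbf{a}_1 + \mathbf{b}\mathbf{a}_3 = \mathbf{f}_1 + \mathbf{b}\mathbf{a}_3.$$

Multiplying both sides on the right by $\mathbf{f}_1$ and noting that $\mathbf{f}_1^2 = \mathbf{f}_1$ and $\mathcal{L}_3\mathbf{f}_1 = \{\mathbf{0}\}$ by the multiplication table of Example 3.2.10, we obtain $\mathbf{b}\mathbf{a}\mathbf{f}_1 = \mathbf{f}_1$. Since $\mathcal{J}$ is a two-sided ideal and $\mathbf{a} \in \mathcal{J}$, $\mathbf{b}\mathbf{a}\mathbf{f}_1 \in \mathcal{J}$, and therefore, $\mathbf{f}_1 \in \mathcal{J}$.

The equality $\mathcal{S}\mathbf{a}_1 = \mathcal{L}_1$ also implies that there exists $\mathbf{c} \in \mathcal{S}$ such that $\mathbf{c}\mathbf{a}_1 = \mathbf{f}_4$, and hence,

$$\mathbf{c}\mathbf{a} = \mathbf{c}\mathbf{a}_1 + \mathbf{c}\mathbf{a}_3 = \mathbf{f}_4 + \mathbf{c}\mathbf{a}_3.$$

Multiplying both sides on the right by $\mathbf{f}_1$ and noting that $\mathbf{f}_4\mathbf{f}_1 = \mathbf{f}_4$ and $\mathcal{L}_3\mathbf{f}_1 = \{\mathbf{0}\}$, we obtain $\mathbf{c}\mathbf{a}\mathbf{f}_1 = \mathbf{f}_4$. Since $\mathcal{J}$ is a two-sided ideal, we must have $\mathbf{f}_4 \in \mathcal{J}$. Since $\mathbf{f}_1\mathbf{f}_2 = \mathbf{f}_2$ and $\mathbf{f}_4\mathbf{f}_2 = \mathbf{f}_3$, all the basis vectors are in $\mathcal{J}$. Hence, $\mathcal{J} = \mathcal{S}$. The case where $\mathbf{a}_3 \neq \mathbf{0}$ leads to the same conclusion. Therefore, $\mathcal{S}$ has no proper ideal, i.e., $\mathcal{S}$ is simple.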


An immediate consequence of Definition 3.2.12 and Theorem 3.2.4 is 


Proposition 3.2.14 A nontrivial homomorphism of a simple algebra A with 
any other algebra B is necessarily injective. 


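Proof For any $\phi: \mathcal{A} \to \mathcal{B}$, the kernel of $\phi$ is an ideal of $\mathcal{A}$. Since $\mathcal{A}$ has no proper ideal, $\ker\phi = \mathcal{A}$ or $\ker\phi = \{\mathbf{0}\}$. If $\phi$ is nontrivial, then $\ker\phi = \{\mathbf{0}\}$, i.e., $\phi$ is injective.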


3.2.1. Factor Algebras 


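Let $\mathcal{A}$ be an algebra and $\mathcal{B}$ a subspace of $\mathcal{A}$. Section 2.1.2 showed how to construct the factor space $\mathcal{A}/\mathcal{B}$. Can this space be turned into an algebra? Let $[\![\mathbf{a}]\!]$ and $[\![\mathbf{a}']\!]$ be in $\mathcal{A}/\mathcal{B}$. Then the natural product rule for making $\mathcal{A}/\mathcal{B}$ an algebra is

$$[\![\mathbf{a}]\!]\,[\![\mathbf{a}']\!] = [\![\mathbf{a}\mathbf{a}']\!]. \qquad (3.10)$$

Under what conditions does this multiplication make sense? Since $[\![\mathbf{a}]\!] = [\![\mathbf{a} + \mathbf{b}]\!]$ and $[\![\mathbf{a}']\!] = [\![\mathbf{a}' + \mathbf{b}']\!]$ for all $\mathbf{b}, \mathbf{b}' \in \mathcal{B}$, for (3.10) to make sense, we must have

$$(\mathbf{a} + \mathbf{b})(\mathbf{a}' + \mathbf{b}') = \mathbf{a}\mathbf{a}' + \mathbf{b}''$$

for some $\mathbf{b}''$ in $\mathcal{B}$. Taking $\mathbf{a} = \mathbf{0} = \mathbf{a}'$ yields $\mathbf{b}\mathbf{b}' = \mathbf{b}''$. This means that $\mathcal{B}$ must be a subalgebra of $\mathcal{A}$. Taking $\mathbf{a}' = \mathbf{0}$ yields $\mathbf{a}\mathbf{b}' + \mathbf{b}\mathbf{b}' = \mathbf{b}''$ for all $\mathbf{a} \in \mathcal{A}$, $\mathbf{b}, \mathbf{b}' \in \mathcal{B}$ and some $\mathbf{b}'' \in \mathcal{B}$. Since $\mathbf{b}\mathbf{b}'$ is already in $\mathcal{B}$, this means that $\mathcal{B}$ must be a left ideal of $\mathcal{A}$. Similarly, by setting $\mathbf{a} = \mathbf{0}$ we conclude that $\mathcal{B}$ must be a right ideal of $\mathcal{A}$. We thus have


Proposition 3.2.15 Let $\mathcal{A}$ be an algebra and $\mathcal{B}$ a subspace of $\mathcal{A}$. Then the factor space $\mathcal{A}/\mathcal{B}$ can be turned into an algebra with multiplication $[\![\mathbf{a}]\!]\,[\![\mathbf{a}']\!] = [\![\mathbf{a}\mathbf{a}']\!]$ if and only if $\mathcal{B}$ is an ideal in $\mathcal{A}$. The algebra so constructed is called the factor algebra of $\mathcal{A}$ with respect to the ideal $\mathcal{B}$.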
Example 3.2.16 Let $\mathcal{A}$ and $\mathcal{B}$ be algebras and $\phi: \mathcal{A} \to \mathcal{B}$ an algebra homomorphism. Example 3.1.20 and Theorem 3.2.4 showed that $\phi(\mathcal{A})$ is a subalgebra of $\mathcal{B}$ and $\ker\phi$ is an ideal in $\mathcal{A}$. Now consider the linear map $\bar{\phi}: \mathcal{A}/\ker\phi \to \phi(\mathcal{A})$ defined in Example 2.3.22 by $\bar{\phi}([\![\mathbf{a}]\!]) = \phi(\mathbf{a})$. It is straightforward to show that $\bar{\phi}$ is an algebra homomorphism. Using this and Example 2.3.22, where it was shown that $\bar{\phi}$ is a linear isomorphism, we conclude that $\bar{\phi}$ is an algebra isomorphism.


3.3. Total Matrix Algebra 


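Consider the vector space of $n \times n$ matrices with its standard basis $\{\mathbf{e}_{ij}\}_{i,j=1}^{n}$, where $\mathbf{e}_{ij}$ has a 1 at the $ij$th position and zero everywhere else. This means that $(\mathbf{e}_{ij})_{lk} = \delta_{il}\delta_{jk}$, and

$$\begin{aligned}
(\mathbf{e}_{ij}\mathbf{e}_{kl})_{mn} &= \sum_{r=1}^{n} (\mathbf{e}_{ij})_{mr}(\mathbf{e}_{kl})_{rn} \\
&= \sum_{r=1}^{n} \delta_{im}\delta_{jr}\delta_{kr}\delta_{ln} = \delta_{im}\delta_{jk}\delta_{ln} = \delta_{jk}(\mathbf{e}_{il})_{mn},
\end{aligned}$$

or

$$\mathbf{e}_{ij}\mathbf{e}_{kl} = \delta_{jk}\,\mathbf{e}_{il}.$$

The structure constants are $c^{mn}_{ij,kl} = \delta_{im}\delta_{jk}\delta_{ln}$. Note that one needs a double index to label these constants.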
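The multiplication rule $\mathbf{e}_{ij}\mathbf{e}_{kl} = \delta_{jk}\mathbf{e}_{il}$ can be confirmed by brute force for small $n$. The following is a minimal sketch with matrices stored as nested lists; `unit`, `matmul`, `scale`, and `rule_holds` are names chosen here for illustration.

```python
def unit(i, j, n):
    """Matrix unit e_ij (1-based indices) as an n x n nested list."""
    return [[1 if (r == i - 1 and c == j - 1) else 0 for c in range(n)]
            for r in range(n)]

def matmul(A, B):
    """Ordinary matrix product of two n x n nested lists."""
    n = len(A)
    return [[sum(A[r][s] * B[s][c] for s in range(n)) for c in range(n)]
            for r in range(n)]

def scale(t, A):
    """Scalar multiple t * A."""
    return [[t * x for x in row] for row in A]

def rule_holds(n):
    """Check e_ij e_kl == delta_jk e_il for all index combinations."""
    idx = range(1, n + 1)
    return all(
        matmul(unit(i, j, n), unit(k, l, n))
        == scale(1 if j == k else 0, unit(i, l, n))
        for i in idx for j in idx for k in idx for l in idx
    )

print(rule_holds(3))
```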

The abstract algebra whose basis is $\{\mathbf{e}_{ij}\}_{i,j=1}^{n}$, with multiplication rules and structure constants given above, is called the total matrix algebra. Let $\mathbb{F}$ denote either $\mathbb{R}$ or $\mathbb{C}$. Then the total matrix algebra over $\mathbb{F}$ is denoted by $\mathbb{F} \otimes \mathcal{M}_n$ or $\mathcal{M}_n(\mathbb{F})$. It is an associative algebra isomorphic with the real or complex matrix algebra, but its elements are not necessarily $n \times n$ matrices. When the dimension of the matrices is not specified, one writes simply $\mathbb{F} \otimes \mathcal{M}$ or $\mathcal{M}(\mathbb{F})$.

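We now construct a left ideal of this algebra. Take $\mathbf{e}_{pq}$ and multiply it on the left by $\sum_{i,j=1}^{n} \alpha_{ij}\mathbf{e}_{ij}$, a general element of $\mathcal{M}_n(\mathbb{F})$. This yields

$$\Bigl(\sum_{i,j=1}^{n} \alpha_{ij}\mathbf{e}_{ij}\Bigr)\mathbf{e}_{pq} = \sum_{i,j=1}^{n} \alpha_{ij}\,\mathbf{e}_{ij}\mathbf{e}_{pq} = \sum_{i,j=1}^{n} \alpha_{ij}\delta_{jp}\,\mathbf{e}_{iq} = \sum_{i=1}^{n} \alpha_{ip}\,\mathbf{e}_{iq},$$

which corresponds to a matrix all of whose columns are zero except the $q$th column. Let $\mathcal{L}$ be the set of all such matrices. Multiplying an element $\sum_{i=1}^{n} \gamma_i\mathbf{e}_{iq}$ of $\mathcal{L}$ by a general matrix $\sum_{l,m=1}^{n} \beta_{lm}\mathbf{e}_{lm}$, we obtain⁶

$$\begin{aligned}
\Bigl(\sum_{l,m=1}^{n} \beta_{lm}\mathbf{e}_{lm}\Bigr)\Bigl(\sum_{i=1}^{n} \gamma_i\mathbf{e}_{iq}\Bigr) &= \sum_{i,l,m=1}^{n} \beta_{lm}\gamma_i\,\mathbf{e}_{lm}\mathbf{e}_{iq} = \sum_{i,l,m=1}^{n} \beta_{lm}\gamma_i\delta_{mi}\,\mathbf{e}_{lq} \\
&= \sum_{l,m=1}^{n} \beta_{lm}\gamma_m\,\mathbf{e}_{lq} = \sum_{l=1}^{n} \underbrace{\Bigl(\sum_{m=1}^{n} \beta_{lm}\gamma_m\Bigr)}_{\equiv\,\eta_l}\mathbf{e}_{lq} = \sum_{l=1}^{n} \eta_l\,\mathbf{e}_{lq}.
\end{aligned}$$

It follows that $\mathcal{L}$ is a left ideal. Furthermore, the very construction of $\mathcal{L}$ implies that it satisfies condition (b) of Theorem 3.2.6. Had we multiplied $\mathbf{e}_{pq}$ on the right, we would have obtained a right ideal consisting of matrices all of whose rows equaled zero except the $p$th row; and this right ideal would satisfy condition (b) of Theorem 3.2.6 for right minimal ideals. We thus have


⁶ The index $p$ has no significance in the final answer because all the $\mathbf{e}_{pq}$ with varying $p$ but a fixed $q$ generate the same matrices.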


Theorem 3.3.1 The minimal left (right) ideals of $\mathbb{R} \otimes \mathcal{M}$ or $\mathbb{C} \otimes \mathcal{M}$ consist of matrices with all their columns (rows) zero except one.


Multiplying $\mathbf{e}_{pq}$ on the left and the right by a pair of arbitrary matrices, the reader can easily show that one recovers the entire total matrix algebra. This indicates that the algebra has no proper two-sided ideal. Example 3.3.3 below finds the center of $\mathcal{M}_n(\mathbb{F})$ to be $\operatorname{Span}\{\mathbf{1}_n\}$, where $\mathbf{1}_n$ is the identity of $\mathcal{M}_n(\mathbb{F})$. We thus have


Theorem 3.3.2 The total matrix algebra $\mathcal{M}_n(\mathbb{F})$ is central simple.


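Example 3.3.3 (Finding the center of $\mathbb{F} \otimes \mathcal{M}_n$) Let $\mathbf{a} = \sum_{i,j=1}^{n} \alpha_{ij}\mathbf{e}_{ij}$ be in the center of $\mathbb{F} \otimes \mathcal{M}_n$. Then

$$\mathbf{a}\mathbf{e}_{kl} = \sum_{i,j=1}^{n} \alpha_{ij}\,\mathbf{e}_{ij}\mathbf{e}_{kl} = \sum_{i,j=1}^{n} \alpha_{ij}\delta_{jk}\,\mathbf{e}_{il} = \sum_{i=1}^{n} \alpha_{ik}\,\mathbf{e}_{il},$$

$$\mathbf{e}_{kl}\mathbf{a} = \sum_{i,j=1}^{n} \alpha_{ij}\,\mathbf{e}_{kl}\mathbf{e}_{ij} = \sum_{i,j=1}^{n} \alpha_{ij}\delta_{li}\,\mathbf{e}_{kj} = \sum_{j=1}^{n} \alpha_{lj}\,\mathbf{e}_{kj}.$$

For these two expressions to be equal, we must have

$$\sum_{i=1}^{n} (\alpha_{ik}\,\mathbf{e}_{il} - \alpha_{li}\,\mathbf{e}_{ki}) = \mathbf{0}.$$

By letting $l = k$ in the sum above and invoking the linear independence of the $\mathbf{e}_{ij}$, we conclude that $\alpha_{ik} = 0$ if $i \neq k$. Therefore, $\mathbf{a}$ must be a diagonal matrix. Write $\mathbf{a} = \sum_{k=1}^{n} \lambda_k\mathbf{e}_{kk}$ and let $\mathbf{b} = \sum_{i,j=1}^{n} \beta_{ij}\mathbf{e}_{ij}$ be an arbitrary element of $\mathbb{F} \otimes \mathcal{M}_n$. Then

$$\mathbf{a}\mathbf{b} = \sum_{i,j,k=1}^{n} \lambda_k\beta_{ij}\,\mathbf{e}_{kk}\mathbf{e}_{ij} = \sum_{i,j,k=1}^{n} \lambda_k\beta_{ij}\delta_{ik}\,\mathbf{e}_{kj} = \sum_{i,j=1}^{n} \lambda_i\beta_{ij}\,\mathbf{e}_{ij},$$

$$\mathbf{b}\mathbf{a} = \sum_{i,j,k=1}^{n} \lambda_k\beta_{ij}\,\mathbf{e}_{ij}\mathbf{e}_{kk} = \sum_{i,j,k=1}^{n} \lambda_k\beta_{ij}\delta_{jk}\,\mathbf{e}_{ik} = \sum_{i,j=1}^{n} \lambda_j\beta_{ij}\,\mathbf{e}_{ij}.$$

Again, because of the linear independence of the $\mathbf{e}_{ij}$, for these two expressions to be equal, we must have $\lambda_j\beta_{ij} = \lambda_i\beta_{ij}$ for all $i$ and $j$ and all $\beta_{ij}$. The only way this can happen is for $\lambda_i$ to be equal to $\lambda_j$ for all $i$ and $j$. It follows that $\mathbf{a} = \lambda\mathbf{1}_n$, where $\mathbf{1}_n = \sum_{k=1}^{n} \mathbf{e}_{kk}$ is the identity element of $\mathcal{M}_n(\mathbb{F})$. Therefore, $\mathcal{M}_n(\mathbb{F})$ is central.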
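Since the matrix units form a basis, an element is central exactly when it commutes with every $\mathbf{e}_{kl}$, and that criterion is easy to test on concrete matrices. The sketch below is only an illustration of the conclusion of Example 3.3.3; the helper names are assumptions of this sketch.

```python
def unit(k, l, n):
    """Matrix unit e_kl (1-based indices) as an n x n nested list."""
    return [[1 if (r == k - 1 and c == l - 1) else 0 for c in range(n)]
            for r in range(n)]

def matmul(A, B):
    """Ordinary matrix product of two n x n nested lists."""
    n = len(A)
    return [[sum(A[r][s] * B[s][c] for s in range(n)) for c in range(n)]
            for r in range(n)]

def commutes_with_all_units(a):
    """True when a e_kl == e_kl a for every matrix unit, i.e. a is central."""
    n = len(a)
    idx = range(1, n + 1)
    return all(matmul(a, unit(k, l, n)) == matmul(unit(k, l, n), a)
               for k in idx for l in idx)

print(commutes_with_all_units([[5, 0], [0, 5]]))   # multiple of the identity
print(commutes_with_all_units([[1, 0], [0, 2]]))   # diagonal, unequal entries
```

The second matrix shows why the argument needs its second step: being diagonal is necessary but not sufficient for centrality.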


3.4 Derivation of an Algebra


The last two items in Example 3.1.9 have a feature that turns out to be of great significance in all algebras, the product rule for differentiation.


Definition 3.4.1 A vector space endomorphism $\mathbf{D}: \mathcal{A} \to \mathcal{A}$ is called a derivation on $\mathcal{A}$ if it has the additional property

$$\mathbf{D}(\mathbf{a}\mathbf{b}) = [\mathbf{D}(\mathbf{a})]\mathbf{b} + \mathbf{a}[\mathbf{D}(\mathbf{b})].$$


Example 3.4.2 Let $C^r(a, b)$ be as in Example 3.1.9, and let $\mathbf{D}$ be ordinary differentiation: $\mathbf{D}: f \mapsto f'$, where $f'$ is the derivative of $f$. Then ordinary differentiation rules show that $\mathbf{D}$ is a derivation of the algebra $C^r(a, b)$.


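Example 3.4.3 Consider the algebra of $n \times n$ matrices with multiplication as defined in Eq. (3.3). Let $\mathsf{A}$ be a fixed matrix, and define the linear transformation

$$\mathbf{D}_{\mathsf{A}}(\mathsf{B}) = \mathsf{A} \bullet \mathsf{B}.$$

Then we note that

$$\begin{aligned}
\mathbf{D}_{\mathsf{A}}(\mathsf{B} \bullet \mathsf{C}) &= \mathsf{A} \bullet (\mathsf{B} \bullet \mathsf{C}) = \mathsf{A}(\mathsf{B} \bullet \mathsf{C}) - (\mathsf{B} \bullet \mathsf{C})\mathsf{A} \\
&= \mathsf{A}(\mathsf{B}\mathsf{C} - \mathsf{C}\mathsf{B}) - (\mathsf{B}\mathsf{C} - \mathsf{C}\mathsf{B})\mathsf{A} \\
&= \mathsf{A}\mathsf{B}\mathsf{C} - \mathsf{A}\mathsf{C}\mathsf{B} - \mathsf{B}\mathsf{C}\mathsf{A} + \mathsf{C}\mathsf{B}\mathsf{A}.
\end{aligned}$$

On the other hand,

$$\begin{aligned}
(\mathbf{D}_{\mathsf{A}}\mathsf{B}) \bullet \mathsf{C} + \mathsf{B} \bullet (\mathbf{D}_{\mathsf{A}}\mathsf{C}) &= (\mathsf{A} \bullet \mathsf{B}) \bullet \mathsf{C} + \mathsf{B} \bullet (\mathsf{A} \bullet \mathsf{C}) \\
&= (\mathsf{A}\mathsf{B} - \mathsf{B}\mathsf{A}) \bullet \mathsf{C} + \mathsf{B} \bullet (\mathsf{A}\mathsf{C} - \mathsf{C}\mathsf{A}) \\
&= (\mathsf{A}\mathsf{B} - \mathsf{B}\mathsf{A})\mathsf{C} - \mathsf{C}(\mathsf{A}\mathsf{B} - \mathsf{B}\mathsf{A}) + \mathsf{B}(\mathsf{A}\mathsf{C} - \mathsf{C}\mathsf{A}) - (\mathsf{A}\mathsf{C} - \mathsf{C}\mathsf{A})\mathsf{B} \\
&= \mathsf{A}\mathsf{B}\mathsf{C} + \mathsf{C}\mathsf{B}\mathsf{A} - \mathsf{B}\mathsf{C}\mathsf{A} - \mathsf{A}\mathsf{C}\mathsf{B}.
\end{aligned}$$

So, $\mathbf{D}_{\mathsf{A}}$ is a derivation on this algebra.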
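The identity derived in Example 3.4.3 can be spot-checked with integer matrices. In the sketch below, `bullet` implements the commutator product $\mathsf{A} \bullet \mathsf{B} = \mathsf{A}\mathsf{B} - \mathsf{B}\mathsf{A}$ used in the example, and `D` builds the map $\mathbf{D}_{\mathsf{A}}$ for a fixed matrix; both names are illustrative.

```python
def matmul(A, B):
    """Ordinary matrix product of two n x n nested lists."""
    n = len(A)
    return [[sum(A[r][s] * B[s][c] for s in range(n)) for c in range(n)]
            for r in range(n)]

def add(A, B):
    """Entrywise sum of two matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def bullet(A, B):
    """Commutator product A . B = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(AB, BA)]

def D(A):
    """Return the map D_A: B -> A . B for a fixed matrix A."""
    return lambda B: bullet(A, B)

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, 2]]
C = [[2, -1], [1, 3]]
DA = D(A)
lhs = DA(bullet(B, C))                               # D_A(B . C)
rhs = add(bullet(DA(B), C), bullet(B, DA(C)))        # (D_A B) . C + B . (D_A C)
print(lhs == rhs)
```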


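Theorem 3.4.4 Let $\{\mathbf{e}_i\}_{i=1}^{N}$ be a basis of the algebra $\mathcal{A}$. Then a vector space endomorphism $\mathbf{D}: \mathcal{A} \to \mathcal{A}$ is a derivation on $\mathcal{A}$ iff

$$\mathbf{D}(\mathbf{e}_i\mathbf{e}_j) = \mathbf{D}(\mathbf{e}_i) \cdot \mathbf{e}_j + \mathbf{e}_i \cdot \mathbf{D}(\mathbf{e}_j) \quad \text{for } i, j = 1, 2, \ldots, N.$$


Proof The simple proof is left as an exercise for the reader.


If $\mathcal{A}$ has an identity $\mathbf{e}$, then $\mathbf{D}(\mathbf{e}) = \mathbf{0}$, because

$$\mathbf{D}(\mathbf{e}) = \mathbf{D}(\mathbf{e}\mathbf{e}) = \mathbf{D}(\mathbf{e})\mathbf{e} + \mathbf{e}\mathbf{D}(\mathbf{e}) = 2\mathbf{D}(\mathbf{e}).$$

This shows that $\mathbf{e} \in \ker\mathbf{D}$. In general, one can show that $\ker\mathbf{D}$ is a subalgebra of $\mathcal{A}$.


Proposition 3.4.5 Every derivation $\mathbf{D}$ satisfies the Leibniz formula

$$\mathbf{D}^n(\mathbf{a}\mathbf{b}) = \sum_{k=0}^{n} \binom{n}{k}\,\mathbf{D}^k(\mathbf{a}) \cdot \mathbf{D}^{n-k}(\mathbf{b}). \qquad (3.11)$$


Proof The proof by mathematical induction is very similar to the proof of the binomial theorem of Example 1.5.2. The details are left as an exercise for the reader.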
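For a concrete check of the Leibniz formula (3.11), one can take the algebra of polynomials with $\mathbf{D} = d/dx$. The sketch below stores a polynomial as its list of coefficients in increasing degree; this representation and the helper names are choices of the sketch, not of the text.

```python
from math import comb

def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def padd(p, q):
    """Sum of two polynomials."""
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def pmul(p, q):
    """Product of two polynomials."""
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def deriv(p, times=1):
    """Apply d/dx the given number of times."""
    p = list(p)
    for _ in range(times):
        p = [k * c for k, c in enumerate(p)][1:]
    return trim(p)

def leibniz_rhs(a, b, n):
    """Right-hand side of (3.11): sum_k C(n, k) D^k(a) D^(n-k)(b)."""
    total = []
    for k in range(n + 1):
        term = pmul(deriv(a, k), deriv(b, n - k))
        total = padd(total, [comb(n, k) * c for c in term])
    return total

a = [1, 2, 3]        # 1 + 2x + 3x^2
b = [0, 1, 0, 4]     # x + 4x^3
print(all(deriv(pmul(a, b), n) == leibniz_rhs(a, b, n) for n in range(7)))
```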


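Derivations of $\mathcal{A}$, being endomorphisms of the vector space $\mathcal{A}$, are elements of $\operatorname{End}(\mathcal{A})$. If $\mathbf{D}_1$ and $\mathbf{D}_2$ are derivations, then it is straightforward to show that any linear combination $\alpha\mathbf{D}_1 + \beta\mathbf{D}_2$ is also a derivation. Thus, the set of derivations $\mathcal{D}(\mathcal{A})$ on an algebra $\mathcal{A}$ forms a vector space [a subspace of $\operatorname{End}(\mathcal{A})$]. Do they form a subalgebra of $\operatorname{End}(\mathcal{A})$? Is $\mathbf{D}_1\mathbf{D}_2$ a derivation? Let's find out!

$$\begin{aligned}
\mathbf{D}_1\mathbf{D}_2(\mathbf{a}\mathbf{b}) &= \mathbf{D}_1\bigl([\mathbf{D}_2(\mathbf{a})]\mathbf{b} + \mathbf{a}[\mathbf{D}_2(\mathbf{b})]\bigr) \\
&= [\mathbf{D}_1\mathbf{D}_2(\mathbf{a})]\mathbf{b} + \mathbf{D}_2(\mathbf{a})\mathbf{D}_1(\mathbf{b}) + \mathbf{D}_1(\mathbf{a})\mathbf{D}_2(\mathbf{b}) + \mathbf{a}[\mathbf{D}_1\mathbf{D}_2(\mathbf{b})].
\end{aligned}$$

So, the product of two derivations is not, in general, a derivation, because of the two terms in the middle. However, since these terms are symmetric in their subscripts, we can subtract them away by taking the difference $\mathbf{D}_1\mathbf{D}_2 - \mathbf{D}_2\mathbf{D}_1$. The question is whether the result will be a derivation. Switching the order of the subscripts, we obtain

$$\mathbf{D}_2\mathbf{D}_1(\mathbf{a}\mathbf{b}) = [\mathbf{D}_2\mathbf{D}_1(\mathbf{a})]\mathbf{b} + \mathbf{D}_1(\mathbf{a})\mathbf{D}_2(\mathbf{b}) + \mathbf{D}_2(\mathbf{a})\mathbf{D}_1(\mathbf{b}) + \mathbf{a}[\mathbf{D}_2\mathbf{D}_1(\mathbf{b})].$$

Subtracting this from the previous expression yields

$$\begin{aligned}
(\mathbf{D}_1\mathbf{D}_2 - \mathbf{D}_2\mathbf{D}_1)(\mathbf{a}\mathbf{b}) &= [\mathbf{D}_1\mathbf{D}_2(\mathbf{a})]\mathbf{b} + \mathbf{a}[\mathbf{D}_1\mathbf{D}_2(\mathbf{b})] - [\mathbf{D}_2\mathbf{D}_1(\mathbf{a})]\mathbf{b} - \mathbf{a}[\mathbf{D}_2\mathbf{D}_1(\mathbf{b})] \\
&= [(\mathbf{D}_1\mathbf{D}_2 - \mathbf{D}_2\mathbf{D}_1)(\mathbf{a})]\mathbf{b} + \mathbf{a}[(\mathbf{D}_1\mathbf{D}_2 - \mathbf{D}_2\mathbf{D}_1)(\mathbf{b})].
\end{aligned}$$

Thus, if we define a new product

$$\mathbf{D}_1 \bullet \mathbf{D}_2 \equiv \mathbf{D}_1\mathbf{D}_2 - \mathbf{D}_2\mathbf{D}_1, \qquad (3.12)$$

then $\mathcal{D}(\mathcal{A})$ becomes an algebra.


Theorem 3.4.6 The set $\mathcal{D}(\mathcal{A})$ of derivations of $\mathcal{A}$ forms an algebra, the derivation algebra of $\mathcal{A}$, under the product (3.12).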
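A small experiment makes the role of the product (3.12) tangible. The sketch below works in the algebra of polynomials (coefficient lists in increasing degree) with two derivations chosen for illustration, $\mathbf{D}_1 = d/dx$ and $\mathbf{D}_2 = x\,d/dx$; neither the choice of algebra nor the helper names come from the text. The composite $\mathbf{D}_1\mathbf{D}_2$ fails the product rule, while the commutator passes it.

```python
def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p

def padd(p, q):
    """Sum of two polynomials."""
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def pmul(p, q):
    """Product of two polynomials."""
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def D1(p):
    """d/dx."""
    return trim([k * c for k, c in enumerate(p)][1:])

def D2(p):
    """x d/dx, which multiplies the coefficient of x^k by k."""
    return trim([k * c for k, c in enumerate(p)])

def compose(F, G):
    """Ordinary operator product F G."""
    return lambda p: F(G(p))

def commutator(F, G):
    """The product (3.12): F G - G F."""
    return lambda p: padd(F(G(p)), [-c for c in G(F(p))])

def satisfies_product_rule(D, a, b):
    """Test D(ab) == D(a) b + a D(b) on one pair of polynomials."""
    return D(pmul(a, b)) == padd(pmul(D(a), b), pmul(a, D(b)))

x = [0, 1]
print(satisfies_product_rule(compose(D1, D2), x, x))      # composite fails
print(satisfies_product_rule(commutator(D1, D2), x, x))   # commutator passes
```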


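Definition 3.4.7 Let $\mathcal{A}$ and $\mathcal{B}$ be algebras, and $\phi: \mathcal{A} \to \mathcal{B}$ a homomorphism. Then $\mathbf{D}: \mathcal{A} \to \mathcal{B}$ is called a $\phi$-derivation if

$$\mathbf{D}(\mathbf{a}_1\mathbf{a}_2) = \mathbf{D}(\mathbf{a}_1)\phi(\mathbf{a}_2) + \phi(\mathbf{a}_1)\mathbf{D}(\mathbf{a}_2), \qquad \mathbf{a}_1, \mathbf{a}_2 \in \mathcal{A}.$$


Example 3.4.8 As an example, let $\mathbf{D}_{\mathcal{A}}$ be a derivation in $\mathcal{A}$. Then $\mathbf{D} = \phi \circ \mathbf{D}_{\mathcal{A}}$ is a $\phi$-derivation, because

$$\begin{aligned}
\phi \circ \mathbf{D}_{\mathcal{A}}(\mathbf{a}_1\mathbf{a}_2) &= \phi[\mathbf{D}_{\mathcal{A}}(\mathbf{a}_1)\mathbf{a}_2 + \mathbf{a}_1\mathbf{D}_{\mathcal{A}}(\mathbf{a}_2)] \\
&= \phi[\mathbf{D}_{\mathcal{A}}(\mathbf{a}_1)]\phi(\mathbf{a}_2) + \phi(\mathbf{a}_1)\phi[\mathbf{D}_{\mathcal{A}}(\mathbf{a}_2)] \\
&= \phi \circ \mathbf{D}_{\mathcal{A}}(\mathbf{a}_1)\,\phi(\mathbf{a}_2) + \phi(\mathbf{a}_1)\,\phi \circ \mathbf{D}_{\mathcal{A}}(\mathbf{a}_2).
\end{aligned}$$

Similarly, if $\mathbf{D}_{\mathcal{B}}$ is a derivation in $\mathcal{B}$, then $\mathbf{D}_{\mathcal{B}} \circ \phi$ is a $\phi$-derivation.

More specifically, let $\mathcal{A}$ be the algebra $C^r(a, b)$ of $r$-times differentiable functions, and $\mathcal{B}$ be the algebra $\mathbb{R}$ of real numbers. Let $\phi_c: C^r(a, b) \to \mathbb{R}$ be the evaluation at a fixed point $c \in (a, b)$, so that $\phi_c(f) = f(c)$. If $\mathbf{D}_c: C^r(a, b) \to \mathbb{R}$ is defined as $\mathbf{D}_c(f) = f'(c)$, then one can readily show that $\mathbf{D}_c$ is a $\phi_c$-derivation.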
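The last claim can be illustrated on polynomials, which belong to $C^r(a, b)$ for every $r$. In the sketch below a polynomial is a list of coefficients in increasing degree; `phi_c` is evaluation at $c$ and `D_c` is the derivative evaluated at $c$. These function names are hypothetical stand-ins for $\phi_c$ and $\mathbf{D}_c$.

```python
def pmul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def deriv(p):
    """Coefficients of the derivative."""
    return [k * c for k, c in enumerate(p)][1:]

def phi_c(p, c):
    """Evaluation homomorphism f -> f(c), computed by Horner's rule."""
    value = 0
    for coeff in reversed(p):
        value = value * c + coeff
    return value

def D_c(p, c):
    """The map f -> f'(c)."""
    return phi_c(deriv(p), c)

f = [1, 2, 3]        # 1 + 2x + 3x^2
g = [0, 1, 0, 4]     # x + 4x^3
c = 2
lhs = D_c(pmul(f, g), c)
rhs = D_c(f, c) * phi_c(g, c) + phi_c(f, c) * D_c(g, c)
print(lhs == rhs)
```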


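Definition 3.4.9 Let $\mathcal{A}$ be an algebra with identity and $\omega$ an involution of $\mathcal{A}$. A linear transformation $\Omega \in \mathcal{L}(\mathcal{A})$ is called an antiderivation of $\mathcal{A}$ with respect to $\omega$ if

$$\Omega(\mathbf{a}_1\mathbf{a}_2) = \Omega(\mathbf{a}_1) \cdot \mathbf{a}_2 + \omega(\mathbf{a}_1) \cdot \Omega(\mathbf{a}_2).$$

In particular, a derivation is an antiderivation with respect to the identity.


As in the case of the derivation, one can show that $\ker\Omega$ is a subalgebra of $\mathcal{A}$, $\Omega(\mathbf{e}) = \mathbf{0}$ if $\mathcal{A}$ has an identity $\mathbf{e}$, and $\Omega$ is determined entirely by its action on the generators of $\mathcal{A}$.


Theorem 3.4.10 Let $\Omega_1$ and $\Omega_2$ be antiderivations with respect to two involutions $\omega_1$ and $\omega_2$. Suppose that $\omega_1 \circ \omega_2 = \omega_2 \circ \omega_1$. Furthermore, assume that

$$\omega_1\Omega_2 = \pm\Omega_2\omega_1 \quad \text{and} \quad \omega_2\Omega_1 = \pm\Omega_1\omega_2.$$

Then $\Omega_1\Omega_2 \mp \Omega_2\Omega_1$ is an antiderivation with respect to the involution $\omega_1 \circ \omega_2$.


Proof The proof consists of evaluating $\Omega_1\Omega_2 \mp \Omega_2\Omega_1$ on a product $\mathbf{a}_1\mathbf{a}_2$ using Definition 3.4.9 for $\Omega_1$ and $\Omega_2$. We leave the straightforward proof for the reader.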


Some particular cases of this theorem are of interest:

• Let $\Omega$ be an antiderivation with respect to $\omega$ and $\mathbf{D}$ a derivation such that $\omega\mathbf{D} = \mathbf{D}\omega$. Then $\mathbf{D}\Omega - \Omega\mathbf{D}$ is an antiderivation with respect to $\omega$.

• Let $\Omega_1$ and $\Omega_2$ be antiderivations with respect to the same involution $\omega$ such that $\omega\Omega_i = -\Omega_i\omega$ for $i = 1, 2$. Then $\Omega_1\Omega_2 + \Omega_2\Omega_1$ is a derivation.

• A particular example of the second case is when $\Omega$ is an antiderivation with respect to an involution $\omega$ such that $\omega\Omega = -\Omega\omega$. Then $\Omega^2$ is a derivation.


3.5 Decomposition of Algebras 


In Sect. 2.1.3, we decomposed a vector space into smaller vector spaces. 
The decomposition of algebras into “smaller” algebras is also useful. In this 
section we investigate properties and conditions which allow such a decom- 
position. All algebras in this section are assumed to be associative. 


Definition 3.5.1 A nonzero element $\mathbf{a} \in \mathcal{A}$ is called nilpotent if $\mathbf{a}^k = \mathbf{0}$ for some positive integer $k$. The smallest such integer is called the index of $\mathbf{a}$. A subalgebra $\mathcal{B}$ of $\mathcal{A}$ is called nil if all elements of $\mathcal{B}$ are nilpotent. $\mathcal{B}$ is called nilpotent of index $\nu$ if $\mathcal{B}^\nu = \{\mathbf{0}\}$ and $\mathcal{B}^{\nu-1} \neq \{\mathbf{0}\}$.⁷ A nonzero element $\mathbf{P} \in \mathcal{A}$ is called idempotent if $\mathbf{P}^2 = \mathbf{P}$.


Proposition 3.5.2 The identity element is the only idempotent in a division algebra.


Proof The proof is trivial.


If $\mathbf{P}$ is an idempotent, then $\mathbf{P}^k = \mathbf{P}$ for any positive integer $k$. Therefore, a nilpotent subalgebra cannot contain an idempotent.

The following theorem, whose rather technical proof can be found in 
[Bly 90, p. 191], is very useful: 


Theorem 3.5.3 A nil ideal is nilpotent. 


Example 3.5.4 The set of $n \times n$ upper triangular matrices is a subalgebra of the algebra of $n \times n$ matrices, because the product of two upper triangular matrices is an upper triangular matrix, as can be easily verified.

A strictly upper triangular matrix is nilpotent. Let's illustrate this for a $4 \times 4$ matrix. With

$$\mathsf{A} = \begin{pmatrix} 0 & a_{12} & a_{13} & a_{14} \\ 0 & 0 & a_{23} & a_{24} \\ 0 & 0 & 0 & a_{34} \\ 0 & 0 & 0 & 0 \end{pmatrix},$$

it is easily seen that

$$\mathsf{A}^2 = \begin{pmatrix} 0 & 0 & a_{12}a_{23} & a_{12}a_{24} + a_{13}a_{34} \\ 0 & 0 & 0 & a_{23}a_{34} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \qquad
\mathsf{A}^3 = \begin{pmatrix} 0 & 0 & 0 & a_{12}a_{23}a_{34} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix},$$

and

$$\mathsf{A}^4 = \begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}.$$

Thus, the strictly upper triangular $4 \times 4$ matrices are nilpotent of index at most 4, and of index exactly 4 when $a_{12}a_{23}a_{34} \neq 0$. In fact, one can show that the subalgebra of the strictly upper triangular $4 \times 4$ matrices has index 4.

The reader can verify that strictly upper triangular $n \times n$ matrices are nilpotent of index at most $n$, and that the subalgebra of the strictly upper triangular $n \times n$ matrices is nilpotent of index $n$.


⁷ Recall that $\mathcal{B}^k$ is the collection of products $\mathbf{a}_1 \ldots \mathbf{a}_k$ of elements in $\mathcal{B}$.
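The index of a given strictly upper triangular matrix can be found by repeated multiplication. The sketch below does this for integer matrices stored as nested lists; the name `nilpotency_index` is an illustrative choice. It also shows that an individual matrix can have an index smaller than $n$: the index of a $4 \times 4$ matrix is 4 when $a_{12}a_{23}a_{34} \neq 0$ and can drop when that product vanishes.

```python
def matmul(A, B):
    """Ordinary matrix product of two n x n nested lists."""
    n = len(A)
    return [[sum(A[r][s] * B[s][c] for s in range(n)) for c in range(n)]
            for r in range(n)]

def nilpotency_index(A):
    """Smallest k in 1..n with A^k = 0, or None if no such k exists."""
    n = len(A)
    zero = [[0] * n for _ in range(n)]
    power = A
    for k in range(1, n + 1):
        if power == zero:
            return k
        power = matmul(power, A)
    return None

generic = [[0, 1, 2, 3],
           [0, 0, 4, 5],
           [0, 0, 0, 6],
           [0, 0, 0, 0]]
special = [[0, 1, 2, 3],
           [0, 0, 0, 5],     # a_23 = 0, so a_12 a_23 a_34 = 0
           [0, 0, 0, 6],
           [0, 0, 0, 0]]
print(nilpotency_index(generic), nilpotency_index(special))
```

Stopping the loop at $k = n$ relies on the fact that an $n \times n$ nilpotent matrix already satisfies $\mathsf{A}^n = 0$, so a matrix that survives $n$ multiplications is reported as not nilpotent.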


3.5.1 The Radical 


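Nilpotent subalgebras play a fundamental role in the classification of algebras. It is remarkable that all the left, right, and two-sided nilpotent ideals of an algebra are contained in a single nilpotent ideal, which we shall explore now.


Lemma 3.5.5 Let $\mathcal{L}$ and $\mathcal{M}$ be two nilpotent left (right) ideals of the algebra $\mathcal{A}$. Let $\lambda$ and $\mu$ be the indices of $\mathcal{L}$ and $\mathcal{M}$, respectively. Then $\mathcal{L} + \mathcal{M}$ is a nilpotent left (right) ideal of $\mathcal{A}$ of index at most $\lambda + \mu - 1$.


Proof We prove the lemma for left ideals. Clearly, $\mathcal{L} + \mathcal{M}$ is a left ideal. Any element of $\mathcal{L} + \mathcal{M}$ raised to the $k$th power can be written as a linear combination of elements of the form $\mathbf{a}_1\mathbf{a}_2 \ldots \mathbf{a}_k$ with $\mathbf{a}_i$ belonging to either $\mathcal{L}$ or $\mathcal{M}$. Suppose that $l$ terms of this product are in $\mathcal{L}$ and $m$ terms in $\mathcal{M}$. Let $j$ be the largest integer such that $\mathbf{a}_j \in \mathcal{L}$. Starting with $\mathbf{a}_j$, move to the left until you reach another element of $\mathcal{L}$, say $\mathbf{a}_r$. All the terms $\mathbf{a}_{r+1}$ to $\mathbf{a}_{j-1}$ are in $\mathcal{M}$. Since $\mathcal{L}$ is a left ideal,

$$\underbrace{\mathbf{a}_{r+1} \ldots \mathbf{a}_{j-1}}_{\in\,\mathcal{A}}\,\mathbf{a}_j \equiv \mathbf{a}'_j \in \mathcal{L}.$$

This contracts the product $\mathbf{a}_r\mathbf{a}_{r+1} \ldots \mathbf{a}_{j-1}\mathbf{a}_j$ to $\mathbf{a}_r\mathbf{a}'_j$ with both factors in $\mathcal{L}$. Continuing this process, we obtain

$$\mathbf{a}_1\mathbf{a}_2 \ldots \mathbf{a}_k = \mathbf{b}_1\mathbf{b}_2 \ldots \mathbf{b}_l\,\mathbf{c}, \qquad \mathbf{b}_i \in \mathcal{L},\ \mathbf{c} \in \mathcal{M}.$$

Similarly,

$$\mathbf{a}_1\mathbf{a}_2 \ldots \mathbf{a}_k = \mathbf{c}_1\mathbf{c}_2 \ldots \mathbf{c}_m\,\mathbf{b}, \qquad \mathbf{b} \in \mathcal{L},\ \mathbf{c}_i \in \mathcal{M}.$$

Since $k = l + m$, if $k = \mu + \lambda - 1$, then $(\mu - m) + (\lambda - l) = 1$. This shows that if $m < \mu$, then $l \geq \lambda$, and if $l < \lambda$, then $m \geq \mu$. In either case, $\mathbf{a}_1 \ldots \mathbf{a}_k = \mathbf{0}$ by one of the last two equations above. Hence, $\mathcal{L} + \mathcal{M}$ is nilpotent with an index of at most $\mu + \lambda - 1$. The proof for the right ideals is identical to this proof.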


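Lemma 3.5.6 Let $\mathcal{L}$ be a nilpotent left ideal of the algebra $\mathcal{A}$. Then the sum $\mathcal{J} = \mathcal{L} + \mathcal{L}\mathcal{A}$ is a nilpotent two-sided ideal.


Proof Since $\mathcal{L}$ is a left ideal, $\mathcal{A}\mathcal{L} \subset \mathcal{L}$. Therefore,

$$\mathcal{A}\mathcal{J} = \mathcal{A}\mathcal{L} + \mathcal{A}\mathcal{L}\mathcal{A} \subset \mathcal{L} + \mathcal{L}\mathcal{A} = \mathcal{J},$$

showing that $\mathcal{J}$ is a left ideal. On the other hand,

$$\mathcal{J}\mathcal{A} = \mathcal{L}\mathcal{A} + \mathcal{L}\mathcal{A}\mathcal{A} \subset \mathcal{L}\mathcal{A} + \mathcal{L}\mathcal{A} = \mathcal{L}\mathcal{A} \subset \mathcal{J},$$

showing that $\mathcal{J}$ is a right ideal.

Now consider a product of $k$ elements of $\mathcal{L}\mathcal{A}$:

$$\mathbf{l}_1\mathbf{a}_1\mathbf{l}_2\mathbf{a}_2 \ldots \mathbf{l}_k\mathbf{a}_k = \mathbf{l}_1\mathbf{l}'_2\mathbf{l}'_3 \ldots \mathbf{l}'_k\mathbf{a}_k, \qquad \mathbf{l}_j \in \mathcal{L},\ \mathbf{a}_j \in \mathcal{A},$$

where $\mathbf{l}'_j \equiv \mathbf{a}_{j-1}\mathbf{l}_j \in \mathcal{L}$. This shows that if $k$ is equal to the index of $\mathcal{L}$, then the product is zero and hence, $\mathcal{L}\mathcal{A}$ is nilpotent. Note that since some of the $\mathbf{a}$'s may be in $\mathcal{L}$, the index of $\mathcal{L}\mathcal{A}$ is at most equal to the index of $\mathcal{L}$. Invoking Lemma 3.5.5 completes the proof.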


The preceding two lemmas were introduced for the following: 


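Theorem 3.5.7 There exists a unique nilpotent ideal in $\mathcal{A}$ which contains every nilpotent left, right, and two-sided ideal of $\mathcal{A}$.


Proof Let $\mathcal{N}$ be a nilpotent ideal of maximum dimension. Let $\mathcal{M}$ be any nilpotent ideal. By Lemma 3.5.5, $\mathcal{N} + \mathcal{M}$ is both a left and a right nilpotent ideal, hence, a nilpotent ideal. By the maximality of $\mathcal{N}$, $\mathcal{N} + \mathcal{M} \subset \mathcal{N}$, and therefore, $\mathcal{M} \subset \mathcal{N}$, proving that $\mathcal{N}$ contains all nilpotent ideals. If there were another maximal nilpotent ideal $\mathcal{N}'$, then $\mathcal{N}' \subset \mathcal{N}$ and $\mathcal{N} \subset \mathcal{N}'$, implying that $\mathcal{N}' = \mathcal{N}$, and that $\mathcal{N}$ is unique.

If $\mathcal{L}$ is a left nilpotent ideal, then by Lemma 3.5.6, $\mathcal{L} \subset \mathcal{J} = \mathcal{L} + \mathcal{L}\mathcal{A} \subset \mathcal{N}$, because $\mathcal{J}$ is a nilpotent ideal. Thus, $\mathcal{N}$ contains all the nilpotent left ideals. Similarly, $\mathcal{N}$ contains all the nilpotent right ideals.


Definition 3.5.8 The unique maximal nilpotent ideal of an algebra $\mathcal{A}$ guaranteed by Theorem 3.5.7 is called the radical of $\mathcal{A}$ and denoted by $\operatorname{Rad}(\mathcal{A})$.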


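We have seen that a nilpotent algebra cannot contain an idempotent. In fact, the reverse implication is also true. To show that, we need the following


Lemma 3.5.9 Suppose that $\mathcal{A}$ contains an element $\mathbf{a}$ such that $\mathcal{A}\mathbf{a}^k = \mathcal{A}\mathbf{a}^{k-1}$ for some positive integer $k$. Then $\mathcal{A}$ contains an idempotent.


Proof Let $\mathcal{B} \equiv \mathcal{A}\mathbf{a}^{k-1}$. Then $\mathcal{B}$ is a left ideal of $\mathcal{A}$ satisfying $\mathcal{B}\mathbf{a} = \mathcal{B}$. Multiplying both sides by $\mathbf{a}$, we see that

$$\mathcal{B}\mathbf{a}^2 = \mathcal{B}\mathbf{a} = \mathcal{B}, \qquad \mathcal{B}\mathbf{a}^3 = \mathcal{B}\mathbf{a}^2 = \mathcal{B}, \qquad \ldots,$$

and $\mathcal{B}\mathbf{a}^k = \mathcal{B}$. But $\mathbf{a}^k \in \mathcal{B}$ because $\mathcal{B} \equiv \mathcal{A}\mathbf{a}^{k-1}$. Thus, with $\mathbf{b} \equiv \mathbf{a}^k$, we get $\mathcal{B}\mathbf{b} = \mathcal{B}$. This means that there must exist an element $\mathbf{P} \in \mathcal{B}$ such that $\mathbf{P}\mathbf{b} = \mathbf{b}$, or $(\mathbf{P}^2 - \mathbf{P})\mathbf{b} = \mathbf{0}$. By Problem 3.32, $\mathbf{P}^2 = \mathbf{P}$. Hence, $\mathcal{B}$, and therefore $\mathcal{A}$, has an idempotent.


Proposition 3.5.10 An algebra is nilpotent if and only if it contains no idempotent.


Proof The “only if” part was shown after Definition 3.5.1. We now show that if $\mathcal{A}$ has no idempotent, then it must be nilpotent. To begin, we note that in general, $\mathcal{A}\mathbf{a} \subset \mathcal{A}$, and therefore, $\mathcal{A}\mathbf{a}^k \subset \mathcal{A}\mathbf{a}^{k-1}$ for all $k$. If $\mathcal{A}$ has no idempotent, then the equality is ruled out by Lemma 3.5.9. Hence, $\mathcal{A}\mathbf{a}^k \subsetneq \mathcal{A}\mathbf{a}^{k-1}$. This being true for all $k$, we have

$$\mathcal{A} \supsetneq \mathcal{A}\mathbf{a} \supsetneq \mathcal{A}\mathbf{a}^2 \supsetneq \cdots \supsetneq \mathcal{A}\mathbf{a}^k \supsetneq \cdots.$$

Since $\mathcal{A}$ has a finite dimension, there must exist an integer $r$ such that $\mathcal{A}\mathbf{a}^r = \{\mathbf{0}\}$ for all $\mathbf{a} \in \mathcal{A}$. In particular, $\mathbf{a}^{r+1} = \mathbf{0}$ for all $\mathbf{a} \in \mathcal{A}$. This shows that $\mathcal{A}$ is nil, and by Theorem 3.5.3, nilpotent.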

Let $P$ be an idempotent of $\mathcal{A}$. Consider $\mathcal{L}(P)$, the left annihilator of $P$ (see
Example 3.2.2), and note that $(a - aP) \in \mathcal{L}(P)$ for any $a \in \mathcal{A}$. Furthermore,
if $a \in P\mathcal{L}(P)$, then $a = Px$ for some $x \in \mathcal{L}(P)$. Thus, $a$ has the property that
$Pa = a$ and $aP = 0$.

Similarly, consider $\mathcal{R}(P)$, the right annihilator of $P$, and note that
$(a - Pa) \in \mathcal{R}(P)$ for any $a \in \mathcal{A}$. Furthermore, if $a \in \mathcal{R}(P)P$, then $a = xP$ for some
$x \in \mathcal{R}(P)$. Thus, $a$ has the property that $aP = a$ and $Pa = 0$.

Let $\mathcal{J}(P) = \mathcal{L}(P) \cap \mathcal{R}(P)$. Then, clearly $\mathcal{J}(P)$ is a two-sided ideal consisting
of elements $a \in \mathcal{A}$ such that $aP = Pa = 0$. To these, we add the subalgebra
$P\mathcal{A}P$, whose elements $a$ can be shown to have the property $Pa = aP = a$.


3.5 Decomposition of Algebras 


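We thus have

\[
\begin{aligned}
P\mathcal{A}P &= \{a \in \mathcal{A} \mid Pa = aP = a\},\\
P\mathcal{L}(P) &= \{a \in \mathcal{A} \mid Pa = a,\ aP = 0\},\\
\mathcal{R}(P)P &= \{a \in \mathcal{A} \mid aP = a,\ Pa = 0\},\\
\mathcal{J}(P) &= \{a \in \mathcal{A} \mid aP = Pa = 0\},
\end{aligned}
\tag{3.13}
\]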


and the following 


Theorem 3.5.11 Let $\mathcal{A}$ be any algebra with an idempotent $P$. Then we have
the Peirce decomposition of $\mathcal{A}$:

\[ \mathcal{A} = P\mathcal{A}P \oplus_V P\mathcal{L}(P) \oplus_V \mathcal{R}(P)P \oplus_V \mathcal{J}(P), \]

where $\oplus_V$ indicates a vector space direct sum, and each summand is a subalgebra.


Proof By Eq. (3.13), each summand is actually an algebra. Furthermore, it
is not hard to show that the only vector common to any two of the summands
is the zero vector. Thus the sum is indeed a direct sum of subspaces. Next
note that for any $a \in \mathcal{A}$,

\[ a = PaP + P\underbrace{(a - aP)}_{\in \mathcal{L}(P)} + \underbrace{(a - Pa)}_{\in \mathcal{R}(P)}P + \underbrace{(a - Pa - aP + PaP)}_{\in \mathcal{J}(P)}. \]


Problem 3.33 provides the details of the proof. 
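
The four-term splitting used in the proof can be checked numerically. The following is a minimal sketch, assuming the algebra is that of 2 x 2 real matrices and taking $P = \mathrm{diag}(1, 0)$; the algebra, the element `a`, and the helper names `matmul`, `add`, and `sub` are illustrative choices, not part of the text.

```python
# Peirce decomposition check in the algebra of 2x2 matrices (illustrative choice).

def matmul(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(*ms):
    """Entrywise sum of any number of 2x2 matrices."""
    return [[sum(m[i][j] for m in ms) for j in range(2)] for i in range(2)]

def sub(x, y):
    """Entrywise difference x - y."""
    return [[x[i][j] - y[i][j] for j in range(2)] for i in range(2)]

ZERO = [[0, 0], [0, 0]]
P = [[1, 0], [0, 0]]   # an idempotent: P*P == P
a = [[1, 2], [3, 4]]   # an arbitrary element of the algebra

Pa, aP = matmul(P, a), matmul(a, P)
PaP = matmul(Pa, P)

t1 = PaP                             # component in P A P
t2 = matmul(P, sub(a, aP))           # component in P L(P)
t3 = matmul(sub(a, Pa), P)           # component in R(P) P
t4 = add(sub(sub(a, Pa), aP), PaP)   # component in J(P)

peirce_sum = add(t1, t2, t3, t4)     # should reproduce a
```

Each component satisfies the defining relations of its summand in Eq. (3.13); for instance `t2` is fixed by left multiplication by `P` and annihilated by right multiplication.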


Definition 3.5.12 An element $a \in \mathcal{A}$ is orthogonal to an idempotent $P$ if
$aP = Pa = 0$. Thus $\mathcal{J}(P)$ houses such elements. An idempotent $P$ is called
principal if $\mathcal{J}(P)$ contains no idempotent.


Let $P_0$ be an idempotent. If it is not principal, then $\mathcal{J}(P_0)$ contains an
idempotent $q$. Let $P_1 = P_0 + q$. Then using the fact that $P_0q = qP_0 = 0$, we
can show that $P_1$ is an idempotent and that

\[ P_1P_0 = P_0P_1 = P_0 \quad\text{and}\quad P_1q = qP_1 = q. \tag{3.14} \]

If $x \in \mathcal{J}(P_1)$, then $xP_1 = P_1x = 0$, and the first equation in (3.14) gives
$xP_0 = P_0x = 0$, i.e., $x \in \mathcal{J}(P_0)$, demonstrating that $\mathcal{J}(P_1) \subseteq \mathcal{J}(P_0)$. Since
$q \in \mathcal{J}(P_0)$ but $q \notin \mathcal{J}(P_1)$, $\mathcal{J}(P_1)$ is a proper subset of $\mathcal{J}(P_0)$. If $P_1$ is not
principal, then $\mathcal{J}(P_1)$ contains an idempotent $r$. Let $P_2 = P_1 + r$. Then $P_2$ is
an idempotent and, as before, $\mathcal{J}(P_2)$ is a proper subset of $\mathcal{J}(P_1)$. We continue
this process and obtain

\[ \mathcal{J}(P_0) \supset \mathcal{J}(P_1) \supset \mathcal{J}(P_2) \supset \cdots \supset \mathcal{J}(P_n) \supset \cdots. \]

However, we cannot continue this chain indefinitely, because $\mathcal{J}(P_0)$ has finite
dimension. This means that there is a positive integer $n$ such that $\mathcal{J}(P_n)$ has
no idempotent, i.e., $P_n$ is principal. We have just proved


Proposition 3.5.13 Every algebra that is not nilpotent has a principal
idempotent.


Definition 3.5.14 An idempotent is primitive if it is not the sum of two 
orthogonal idempotents. 


Proposition 3.5.15 P is primitive if and only if it is the only idempotent of 
PAP. 


Proof Suppose that $P$ is not primitive. Then there are orthogonal idempotents
$P_1$ and $P_2$ such that $P = P_1 + P_2$. It is easy to show that $PP_i = P_iP = P_i$
for $i = 1, 2$. Hence, by the first equation in (3.13), $P_i \in P\mathcal{A}P$, and $P$ is not
the only idempotent of $P\mathcal{A}P$.

Conversely, suppose that $P$ is not the only idempotent in $P\mathcal{A}P$, so that
$P\mathcal{A}P$ contains another idempotent, say $P'$. Then by the first equation in
(3.13), $PP' = P'P = P'$. This shows that

\[ (P - P')P' = P'(P - P') = 0 \quad\text{and}\quad (P - P')P = P(P - P') = P - P', \]

i.e., that $(P - P') \in P\mathcal{A}P$ and it is orthogonal to $P'$. It is also an idempotent, since
$(P - P')^2 = P - P' - P' + P' = P - P'$. Furthermore, $P = (P - P') + P'$,
i.e., $P$ is the sum of two orthogonal idempotents, and thus not
primitive.


Let $P$ be an idempotent that is not primitive. Write $P = P_1 + Q$, with $P_1$
and $Q$ orthogonal. If either of the two, say $Q$, is not primitive, write it as
$Q = P_2 + P_3$, with $P_2$ and $P_3$ orthogonal. By Problem 3.34, the set $\{P_i\}_{i=1}^{3}$
are mutually orthogonal idempotents and $P = P_1 + P_2 + P_3$. We can continue
this process until all $P_i$s are primitive. Therefore, we have


Theorem 3.5.16 Every idempotent of an algebra A can be expressed as the 
sum of a finite number of mutually orthogonal primitive idempotents. 
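
As an illustration of Theorem 3.5.16, the sketch below assumes the algebra of 3 x 3 real matrices (a choice made only for this example) and checks that the diagonal matrix units $E_{ii}$ are mutually orthogonal idempotents adding up to the identity. Each $E_{ii}$ is primitive by Proposition 3.5.15, because $E_{ii}\mathcal{A}E_{ii}$ consists of the scalar multiples of $E_{ii}$. The helper names are hypothetical.

```python
# Diagonal matrix units in the algebra of 3x3 matrices (illustrative choice).
N = 3

def unit(i):
    """Matrix unit E_ii: 1 in position (i, i), 0 elsewhere."""
    return [[1 if (r == i and c == i) else 0 for c in range(N)] for r in range(N)]

def matmul(x, y):
    return [[sum(x[r][k] * y[k][c] for k in range(N)) for c in range(N)]
            for r in range(N)]

def add(x, y):
    return [[x[r][c] + y[r][c] for c in range(N)] for r in range(N)]

ZERO = [[0] * N for _ in range(N)]
IDENTITY = [[1 if r == c else 0 for c in range(N)] for r in range(N)]

E = [unit(i) for i in range(N)]

# The identity written as a sum of mutually orthogonal idempotents.
total = ZERO
for e in E:
    total = add(total, e)

# A non-primitive idempotent: the sum of two orthogonal idempotents.
P = add(E[0], E[1])
```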


3.5.2 Semi-simple Algebras 


Algebras which have no nilpotent ideals play an important role in the clas- 
sification of algebras. 


Definition 3.5.17 An algebra whose radical is zero is called semi- 
simple. 


Since Rad(A) contains all nilpotent left, right, and two-sided ideals of an 
algebra, if A is semi-simple, it can have no nilpotent left, right, or two-sided 
ideals. 


Proposition 3.5.18 A simple algebra is semi-simple. 




Proof If the simple algebra $\mathcal{A}$ is not semi-simple, then it has a nonzero nilpotent
ideal. Since the only nonzero ideal is $\mathcal{A}$ itself, we must show that $\mathcal{A}$ is not nilpotent.
Assume otherwise, and note that $\mathcal{A}^2$ is a proper ideal of $\mathcal{A}$, because if
$\mathcal{A}^2 = \mathcal{A}$, then $\mathcal{A}^k = \mathcal{A}$ for any $k$. This contradicts our assumption that $\mathcal{A}$ is
nilpotent. Since the only ideals of $\mathcal{A}$ are $\mathcal{A}$ and $\{0\}$, we must have $\mathcal{A}^2 = \{0\}$.
It then follows that any nonzero proper subspace of $\mathcal{A}$ is trivially a nonzero proper
ideal of $\mathcal{A}$, which cannot happen because of the simplicity of $\mathcal{A}$.


Lemma 3.5.19 If $\mathcal{A}$ is semi-simple and $P$ is any principal idempotent in $\mathcal{A}$,
then $\mathcal{A} = P\mathcal{A}P$.


Proof Since $\mathcal{A}$ is not nilpotent, it has a principal idempotent $P$ by Proposition
3.5.13. Since $P$ is principal, $\mathcal{J}(P)$ of Theorem 3.5.11 contains no idempotent
and by Proposition 3.5.10 must be nilpotent. Since $\mathcal{A}$ has no nilpotent
ideal, $\mathcal{J}(P) = \{0\}$. Now note that $\mathcal{R}(P)\mathcal{L}(P)$ of Theorem 3.5.11 consists of
elements annihilated by both the right and left multiplication by $P$. Therefore,
$\mathcal{R}(P)\mathcal{L}(P)$ is a subset of $\mathcal{J}(P)$. Hence, $\mathcal{R}(P)\mathcal{L}(P) = \{0\}$. This shows that
if $r \in \mathcal{R}(P)$ and $l \in \mathcal{L}(P)$, then $rl = 0$. On the other hand, for any $l \in \mathcal{L}(P)$
and $r \in \mathcal{R}(P)$, we have

\[ (lr)^2 = l\underbrace{(rl)}_{=0}r = 0. \]

It follows that the ideal $\mathcal{L}(P)\mathcal{R}(P)$ (see Problem 3.10) is nil of index 2,
and by Theorem 3.5.3, it is nilpotent. The semi-simplicity of $\mathcal{A}$ implies that
$\mathcal{L}(P)\mathcal{R}(P) = \{0\}$. Multiplying the Peirce decomposition on the left by $\mathcal{L}(P)$,
and using these results and the fact that $\mathcal{L}(P)P = \{0\}$, we obtain

\[ \mathcal{L}(P)\mathcal{A} = \mathcal{L}(P)\mathcal{R}(P)P = \{0\}. \]

In particular $\mathcal{L}(P)\mathcal{L}(P) = \{0\}$, and thus $\mathcal{L}(P)$ is nilpotent, hence zero. Similarly,
$\mathcal{R}(P)$ is also zero. Therefore, the Peirce decomposition of $\mathcal{A}$ reduces
to the first term.


Theorem 3.5.20 A semi-simple algebra A is necessarily unital. Further- 
more, the unit is the only principal idempotent of A. 


Proof Let $P$ be a principal idempotent of $\mathcal{A}$. If $b \in \mathcal{A}$, then by Lemma 3.5.19
$b \in P\mathcal{A}P$, and $b = PaP$ for some $a \in \mathcal{A}$. Therefore,

\[ Pb = P^2aP = PaP = b, \]

\[ bP = PaP^2 = PaP = b. \]

Since this holds for all $b \in \mathcal{A}$, we conclude that $P$ is the identity of $\mathcal{A}$.


Idempotents preserve the semi-simplicity of algebras in the following 
sense: 


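Proposition 3.5.21 If $\mathcal{A}$ is semi-simple, then $P\mathcal{A}P$ is also semi-simple for
any idempotent $P \in \mathcal{A}$.


Proof Let $\mathcal{N} = \mathrm{Rad}(P\mathcal{A}P)$ and $x \in \mathcal{N} \subset P\mathcal{A}P$. Construct the left ideal $\mathcal{A}x$ in
$\mathcal{A}$ and note that by Eq. (3.13), $xP = Px = x$. Then we have the following set
identities:

\[ (\mathcal{A}x)^{\nu+1} = \mathcal{A}x\mathcal{A}x \cdots \mathcal{A}x\mathcal{A}x = \mathcal{A}xP\mathcal{A}Px \cdots P\mathcal{A}PxP\mathcal{A}Px = \mathcal{A}x(P\mathcal{A}Px)^{\nu}. \]

Since $\mathcal{N}$ is an ideal in $P\mathcal{A}P$, we have $P\mathcal{A}Px \subseteq \mathcal{N}$, and if $\nu$ is the index of
$\mathcal{N}$, then $(P\mathcal{A}Px)^{\nu} = \{0\}$. Thus, $\mathcal{A}x$ is nilpotent. Since $\mathcal{A}$ is semi-simple, we
must have $\mathcal{A}x = \{0\}$. Thus, for any $a \in \mathcal{A}$, $ax = 0$. In particular,
$Px = x = 0$. Since $x$ was an arbitrary element of $\mathrm{Rad}(P\mathcal{A}P)$, we must have
$\mathrm{Rad}(P\mathcal{A}P) = \{0\}$. Hence, $P\mathcal{A}P$ is semi-simple.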


Proposition 3.5.22 Let A be a semi-simple algebra and P an idem- 
potent in A. Then PAP is a division algebra if and only if P is primi- 
tive. 


Proof Suppose that $P\mathcal{A}P$ is a division algebra. By Proposition 3.5.2, the identity
is the only idempotent of $P\mathcal{A}P$. But $P$ is the identity of $P\mathcal{A}P$. Hence, $P$ is the
only idempotent of $P\mathcal{A}P$, and by Proposition 3.5.15 $P$ is primitive.

Conversely, assume that $P$ is primitive. Let $x \in P\mathcal{A}P$ be nonzero. The
left ideal $\mathcal{L} = (P\mathcal{A}P)x$ cannot be nilpotent because $P\mathcal{A}P$ is semi-simple by
Proposition 3.5.21. Hence, it must contain an idempotent by Proposition
3.5.13. But an idempotent in $\mathcal{L}$ is an idempotent in $P\mathcal{A}P$. Proposition 3.5.15
identifies $P$ as the sole idempotent in $P\mathcal{A}P$, and thus, in $\mathcal{L}$. As an element of
$\mathcal{L}$, we can write $P$ as $P = ax$ with $a \in P\mathcal{A}P$. Since $P$ is the identity in $P\mathcal{A}P$,
$x$ has an inverse. It follows that any nonzero element in $P\mathcal{A}P$ has an inverse. Thus it
is a division algebra.


It is intuitively obvious that a simple algebra is somehow more funda- 
mental than a semi-simple algebra. We have seen that a simple algebra 
is semi-simple. But the converse is of course not true. If simple algebras 
are more fundamental, then semi-simple algebras should be “built up” from 
simple ones. To see this we first need some preliminaries. 


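Lemma 3.5.23 If $\mathcal{A}$ has an ideal $\mathcal{B}$ with unit $\mathbf{1}_\mathcal{B}$, then $\mathcal{A} = \mathcal{B} \oplus \mathcal{J}(\mathbf{1}_\mathcal{B})$,
where $\mathcal{J}(\mathbf{1}_\mathcal{B})$ is the ideal in the Peirce decomposition of $\mathcal{A}$.


Proof Since $\mathbf{1}_\mathcal{B}$ is an idempotent⁸ of $\mathcal{A}$, we can write the following Peirce
decomposition:

\[ \mathcal{A} = \mathbf{1}_\mathcal{B}\mathcal{A}\mathbf{1}_\mathcal{B} \oplus_V \mathbf{1}_\mathcal{B}\mathcal{L}(\mathbf{1}_\mathcal{B}) \oplus_V \mathcal{R}(\mathbf{1}_\mathcal{B})\mathbf{1}_\mathcal{B} \oplus_V \mathcal{J}(\mathbf{1}_\mathcal{B}) \equiv \mathcal{S}(\mathbf{1}_\mathcal{B}) \oplus_V \mathcal{J}(\mathbf{1}_\mathcal{B}), \]

where $\mathcal{S}(\mathbf{1}_\mathcal{B}) = \mathbf{1}_\mathcal{B}\mathcal{A}\mathbf{1}_\mathcal{B} \oplus_V \mathbf{1}_\mathcal{B}\mathcal{L}(\mathbf{1}_\mathcal{B}) \oplus_V \mathcal{R}(\mathbf{1}_\mathcal{B})\mathbf{1}_\mathcal{B}$. Since $\mathcal{B}$ is an ideal,
each component of $\mathcal{S}(\mathbf{1}_\mathcal{B})$ is a subset of $\mathcal{B}$, and therefore $\mathcal{S}(\mathbf{1}_\mathcal{B}) \subseteq \mathcal{B}$. If
$b \in \mathcal{B}$, then $b \in \mathcal{A}$, and by the above decomposition, $b = b_1 + b_2$, with
$b_1 \in \mathcal{S}(\mathbf{1}_\mathcal{B})$ and $b_2 \in \mathcal{J}(\mathbf{1}_\mathcal{B})$. Multiplying both sides by $\mathbf{1}_\mathcal{B}$, we get

\[ b\mathbf{1}_\mathcal{B} = b_1\mathbf{1}_\mathcal{B} + b_2\mathbf{1}_\mathcal{B} \quad\text{or}\quad b = b_1, \]

because $\mathbf{1}_\mathcal{B}$ is the identity in $\mathcal{B}$ and $\mathcal{J}(\mathbf{1}_\mathcal{B})$ is orthogonal to $\mathbf{1}_\mathcal{B}$. It follows that
$b \in \mathcal{S}(\mathbf{1}_\mathcal{B})$ and, therefore, $\mathcal{B} \subseteq \mathcal{S}(\mathbf{1}_\mathcal{B})$. Hence, $\mathcal{B} = \mathcal{S}(\mathbf{1}_\mathcal{B})$ and
$\mathcal{A} = \mathcal{B} \oplus_V \mathcal{J}(\mathbf{1}_\mathcal{B})$. Since $\mathcal{J}(\mathbf{1}_\mathcal{B})\mathcal{B} = \mathcal{B}\mathcal{J}(\mathbf{1}_\mathcal{B}) = \{0\}$, we can change $\oplus_V$ to $\oplus$.

⁸ Note that $\mathbf{1}_\mathcal{B}$ is not the identity of $\mathcal{A}$. It satisfies $x\mathbf{1}_\mathcal{B} = \mathbf{1}_\mathcal{B}x = x$ only if $x \in \mathcal{B}$.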


Lemma 3.5.24 A nonzero ideal of a semi-simple algebra is semi-simple. 


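Proof Let $\mathcal{A}$ be a semi-simple algebra and $\mathcal{B}$ be a nonzero ideal of $\mathcal{A}$. Then
$\mathcal{B}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{B} \subseteq \mathrm{Rad}(\mathcal{B})$ because $\mathrm{Rad}(\mathcal{B})$ is an ideal in $\mathcal{B}$. Furthermore, since
$\mathcal{B}$ is an ideal in $\mathcal{A}$, $\mathcal{A}\mathcal{B} \subseteq \mathcal{B}$ and $\mathcal{B}\mathcal{A} \subseteq \mathcal{B}$. It follows that
$\mathcal{A}(\mathcal{B}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{B})\mathcal{A} = (\mathcal{A}\mathcal{B})\,\mathrm{Rad}(\mathcal{B})\,(\mathcal{B}\mathcal{A}) \subseteq \mathcal{B}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{B}$, i.e., that $\mathcal{B}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{B}$ is an ideal in $\mathcal{A}$.
Furthermore, it is nilpotent because it is contained in $\mathrm{Rad}(\mathcal{B})$. Semi-simplicity
of $\mathcal{A}$ implies that $\mathcal{B}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{B} = \{0\}$. Since $\mathrm{Rad}(\mathcal{B}) \subseteq \mathcal{B}$,
$\mathcal{A}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{A} \subseteq \mathcal{B}$, and $\mathcal{A}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{A}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{A}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{A} \subseteq \mathcal{B}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{B}$. Now
note that

\[
\begin{aligned}
(\mathcal{A}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{A})^3 &= \mathcal{A}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{A}\mathcal{A}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{A}\mathcal{A}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{A}\\
&\subseteq \mathcal{A}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{A}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{A}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{A}\\
&\subseteq \mathcal{B}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{B} = \{0\},
\end{aligned}
\]

indicating that $\mathcal{A}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{A}$ is nilpotent. Since it is an ideal in $\mathcal{A}$, and $\mathcal{A}$
is semi-simple, $\mathcal{A}\,\mathrm{Rad}(\mathcal{B})\,\mathcal{A} = \{0\}$, and since $\mathcal{A}$ has an identity by Theorem
3.5.20, $\mathrm{Rad}(\mathcal{B}) = \{0\}$, and $\mathcal{B}$ is semi-simple.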


Theorem 3.5.25 An algebra is semi-simple iff it is the direct sum of 
simple algebras. 


Proof If the algebra A is the direct sum of simple algebras, then by Propo- 
sition 3.2.11, the only ideals of A are either direct sums of the components 
or contained in them. In either case, these ideals cannot be nilpotent because 
a simple algebra is semi-simple. Therefore, A is semi-simple. 

Conversely, assume that $\mathcal{A}$ is semi-simple. If it has no proper ideal, then
it is simple, and we are done. So, suppose $\mathcal{B}$ is
a proper nonzero ideal of $\mathcal{A}$. By Lemma 3.5.24 $\mathcal{B}$ is semi-simple, and by
Theorem 3.5.20 $\mathcal{B}$ has a unit $\mathbf{1}_\mathcal{B}$. Invoking Lemma 3.5.23, we can write
$\mathcal{A} = \mathcal{B} \oplus \mathcal{J}(\mathbf{1}_\mathcal{B})$. If either of the two components is not simple, we continue
the process.


Theorem 3.5.26 The reduction of a semi-simple algebra to simple subal- 
gebras is unique up to an ordering of the components. 
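

Proof Let $\mathcal{A} = \mathcal{A}_1 \oplus \cdots \oplus \mathcal{A}_r$ with $\mathcal{A}_i$ simple. The unit of $\mathcal{A}$ is a sum of
the units of the components: $\mathbf{1} = \mathbf{1}_1 + \cdots + \mathbf{1}_r$. Let $\mathcal{A} = \mathcal{A}'_1 \oplus \cdots \oplus \mathcal{A}'_s$ be
another reduction. Multiply both sides of the identity decomposition on the
left by $\mathcal{A}'_j$ to obtain

\[ \mathcal{A}'_j = \mathcal{A}'_j\mathbf{1}_1 + \cdots + \mathcal{A}'_j\mathbf{1}_r = \sum_{i=1}^{r} \mathcal{A}'_j\mathbf{1}_i. \]

Since $\mathbf{1}_i \in \mathcal{A}_i$, and $\mathcal{A}_i$ is an ideal of $\mathcal{A}$, $\mathcal{A}'_j\mathbf{1}_i \subseteq \mathcal{A}_i$. Since the $\mathcal{A}_i$ are disjoint, the $\mathcal{A}'_j\mathbf{1}_i$
are disjoint. Since $\mathcal{A}'_j$ is an ideal, $\mathcal{A}'_j\mathbf{1}_i$ is an algebra as can be easily verified.
Furthermore, since $\mathbf{1}_k\mathbf{1}_i = 0$ for $i \neq k$, the sum is a direct sum of algebras.
Hence, by Proposition 3.2.11, $\mathcal{A}'_j\mathbf{1}_i$ is an ideal, and since it is a subset of
$\mathcal{A}_i$, it is a subideal of $\mathcal{A}_i$. The simplicity of $\mathcal{A}_i$ implies that $\mathcal{A}'_j\mathbf{1}_i = \mathcal{A}_i$ or
$\mathcal{A}'_j\mathbf{1}_i = \{0\}$. Since $\mathcal{A}'_j$ is simple, only one of its components is nonzero, and
it is one of the $\mathcal{A}_i$.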


3.5.3 Classification of Simple Algebras 


Theorems 3.5.25 and 3.5.26 classify all the semi-simple algebras, i.e., algebras
with zero radicals, in terms of simple algebras. Can a general algebra
be written as its radical and a semi-simple algebra? It turns out
that an algebra $\mathcal{A}$ with nonzero radical $\mathrm{Rad}(\mathcal{A})$ is the direct sum
$\mathcal{A} = \mathrm{Rad}(\mathcal{A}) \oplus (\mathcal{A}/\mathrm{Rad}(\mathcal{A}))$, i.e., the radical plus the factor algebra modulo the
radical. Since, in $\mathcal{A}/\mathrm{Rad}(\mathcal{A})$, the radical has been "factored out" of $\mathcal{A}$, the
quotient is indeed semi-simple. This result is known as the Wedderburn principal
structure theorem, and reduces the study of all algebras to that of
simple algebras. Simple algebras can be further decomposed (for a proof,
see [Benn 87, pp. 330-332]):


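Theorem 3.5.27 (Wedderburn decomposition) An algebra $\mathcal{A}$ is simple
if and only if

\[ \mathcal{A} \cong \mathcal{D} \otimes \mathcal{M}_n \cong \mathcal{M}_n(\mathcal{D}), \]

where $\mathcal{D}$ is a division algebra and $\mathcal{M}_n(\mathcal{D})$ is a total matrix algebra
over $\mathcal{D}$ for some non-negative integer $n$. $\mathcal{D}$ and $\mathcal{M}_n(\mathcal{D})$ are unique up
to a similarity transformation.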


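Denote by $\mathcal{Z}_n$ the center of $\mathcal{M}_n$. Since $\mathcal{M}_n$ is central, by Theorem 3.3.2,
$\mathcal{Z}_n = \mathrm{Span}\{\mathbf{1}_n\}$. On the other hand, Eq. (3.8) gives

\[ \mathcal{Z}(\mathcal{A}) \cong \mathcal{Z}(\mathcal{D}) \otimes \mathcal{Z}_n \cong \mathcal{Z}(\mathcal{D}), \tag{3.15} \]

which is a relation that determines $\mathcal{D}$ from a knowledge of the center of the
algebra $\mathcal{A}$.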




Proposition 3.5.28 The only division algebra over C is C itself. 


Proof Let $\mathcal{D}$ be a division algebra over $\mathbb{C}$ and $x$ a nonzero element of $\mathcal{D}$.
Since $\mathcal{D}$ is finite-dimensional, there must exist a polynomial in $x$ such that
(why?)

\[ f(x) = x^n + \alpha_{n-1}x^{n-1} + \cdots + \alpha_1x + \alpha_0\mathbf{1} = 0. \]

Let $n$ be the smallest integer such that this holds. By the fundamental theorem
of algebra (see Sect. 10.5), $f(x)$ has at least one root $\lambda$. Then we have

\[ f(x) = (x - \lambda\mathbf{1})g(x) = 0. \]

Now, $g(x)$ has degree at most $n - 1$ and by assumption cannot be zero.
Hence, it has an inverse because $\mathcal{D}$ is a division algebra. Therefore, $x - \lambda\mathbf{1} = 0$,
and every element of $\mathcal{D}$ is a multiple of $\mathbf{1}$. This completes the proof.


Proposition 3.5.28 and Theorem 3.5.27, plus the fact that $\mathcal{M}_n(\mathbb{C})$ is central
(Theorem 3.3.2), give the following:


Theorem 3.5.29 Any simple algebra $\mathcal{A}$ over $\mathbb{C}$ is isomorphic to
$\mathcal{M}_n(\mathbb{C})$ for some $n$, and therefore $\mathcal{A}$ is necessarily central simple.


The centrality of a complex algebra can also be deduced from Eq. (3.15) 
and Proposition 3.5.28. 

There is a theorem in abstract algebra, called the Frobenius Theorem,
which states that the only division algebras over $\mathbb{R}$ are $\mathbb{R}$, $\mathbb{C}$, and $\mathbb{H}$, and
since the tensor product of two division algebras is a division algebra, also
$\mathbb{C} \otimes \mathbb{H}$.⁹ Furthermore, the center of $\mathbb{C}$ is the entire $\mathbb{C}$, because it is a commutative
algebra. On the other hand, $\mathbb{H}$ is central, i.e., its center is the span
of its identity (reader, please verify), therefore, isomorphic to $\mathbb{R}$.

Now consider a simple algebra $\mathcal{A}$ over $\mathbb{R}$. If $\mathcal{A}$ is central, i.e., if $\mathcal{Z}(\mathcal{A}) = \mathbb{R}$,
then Eq. (3.15) yields

\[ \mathbb{R} \cong \mathcal{Z}(\mathcal{D}) \;\Rightarrow\; \mathcal{D} = \mathbb{R} \text{ or } \mathbb{H}. \]

If $\mathcal{Z}(\mathcal{A}) = \mathbb{C}$, then

\[ \mathbb{C} \cong \mathcal{Z}(\mathcal{D}) \;\Rightarrow\; \mathcal{D} = \mathbb{C} \text{ or } \mathbb{C} \otimes \mathbb{H}. \]

These results, plus the theorems of Frobenius and Wedderburn, yield


Theorem 3.5.30 Any simple algebra $\mathcal{A}$ over $\mathbb{R}$ is isomorphic to $\mathcal{D} \otimes \mathcal{M}_n$
for some $n$. If the center of $\mathcal{A}$ is isomorphic to $\mathbb{C}$, then $\mathcal{D}$ is either
$\mathbb{C}$ or $\mathbb{C} \otimes \mathbb{H}$. If $\mathcal{A}$ is central (i.e., its center is isomorphic to $\mathbb{R}$), then
$\mathcal{D}$ is $\mathbb{R}$ or $\mathbb{H}$.

⁹ Since $\mathbb{C}$ is a subalgebra of $\mathbb{H}$, the tensor product is actually redundant. However, in
the classification of the Clifford algebras discussed later in the book, $\mathbb{C}$ is sometimes
explicitly factored out.


We conclude our discussion of the decomposition of an algebra by a further
characterization of a simple algebra and the connection between primitive
idempotents and minimal left ideals.


Definition 3.5.31 Two idempotents $P$ and $P'$ of an algebra $\mathcal{A}$ are called
similar if there exists an invertible element $s \in \mathcal{A}$ such that $P' = sPs^{-1}$.


The proof of the following theorem can be found in [Benn 87, pp. 332- 
334]: 


Theorem 3.5.32 If $P$ is an idempotent of a simple algebra $\mathcal{A}$, then there
exist mutually orthogonal primitive idempotents $\{P_i\}_{i=1}^{r}$ such that
$P = \sum_{i=1}^{r} P_i$. The integer $r$ is unique and is called the rank of $P$. Two
idempotents are similar if and only if they have the same rank.


Theorem 3.5.33 Let P be a primitive idempotent of a semi-simple 
algebra A. Then AP (respectively PA) is a minimal left (respectively 
right) ideal of A. 


Proof Since a semi-simple algebra is a direct sum of simple algebras each
independent of the others, without loss of generality, we can assume that
$\mathcal{A}$ is simple. Suppose $\mathcal{L} = \mathcal{A}P$ is not minimal. Then $\mathcal{L}$ contains a nonzero
left ideal $\mathcal{L}_1$ of $\mathcal{A}$. Since $\mathcal{A}$ is (semi-)simple, $\mathcal{L}_1$ is not nilpotent. Hence by
Proposition 3.5.10 it contains an idempotent $P_1$. If $P_1 = P$, then

\[ \mathcal{L} = \mathcal{A}P = \mathcal{A}P_1 \subseteq \mathcal{L}_1, \]

and therefore $\mathcal{L} = \mathcal{L}_1$, and we are done. So suppose that $P_1 \neq P$. Then, by
Theorem 3.5.16

\[ P_1 = Q_1 + \cdots + Q_r, \]

where the $Q_i$ are all primitive and orthogonal to each other. Since $Q_1$ and $P$ have
rank 1, by Theorem 3.5.32 they are similar, i.e., there exists an invertible
element $s \in \mathcal{A}$ such that $P = sQ_1s^{-1}$. So, by choosing $sP_1s^{-1}$ instead of $P_1$
if we have to,¹⁰ we can assume that $Q_1 = P$. Then

\[ P_1 = P + Q_2 + \cdots + Q_r, \]

and $P$ is orthogonal to all the $Q_i$. Multiplying both sides on the left by $P$, we
get $PP_1 = P$ and

\[ \mathcal{L} = \mathcal{A}P = \mathcal{A}PP_1 \subseteq \mathcal{L}_1, \]

implying that $\mathcal{L} = \mathcal{L}_1$. The case of a right ideal follows similarly.

¹⁰ This is equivalent to replacing $\mathcal{L}$ with $s\mathcal{L}s^{-1}$, which is allowed by Theorem 3.2.7 and
the non-uniqueness clause of Theorem 3.5.27.


3.6 Polynomial Algebra 


Let $\mathcal{A}$ be an associative algebra with identity $\mathbf{1}$. For any fixed element $a \in \mathcal{A}$,
consider the set $\mathcal{P}[a]$ of elements of the algebra of the form

\[ p(a) = \sum_{k=0}^{\infty} \alpha_k a^k, \qquad \alpha_k \in \mathbb{C}, \]

in which only a finite number of the terms in the sum are nonzero. These are
clearly polynomials in $a$ for which addition and multiplication are defined as
usual.


Definition 3.6.1 Let $\mathcal{A}$ be an associative algebra with identity $\mathbf{1}$. For any
fixed element $a \in \mathcal{A}$, the set $\mathcal{P}[a]$ is a commutative algebra with identity
called the polynomial algebra generated by $a$. The coefficient of the highest
power of $a$ in $p(a) = \sum_{k=0}^{\infty} \alpha_k a^k$ is called the leading coefficient of $p$,
and $\alpha_0$ is called the scalar term. A polynomial with leading coefficient 1
is called monic. The highest power of $a$ in $p$ is called the degree of $p$
and denoted by $\deg p$. A nonzero polynomial of the form $\alpha_n a^n$ is called a
monomial of degree $n$.


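It is clear that $\{a^k\}_{k=0}^{\infty}$ is a basis of the polynomial algebra $\mathcal{P}[a]$.
If $p(a) = \sum_{k=0}^{\infty} \alpha_k a^k$ and $q(a) = \sum_{j=0}^{\infty} \beta_j a^j$, then

\[
\begin{aligned}
(p + q)(a) &= \sum_{k=0}^{\infty} (\alpha_k + \beta_k) a^k,\\
(pq)(a) &= \sum_{i=0}^{\infty} \gamma_i a^i, \quad\text{where } \gamma_i = \sum_{j+k=i} \alpha_k \beta_j.
\end{aligned}
\]

Consider two nonzero polynomials $p(a)$ and $q(a)$. Then obviously

\[
\begin{aligned}
\deg(p + q) &\le \max(\deg p, \deg q),\\
\deg(pq) &= \deg p + \deg q.
\end{aligned}
\tag{3.16}
\]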
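
The sum and product rules above translate directly into operations on coefficient lists. The following is a minimal sketch, assuming a polynomial is stored as the list `[alpha_0, alpha_1, ...]`; the representation and the function names `poly_add`, `poly_mul`, and `degree` are illustrative, not notation from the text.

```python
# Polynomials as coefficient lists [alpha_0, alpha_1, ...] (illustrative representation).

def poly_add(p, q):
    """Coefficientwise sum: (alpha_k + beta_k)."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [x + y for x, y in zip(p, q)]

def poly_mul(p, q):
    """Product with gamma_i = sum over j + k = i of alpha_k * beta_j."""
    out = [0] * (len(p) + len(q) - 1)
    for k, pk in enumerate(p):
        for j, qj in enumerate(q):
            out[j + k] += pk * qj
    return out

def degree(p):
    """Index of the highest nonzero coefficient; assumes p is not the zero polynomial."""
    return max(i for i, c in enumerate(p) if c != 0)

p = [1, 2, 3]        # 1 + 2a + 3a^2
q = [0, 1, 0, -3]    # a - 3a^3
s = poly_add(p, q)   # 1 + 3a + 3a^2 - 3a^3
m = poly_mul(p, q)   # a + 2a^2 - 6a^4 - 9a^5

# Leading terms can cancel in a sum, which is why (3.16) has an inequality there.
cancelled = poly_add(p, [0, 0, -3])   # 1 + 2a, degree 1 < 2
```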
Definition 3.6.2 The linear map $\mathbf{d}: \mathcal{P}[a] \to \mathcal{P}[a]$ defined by

\[
\begin{aligned}
\mathbf{d}a^k &= k a^{k-1}, \quad k \ge 1,\\
\mathbf{d}a^0 &= \mathbf{d}\mathbf{1} = 0
\end{aligned}
\]

is called the differentiation map in $\mathcal{P}[a]$.


Theorem 3.6.3 The differentiation map $\mathbf{d}$ is a derivation of $\mathcal{P}[a]$. We denote
$\mathbf{d}(p)$ by $p'$.


Proof The simple proof is left as Problem 3.35. 


Let $p$ and $q$ be two polynomials. Then $\mathbf{d}(pq) = \mathbf{d}(p)q + p\,\mathbf{d}(q)$, and in
particular

\[ \mathbf{d}(q^2) = 2q\,\mathbf{d}(q), \]

and in general

\[ \mathbf{d}(q^k) = k q^{k-1}\mathbf{d}(q), \quad k \ge 1, \quad\text{and}\quad \mathbf{d}(q^0) = 0. \]

Because $q$ is an element of $\mathcal{A}$, it can generate a polynomial in itself. We can
construct, for example $p(q)$, by replacing $a$ with $q$:

\[ p(q) = \sum_{k=0}^{\infty} \alpha_k q^k. \]

Then, it is straightforward to show that (see Problem 3.36)

\[ \mathbf{d}(p(q)) = p'(q) \cdot q'. \tag{3.17} \]

This is the chain rule for the differentiation of polynomials.
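
The product rule and the chain rule can be spot-checked on coefficient lists. This is a sketch under the assumption that a polynomial is stored as `[alpha_0, alpha_1, ...]`; the helpers `d`, `compose`, and `trim` are hypothetical names introduced only for this illustration, and a check on two sample polynomials is of course not a proof.

```python
# Differentiation map on coefficient lists [alpha_0, alpha_1, ...] (illustrative).

def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)]

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for k, pk in enumerate(p):
        for j, qj in enumerate(q):
            out[j + k] += pk * qj
    return out

def d(p):
    """Differentiation map: a^k -> k a^(k-1), constants -> 0."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def compose(p, q):
    """p(q) = sum_k alpha_k q^k, evaluated with Horner's scheme."""
    out = [0]
    for c in reversed(p):
        out = poly_add(poly_mul(out, q), [c])
    return out

def trim(p):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

p = [1, 2, 3]    # 1 + 2a + 3a^2
q = [2, 1, 1]    # 2 + a + a^2

# Product (Leibniz) rule: d(pq) = d(p) q + p d(q)
leibniz_lhs = trim(d(poly_mul(p, q)))
leibniz_rhs = trim(poly_add(poly_mul(d(p), q), poly_mul(p, d(q))))

# Chain rule (3.17): d(p(q)) = p'(q) * q'
chain_lhs = trim(d(compose(p, q)))
chain_rhs = trim(poly_mul(compose(d(p), q), d(q)))
```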


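Definition 3.6.4 The polynomial $\mathbf{d}^r(p)$ is called the $r$th derivative of $p$
and denoted by $p^{(r)}$. We extend the notation by defining $p^{(0)} = p$.


It is clear that $p^{(r)} = 0$ if $r > \deg(p)$.
Consider the monomial $a^n$, and note that

\[ \mathbf{d}^r(a^n) = \frac{n!}{(n-r)!}\,a^{n-r} \quad\text{or}\quad a^{n-r} = \frac{(n-r)!}{n!}\,\mathbf{d}^r(a^n). \]

Now use the binomial theorem to write

\[ (a + b)^n = \sum_{r=0}^{n} \binom{n}{r} a^{n-r} b^r = \sum_{r=0}^{n} \frac{1}{r!}\,\mathbf{d}^r(a^n) \cdot b^r. \]

The left-hand side is an arbitrary term of the polynomial $p(a + b)$. Therefore,
taking linear combination of such terms, we have

\[ p(a + b) = \sum_{r=0}^{\infty} \frac{p^{(r)}(a)}{r!}\,b^r, \tag{3.18} \]

where the sum terminates because $p^{(r)} = 0$ for $r > \deg(p)$.

This is called the Taylor formula for $p$.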
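
A quick numerical check of the Taylor formula is sketched below, under the simplifying assumption that $a$ and $b$ are ordinary numbers (so they commute) and that the polynomial is stored as a coefficient list. The function names `d`, `evaluate`, and `taylor` are illustrative.

```python
from fractions import Fraction
from math import factorial

def d(p):
    """Differentiation map on a coefficient list [alpha_0, alpha_1, ...]."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def evaluate(p, x):
    """Value of the polynomial at x, using Horner's scheme."""
    total = 0
    for c in reversed(p):
        total = total * x + c
    return total

def taylor(p, a, b):
    """Right-hand side of (3.18): sum over r of p^(r)(a) b^r / r!."""
    total, deriv, r = Fraction(0), p, 0
    while any(deriv):                      # stop once the derivative vanishes
        total += Fraction(evaluate(deriv, a), factorial(r)) * b ** r
        deriv = d(deriv)
        r += 1
    return total

p = [1, -2, 0, 4]            # 1 - 2x + 4x^3
lhs = evaluate(p, 3 + 2)     # p(a + b) with a = 3, b = 2
rhs = taylor(p, 3, 2)
```

For this choice the four nonzero terms are 103, 212, 144, and 32, which add up to $p(5) = 491$.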

A root of the polynomial $p(a) = \sum_{k=0}^{n} \eta_k a^k$ of degree $n$ is a scalar $\lambda \in \mathbb{C}$
such that $p(\lambda) = \sum_{k=0}^{n} \eta_k \lambda^k = 0$. The fundamental theorem of algebra¹¹
states that $\mathbb{C}$ is algebraically closed, meaning that any polynomial with
coefficients in $\mathbb{C}$ can be factored out into a product of polynomials of degree
one with coefficients in $\mathbb{C}$:

\[ p(a) = \eta_n (a - \lambda_1\mathbf{1})^{k_1} \cdots (a - \lambda_s\mathbf{1})^{k_s}, \tag{3.19} \]

where $\eta_n \neq 0$, $\{\lambda_i\}_{i=1}^{s}$ are the distinct complex roots of the polynomial, $k_i$,
called the multiplicity of $\lambda_i$, is a positive integer, and $\sum_{i=1}^{s} k_i = n$.

¹¹ A proof of the theorem can be found in Sect. 10.5.

As the simple example $p(a) = a^2 + \mathbf{1}$ suggests, $\mathbb{R}$ is not algebraically
closed. Nevertheless, a real polynomial can still be factored out into products
of polynomials of degree 1 and 2 with real coefficients. To show this,
first note that if $\lambda$ is a complex root of a real polynomial, then its complex
conjugate $\bar\lambda$ is also a root. This follows from taking the complex conjugate
of $\sum_{k=0}^{n} \eta_k \lambda^k = 0$ and noting that $\bar\eta_k = \eta_k$ for real $\eta_k$. Furthermore, $\lambda$ and $\bar\lambda$
must have the same multiplicity, otherwise the unmatched factors produce a
polynomial with complex coefficients, which, when multiplied out with the
rest of the factors, produce some complex coefficients for $p(a)$.

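Next, multiply each factor in Eq. (3.19) containing a complex root by its
complex conjugate. So, if $\lambda_m = \gamma_m + i\xi_m$, then

\[
\begin{aligned}
(a - \lambda_m\mathbf{1})^{k_m}(a - \bar\lambda_m\mathbf{1})^{k_m} &= (a - \gamma_m\mathbf{1} - i\xi_m\mathbf{1})^{k_m}(a - \gamma_m\mathbf{1} + i\xi_m\mathbf{1})^{k_m}\\
&= (a^2 - 2\gamma_m a + \gamma_m^2\mathbf{1} + \xi_m^2\mathbf{1})^{k_m}\\
&\equiv (a^2 + \alpha_m a + \beta_m\mathbf{1})^{k_m}, \qquad \alpha_m^2 < 4\beta_m.
\end{aligned}
\]

The inequality ensures that $\xi_m \neq 0$, i.e., that the root is not real. We have
just proved the following: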


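Theorem 3.6.5 A real polynomial $p(a) = \sum_{k=0}^{n} \eta_k a^k$ of degree $n$ has the
following factorization:

\[ p(a) = \eta_n \prod_{i=1}^{r} (a - \lambda_i\mathbf{1})^{k_i} \prod_{j=1}^{R} (a^2 + \alpha_j a + \beta_j\mathbf{1})^{K_j}, \qquad \alpha_j^2 < 4\beta_j, \]

where $\lambda_i, \alpha_j, \beta_j \in \mathbb{R}$, $k_i, K_j \in \mathbb{N}$, the $\lambda_i$ are all distinct, the pairs $(\alpha_j, \beta_j)$ are
all distinct, and $2\sum_{j=1}^{R} K_j + \sum_{i=1}^{r} k_i = n$.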


Corollary 3.6.6 A real polynomial of odd degree has at least one real root. 
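
The pairing of a complex root with its conjugate can be illustrated numerically. The sketch below assumes the sample root $\lambda = 1 + 2i$ and evaluates both sides at a few real numbers standing in for $a$; the values and the function names `quadratic` and `paired_factors` are illustrative only.

```python
# Pairing a complex root with its conjugate gives a real quadratic factor.
lam = complex(1, 2)                 # sample root: gamma = 1, xi = 2
gamma, xi = lam.real, lam.imag

alpha = -2 * gamma                  # coefficient of a
beta = gamma ** 2 + xi ** 2         # constant term

def quadratic(a):
    """The real quadratic a^2 + alpha*a + beta."""
    return a * a + alpha * a + beta

def paired_factors(a):
    """The product (a - lambda)(a - conjugate(lambda))."""
    return (a - lam) * (a - lam.conjugate())

samples = [0.0, 1.5, -2.0]
differences = [abs(paired_factors(a) - quadratic(a)) for a in samples]
has_no_real_root = alpha ** 2 < 4 * beta   # discriminant condition of Theorem 3.6.5
```

Since a polynomial of odd degree cannot consist of such quadratic factors alone, at least one linear factor must remain, which is the content of Corollary 3.6.6.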


3.7 Problems 


3.1 Show that
(a) the product on $\mathbb{R}^2$ defined by

\[ (x_1, x_2)(y_1, y_2) = (x_1y_1 - x_2y_2,\ x_1y_2 + x_2y_1) \]

turns $\mathbb{R}^2$ into an associative and commutative algebra, and
(b) the cross product on $\mathbb{R}^3$ turns it into a nonassociative, noncommutative
algebra.


3.2 Show that the center of an algebra is a subspace of that algebra. If the
algebra is associative, then its center is a subalgebra.


3.3 Prove that $\mathcal{A}^2$, the derived algebra of $\mathcal{A}$, is indeed an algebra.


3.4 Prove that the set $\mathcal{A}$ of $n \times n$ matrices, with the product defined by
Eq. (3.3), forms a nonassociative noncommutative algebra.


3.5 Prove that the set $\mathcal{A}$ of $n \times n$ upper triangular matrices, with the product
defined by ordinary multiplication of matrices, is an associative noncommutative
algebra. Show that the same set with multiplication defined by
Eq. (3.3) is a nonassociative noncommutative algebra, and that the derived
algebra $\mathcal{A}^2 \equiv \mathcal{B}$ is the set of strictly upper triangular matrices. What is the
derived algebra $\mathcal{B}^2$ of $\mathcal{B}$?


3.6 Prove Proposition 3.1.23. 

3.7 Let $\omega \in \mathcal{L}(\mathcal{V})$ be defined by $\omega(a) = -a$ for all $a \in \mathcal{V}$. Is $\omega$ an involution
of $\mathcal{V}$? Now suppose that $\mathcal{V}$ is an algebra. Is $\omega$ so defined an involution of the
algebra $\mathcal{V}$? Recall that an involution of an algebra must be a homomorphism
of that algebra.


3.8 Show that no proper left (right) ideal of an algebra with identity can 
contain an element that has a left (right) inverse. 


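3.9 Let $\mathcal{A}$ be an associative algebra, and $x \in \mathcal{A}$. Show that $\mathcal{A}x$ is a left ideal,
$x\mathcal{A}$ is a right ideal, and $\mathcal{A}x\mathcal{A}$ is a two-sided ideal.


3.10 Let $\mathcal{L}$ be a left ideal and $\mathcal{R}$ a right ideal. Show that $\mathcal{L}\mathcal{R}$ is a two-sided
ideal.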


3.11 Show that & of Theorem 3.1.25 is an algebra isomorphism. 


3.12 Show that the linear transformation of Example 3.1.18 is an isomor- 
phism of the two algebras A and B. 


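3.13 Let $\mathcal{A}$ be an algebra with identity $\mathbf{1}_\mathcal{A}$ and $\phi$ an epimorphism of $\mathcal{A}$ onto
another algebra $\mathcal{B}$. Show that $\phi(\mathbf{1}_\mathcal{A})$ is the identity of $\mathcal{B}$.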


3.14 Show that the derived algebra of A is an ideal in A. 
3.15 Show that the algebra of quaternions is central. 


3.16 Write down all the structure constants for the algebra of quaternions. 
Show that this algebra is associative. 


3.17 Show that a quaternion is pure iff its square is a nonpositive real num- 
ber. 


3.18 Let $p$ and $q$ be two quaternions. Show that
(a) $(pq)^* = q^*p^*$,
(b) $q \in \mathbb{R}$ iff $q^* = q$, and $q \in \mathbb{R}^3$ iff $q^* = -q$, and
(c) $qq^* = q^*q$ is a nonnegative real number.


3.19 Prove Eq. (3.7).


3.20 Show that $\phi$ of Example 3.2.16 is an algebra homomorphism.


3.21 Prove Theorem 3.3.2.


3.22 The algebra $\mathcal{A}$ has a basis $\{\mathbf{1}, e\}$ with $e^2 = \mathbf{1}$.
(a) Show that $\{f_1, f_2\}$ with $f_1 = \frac{1}{2}(\mathbf{1} + e)$ and $f_2 = \frac{1}{2}(\mathbf{1} - e)$ is also a basis.
(b) Show that $\mathcal{A} = \mathcal{L}_1 \oplus_V \mathcal{L}_2$, where $\mathcal{L}_i = \mathcal{A}f_i$, $i = 1, 2$, and $\oplus_V$ indicates
a vector space direct sum.
(c) Show that $\mathcal{L}_1$ and $\mathcal{L}_2$ are actually two-sided ideals and that $\mathcal{L}_1\mathcal{L}_2 = \{0\}$.
Therefore, $\mathcal{A} = \mathcal{L}_1 \oplus \mathcal{L}_2$.
(d) Multiply an arbitrary element of $\mathcal{L}_i$, $i = 1, 2$, by an arbitrary element
of $\mathcal{A}$ to show that $\mathcal{L}_i = \mathrm{Span}\{f_i\}$, $i = 1, 2$. Thus, $\mathcal{L}_i \cong \mathbb{R}$, $i = 1, 2$, or
$\mathcal{A} \cong \mathbb{R} \oplus \mathbb{R}$.


3.23 If $\mathcal{A}$ is an algebra and $D$ is a derivation in $\mathcal{A}$, prove that both the center
$\mathcal{Z}(\mathcal{A})$ and the derived algebra $\mathcal{A}^2$ are stable under $D$, i.e., if $a \in \mathcal{Z}(\mathcal{A})$ then
$D(a) \in \mathcal{Z}(\mathcal{A})$, and if $a \in \mathcal{A}^2$ then $D(a) \in \mathcal{A}^2$.


3.24 Let $D: \mathcal{A} \to \mathcal{A}$ be a derivation. Show that $\ker D$ is a subalgebra of $\mathcal{A}$.


3.25 Show that a linear combination of two derivations is a derivation.


3.26 Fix a vector $\mathbf{a} \in \mathbb{R}^3$ and define the linear transformation $D_\mathbf{a}: \mathbb{R}^3 \to \mathbb{R}^3$
by $D_\mathbf{a}(\mathbf{b}) = \mathbf{a} \times \mathbf{b}$. Show that $D_\mathbf{a}$ is a derivation of $\mathbb{R}^3$ with the cross product
as multiplication.


3.27 Show that $D$ defined on $\mathcal{C}^r(a, b)$ by $D(f) = f'(c)$, where $a < c < b$,
is a $\phi_c$-derivation if $\phi_c$ is defined as the evaluation map $\phi_c(f) = f(c)$.


3.28 Let $\Omega \in \mathrm{End}(\mathcal{A})$ be an antiderivation of $\mathcal{A}$ with respect to $\omega$. Show that
$\ker \Omega$ is a subalgebra of $\mathcal{A}$ and $\Omega(e) = 0$ if $\mathcal{A}$ has an identity.


3.29 Derive the Leibniz formula (3.11).


3.30 Prove Theorem 3.4.10.


3.31 Show that the algebra of the strictly upper triangular $n \times n$ matrices is
nilpotent of index $n$.


3.32 Let $b$ be a fixed element of an algebra $\mathcal{B}$. Consider the linear transformation
$T_b: \mathcal{B} \to \mathcal{B}$ given by $T_b(x) = xb$. Using the dimension theorem,
show that if $\mathcal{B}b = \mathcal{B}$, then $\ker T_b = 0$.


3.33 Let $\mathcal{A}$ be an algebra with an idempotent $P$. Show that $P\mathcal{A}P$ consists of
elements $a$ such that $aP = Pa = a$. For the subspaces of Theorem 3.5.11, let
$\mathcal{A}_1 = P\mathcal{A}P$, $\mathcal{A}_2 = P\mathcal{L}(P)$, $\mathcal{A}_3 = \mathcal{R}(P)P$, and $\mathcal{A}_4 = \mathcal{J}(P)$. Show that $\{\mathcal{A}_i\}_{i=1}^{4}$
are subalgebras of $\mathcal{A}$ and that $\mathcal{A}_i \cap \mathcal{A}_j = \{0\}$, but $\mathcal{A}_i\mathcal{A}_j \neq \{0\}$ for all $i \neq j$,
$i, j = 1, \ldots, 4$. Thus, Peirce decomposition is a vector space direct sum, but
not an algebra direct sum.


3.34 Let $p$ and $q$ be orthogonal idempotents. Suppose that $q = q_1 + q_2$,
where $q_1$ and $q_2$ are orthogonal idempotents. Show that $qq_i = q_iq = q_i$
for $i = 1, 2$. Using this result, show that $pq_i = q_ip = 0$ for $i = 1, 2$.


3.35 Use the basis $\{a^k\}_{k=0}^{\infty}$ of $\mathcal{P}[a]$ and apply Theorem 3.4.4 on it to show
that the differentiation map of Definition 3.6.2 is a derivation.


3.36 Derive the chain rule (3.17). 


Operator Algebra 


Chapter 3 introduced algebras, i.e., vector spaces in which one can multiply 
two vectors to obtain a third vector. In this chapter, we want to investigate 
the algebra of linear transformations. 


4.1 Algebra of End(V) 


The product in the algebra of the endomorphisms End($\mathcal{V}$) of a vector space
$\mathcal{V}$ is defined as the composition of maps. In addition to the zero element,
which is present in all algebras, End($\mathcal{V}$) has an identity element, $\mathbf{1}$, which
satisfies the relation $\mathbf{1}|a\rangle = |a\rangle$ for all $|a\rangle \in \mathcal{V}$. Thus, End($\mathcal{V}$) is a unital
algebra. With $\mathbf{1}$ in our possession, we can ask whether it is possible to find an
operator $T^{-1}$ with the property that $T^{-1}T = TT^{-1} = \mathbf{1}$. Generally speaking,
only bijective mappings have inverses. Therefore, only automorphisms of a
vector space are invertible.


Example 4.1.1 Let the linear operator $T: \mathbb{R}^3 \to \mathbb{R}^3$ be defined by

\[ T(x_1, x_2, x_3) = (x_1 + x_2,\ x_2 + x_3,\ x_1 + x_3). \]

We want to see whether $T$ is invertible and, if so, find its inverse. $T$ has an
inverse if and only if it is bijective. By the comments after Theorem 2.3.13
this is the case if and only if $T$ is either surjective or injective. The latter
is equivalent to $\ker T = |0\rangle$. But $\ker T$ is the set of all vectors satisfying
$T(x_1, x_2, x_3) = (0, 0, 0)$, or

\[ x_1 + x_2 = 0, \quad x_2 + x_3 = 0, \quad x_1 + x_3 = 0. \]

The reader may check that the unique solution to these equations is
$x_1 = x_2 = x_3 = 0$. Thus, the only vector belonging to $\ker T$ is the zero vector.
Therefore, $T$ has an inverse.

To find $T^{-1}$ apply $T^{-1}T = \mathbf{1}$ to $(x_1, x_2, x_3)$:

\[ (x_1, x_2, x_3) = T^{-1}T(x_1, x_2, x_3) = T^{-1}(x_1 + x_2,\ x_2 + x_3,\ x_1 + x_3). \]

This equation demonstrates how $T^{-1}$ acts on vectors. To make this more
apparent, we let $x_1 + x_2 = x$, $x_2 + x_3 = y$, $x_1 + x_3 = z$, solve for $x_1$, $x_2$, and
$x_3$ in terms of $x$, $y$, and $z$, and substitute in the preceding equation to obtain

\[ T^{-1}(x, y, z) = \tfrac{1}{2}(x - y + z,\ x + y - z,\ -x + y + z). \]

Rewriting this equation in terms of $x_1$, $x_2$, and $x_3$ gives

\[ T^{-1}(x_1, x_2, x_3) = \tfrac{1}{2}(x_1 - x_2 + x_3,\ x_1 + x_2 - x_3,\ -x_1 + x_2 + x_3). \]

We can easily verify that $T^{-1}T = \mathbf{1}$ and that $TT^{-1} = \mathbf{1}$.
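
The verification mentioned at the end of the example can be sketched in a few lines. The code assumes vectors are stored as 3-tuples and uses exact rational arithmetic; the function names `T` and `T_inv` simply mirror the operators of the example.

```python
from fractions import Fraction

def T(v):
    """The operator of Example 4.1.1."""
    x1, x2, x3 = v
    return (x1 + x2, x2 + x3, x1 + x3)

def T_inv(v):
    """The inverse found in the example."""
    x1, x2, x3 = v
    half = Fraction(1, 2)
    return (half * (x1 - x2 + x3), half * (x1 + x2 - x3), half * (-x1 + x2 + x3))

v = (Fraction(3), Fraction(-1), Fraction(5))   # an arbitrary test vector

left = T_inv(T(v))    # T^{-1} T v
right = T(T_inv(v))   # T T^{-1} v
```

Both compositions return the original vector, in agreement with $T^{-1}T = TT^{-1} = \mathbf{1}$; checking the three standard basis vectors in the same way would establish the identity for all vectors by linearity.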


Since End(V) is associative, Theorem 3.1.2 applies to it. Nevertheless, 
we restate it in the context of operators as a corollary in which we also 
include a generalization of Theorem 2.3.19: 


Corollary 4.1.2 The inverse of a linear operator is unique. If $T$ and $S$ are
two invertible linear operators on $\mathcal{V}$, then $TS$ is also invertible and

\[ (TS)^{-1} = S^{-1}T^{-1}. \]
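
The order reversal in the inverse of a product is easy to see on a small example. The sketch below assumes operators on a two-dimensional space represented by 2 x 2 integer matrices; the sample matrices and the helper names `matmul` and `inverse` are illustrative.

```python
from fractions import Fraction

def matmul(x, y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(m):
    """Inverse of a 2x2 matrix with integer entries via the adjugate formula."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

T = [[1, 2], [3, 4]]
S = [[0, 1], [1, 1]]

lhs = inverse(matmul(T, S))                    # (TS)^{-1}
rhs = matmul(inverse(S), inverse(T))           # S^{-1} T^{-1}
wrong_order = matmul(inverse(T), inverse(S))   # T^{-1} S^{-1}
```

For these two matrices `lhs` and `rhs` agree, while `wrong_order` differs, because $T$ and $S$ do not commute.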


The following proposition, whose straightforward proof is left as an ex- 
ercise for the reader, turns out to be useful later on: 


Proposition 4.1.3. An endomorphism T € End(V) is invertible iff it sends a 
basis of V onto another basis of V. 


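Let $\mathcal{V}_1$ and $\mathcal{V}_2$ be vector spaces and $\mathcal{L}_1(\mathcal{V}_1)$ and $\mathcal{L}_2(\mathcal{V}_2)$ the set of their
endomorphisms. A natural definition of $\mathcal{L}(\mathcal{V}_1 \otimes \mathcal{V}_2)$ is given by

\[ \mathcal{L}(\mathcal{V}_1 \otimes \mathcal{V}_2) = (\mathcal{L}_1 \otimes \mathcal{L}_2)(\mathcal{V}_1 \otimes \mathcal{V}_2) = \mathcal{L}_1(\mathcal{V}_1) \otimes \mathcal{L}_2(\mathcal{V}_2). \tag{4.1} \]

In particular, if $\mathcal{V}_1 = \mathbb{C}$ and $\mathcal{V}_2 = \mathcal{V}$ is a real vector space, then

\[ \mathcal{L}(\mathbb{C} \otimes \mathcal{V}) = \mathcal{L}(\mathcal{V}^{\mathbb{C}}), \tag{4.2} \]

where $\mathcal{V}^{\mathbb{C}}$ is the complexification of $\mathcal{V}$ as given in Definition 2.4.8. It is
important to note that

\[ \mathcal{L}(\mathbb{C} \otimes \mathcal{V}) \not\cong \mathbb{C} \otimes \mathcal{L}(\mathcal{V}) \]

because $\mathcal{L}(\mathbb{C}) \not\cong \mathbb{C}$.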


4.1.1 Polynomials of Operators 


From Sect. 3.6, we know that we can construct polynomials of $T \in \mathrm{End}(\mathcal{V})$
such as

\[ p(T) = \alpha_0\mathbf{1} + \alpha_1T + \alpha_2T^2 + \cdots + \alpha_nT^n. \]

We shall use these polynomials as starting points for constructing functions
of operators.


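Example 4.1.4 Let $\mathbf{T}_\theta : \mathbb{R}^2 \to \mathbb{R}^2$ be the linear operator that rotates vectors
in the $xy$-plane through the angle $\theta$, that is,

$$\mathbf{T}_\theta(x, y) = (x\cos\theta - y\sin\theta,\; x\sin\theta + y\cos\theta).$$

We are interested in powers of $\mathbf{T}_\theta$:

$$\begin{aligned}
\mathbf{T}_\theta^2(x, y) &= \mathbf{T}_\theta(\underbrace{x\cos\theta - y\sin\theta}_{x'},\; \underbrace{x\sin\theta + y\cos\theta}_{y'}) \\
&= (x'\cos\theta - y'\sin\theta,\; x'\sin\theta + y'\cos\theta) \\
&= \bigl((x\cos\theta - y\sin\theta)\cos\theta - (x\sin\theta + y\cos\theta)\sin\theta, \\
&\qquad (x\cos\theta - y\sin\theta)\sin\theta + (x\sin\theta + y\cos\theta)\cos\theta\bigr) \\
&= (x\cos 2\theta - y\sin 2\theta,\; x\sin 2\theta + y\cos 2\theta).
\end{aligned}$$

Thus, $\mathbf{T}_\theta^2$ rotates $(x, y)$ by $2\theta$. Similarly, one can show that

$$\mathbf{T}_\theta^3(x, y) = (x\cos 3\theta - y\sin 3\theta,\; x\sin 3\theta + y\cos 3\theta),$$

and in general, $\mathbf{T}_\theta^n(x, y) = (x\cos n\theta - y\sin n\theta,\; x\sin n\theta + y\cos n\theta)$, which
shows that $\mathbf{T}_\theta^n$ is a rotation of $(x, y)$ through the angle $n\theta$, that is, $\mathbf{T}_\theta^n = \mathbf{T}_{n\theta}$.
This result could have been guessed because $\mathbf{T}_\theta^n$ is equivalent to rotating
$(x, y)$ $n$ times, each time by an angle $\theta$.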
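
The identity $\mathbf{T}_\theta^n = \mathbf{T}_{n\theta}$ can be spot-checked numerically. The following is a minimal sketch, not part of the original text; the function names `rotate`, `rotate_power`, and `close` are illustrative choices, and the comparison is made only up to floating-point rounding.

```python
import math

def rotate(theta, point):
    """Apply T_theta to the point (x, y)."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

def rotate_power(theta, n, point):
    """Apply T_theta n times in succession (n >= 0)."""
    for _ in range(n):
        point = rotate(theta, point)
    return point

def close(p, q, tol=1e-12):
    """Componentwise comparison up to a small tolerance."""
    return all(abs(a - b) <= tol for a, b in zip(p, q))

theta, n, point = 0.3, 5, (1.0, 2.0)
# n successive rotations by theta versus one rotation by n*theta
print(close(rotate_power(theta, n, point), rotate(n * theta, point)))
```

Any other angle, power, or starting point can be substituted for the sample values.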


Negative powers of an invertible linear operator $\mathbf{T}$ are defined by $\mathbf{T}^{-n} =
(\mathbf{T}^{-1})^n$. The exponents of $\mathbf{T}$ satisfy the usual rules.¹ In particular, for any
two integers $m$ and $n$ (positive or negative), $\mathbf{T}^m\mathbf{T}^n = \mathbf{T}^{m+n}$ and $(\mathbf{T}^m)^n = \mathbf{T}^{mn}$.
The first relation implies that the inverse of $\mathbf{T}^n$ is $\mathbf{T}^{-n}$. One can further
generalize the exponent to include fractions and ultimately all real numbers; but
we need to wait until Chap. 6, in which we discuss the spectral decomposition
theorem.


Example 4.1.5 Let us evaluate $\mathbf{T}_\theta^{-n}$ for the operator of the previous example.
First, let us find $\mathbf{T}_\theta^{-1}$ (see Fig. 4.1). We are looking for an operator such
that $\mathbf{T}_\theta^{-1}\mathbf{T}_\theta(x, y) = (x, y)$, or

$$\mathbf{T}_\theta^{-1}(x\cos\theta - y\sin\theta,\; x\sin\theta + y\cos\theta) = (x, y). \tag{4.3}$$

We define $x' = x\cos\theta - y\sin\theta$ and $y' = x\sin\theta + y\cos\theta$ and solve for $x$ and
$y$ in terms of $x'$ and $y'$ to obtain $x = x'\cos\theta + y'\sin\theta$ and $y = -x'\sin\theta +
y'\cos\theta$. Substituting for $x$ and $y$ in Eq. (4.3) yields

$$\mathbf{T}_\theta^{-1}(x', y') = (x'\cos\theta + y'\sin\theta,\; -x'\sin\theta + y'\cos\theta).$$


¹ These rules apply to any associative algebra, not just to $\mathrm{End}(V)$.
Fig. 4.1 The operator $\mathbf{T}_\theta$ and its inverse as they act on a point in the plane


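Comparing this with the action of $\mathbf{T}_\theta$ in the previous example, we discover
that the only difference between the two operators is the sign of the $\sin\theta$
term. We conclude that $\mathbf{T}_\theta^{-1}$ has the same effect as $\mathbf{T}_{-\theta}$. So we have

$$\mathbf{T}_\theta^{-1} = \mathbf{T}_{-\theta} \quad\text{and}\quad \mathbf{T}_\theta^{-n} = (\mathbf{T}_\theta^{-1})^n = (\mathbf{T}_{-\theta})^n = \mathbf{T}_{-n\theta}.$$

It is instructive to verify that $\mathbf{T}_\theta^{-n}\mathbf{T}_\theta^n = \mathbf{1}$:

$$\begin{aligned}
\mathbf{T}_\theta^{-n}\mathbf{T}_\theta^n(x, y) &= \mathbf{T}_\theta^{-n}(\underbrace{x\cos n\theta - y\sin n\theta}_{x'},\; \underbrace{x\sin n\theta + y\cos n\theta}_{y'}) \\
&= (x'\cos n\theta + y'\sin n\theta,\; -x'\sin n\theta + y'\cos n\theta) \\
&= \bigl((x\cos n\theta - y\sin n\theta)\cos n\theta + (x\sin n\theta + y\cos n\theta)\sin n\theta, \\
&\qquad -(x\cos n\theta - y\sin n\theta)\sin n\theta + (x\sin n\theta + y\cos n\theta)\cos n\theta\bigr) \\
&= \bigl(x(\cos^2 n\theta + \sin^2 n\theta),\; y(\sin^2 n\theta + \cos^2 n\theta)\bigr) = (x, y).
\end{aligned}$$

Similarly, we can show that $\mathbf{T}_\theta^n\mathbf{T}_\theta^{-n}(x, y) = (x, y)$.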


One has to keep in mind that p(T) is not, in general, invertible, even if T 
is. In fact, the sum of two invertible operators is not necessarily invertible. 
For example, although T and —T are invertible, their sum, the zero operator, 
is not. 


4.1.2 Functions of Operators 


We can go one step beyond polynomials of operators and, via Taylor expansion,
define functions of them. Consider an ordinary function $f(x)$, which
has the Taylor expansion

$$f(x) = \sum_{k=0}^{\infty} \frac{(x - x_0)^k}{k!} \left.\frac{d^k f}{dx^k}\right|_{x = x_0},$$

in which $x_0$ is a point where $f(x)$ and all its derivatives are defined. To this
function, there corresponds a function of the operator $\mathbf{T}$, defined as

$$f(\mathbf{T}) = \sum_{k=0}^{\infty} \left.\frac{d^k f}{dx^k}\right|_{x = x_0} \frac{(\mathbf{T} - x_0\mathbf{1})^k}{k!}. \tag{4.4}$$


Because this series is an infinite sum of operators, difficulties may arise 
concerning its convergence. However, as will be shown in Chap. 6, f(T) 
is always defined for finite-dimensional vector spaces. In fact, it is always a 
polynomial in T (see also Problem 4.1). For the time being, we shall think of 
f(T) as a formal infinite series. A simplification results when the function 
can be expanded about x = 0. In this case we obtain 


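$$f(\mathbf{T}) = \sum_{k=0}^{\infty} \left.\frac{d^k f}{dx^k}\right|_{x = 0} \frac{\mathbf{T}^k}{k!}. \tag{4.5}$$

A widely used function is the exponential, whose expansion is easily found
to be

$$e^{\mathbf{T}} \equiv \exp(\mathbf{T}) = \sum_{k=0}^{\infty} \frac{\mathbf{T}^k}{k!}. \tag{4.6}$$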


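Example 4.1.6 Let us evaluate $\exp(\alpha\mathbf{T})$ when $\mathbf{T} : \mathbb{R}^2 \to \mathbb{R}^2$ is given by

$$\mathbf{T}(x, y) = (-y, x).$$

We can find a general formula for the action of $\mathbf{T}^n$ on $(x, y)$. Start with
$n = 2$:

$$\mathbf{T}^2(x, y) = \mathbf{T}(-y, x) = (-x, -y) = -(x, y) = -\mathbf{1}(x, y).$$

Thus, $\mathbf{T}^2 = -\mathbf{1}$. From $\mathbf{T}$ and $\mathbf{T}^2$ we can easily obtain higher powers of $\mathbf{T}$. For
example, $\mathbf{T}^3 = \mathbf{T}(\mathbf{T}^2) = -\mathbf{T}$, $\mathbf{T}^4 = \mathbf{T}^2\mathbf{T}^2 = \mathbf{1}$, and in general,

$$\mathbf{T}^{2n} = (-1)^n\mathbf{1} \quad\text{for } n = 0, 1, 2, \ldots$$
$$\mathbf{T}^{2n+1} = (-1)^n\mathbf{T} \quad\text{for } n = 0, 1, 2, \ldots$$

Thus,

$$\begin{aligned}
e^{\alpha\mathbf{T}} &= \sum_{n=0}^{\infty} \frac{(\alpha\mathbf{T})^n}{n!}
= \sum_{n\ \mathrm{odd}} \frac{(\alpha\mathbf{T})^n}{n!} + \sum_{n\ \mathrm{even}} \frac{(\alpha\mathbf{T})^n}{n!}
= \sum_{k=0}^{\infty} \frac{(\alpha\mathbf{T})^{2k+1}}{(2k+1)!} + \sum_{k=0}^{\infty} \frac{(\alpha\mathbf{T})^{2k}}{(2k)!} \\
&= \sum_{k=0}^{\infty} \frac{\alpha^{2k+1}\mathbf{T}^{2k+1}}{(2k+1)!} + \sum_{k=0}^{\infty} \frac{\alpha^{2k}\mathbf{T}^{2k}}{(2k)!} \\
&= \sum_{k=0}^{\infty} \frac{(-1)^k\alpha^{2k+1}}{(2k+1)!}\,\mathbf{T} + \sum_{k=0}^{\infty} \frac{(-1)^k\alpha^{2k}}{(2k)!}\,\mathbf{1} \\
&= \mathbf{T}\sum_{k=0}^{\infty} \frac{(-1)^k\alpha^{2k+1}}{(2k+1)!} + \mathbf{1}\sum_{k=0}^{\infty} \frac{(-1)^k\alpha^{2k}}{(2k)!}.
\end{aligned}$$

The two series are recognized as $\sin\alpha$ and $\cos\alpha$, respectively. Therefore, we
get

$$e^{\alpha\mathbf{T}} = \mathbf{T}\sin\alpha + \mathbf{1}\cos\alpha,$$

which shows that $e^{\alpha\mathbf{T}}$ is a polynomial (of first degree) in $\mathbf{T}$.
The action of $e^{\alpha\mathbf{T}}$ on $(x, y)$ is given by

$$\begin{aligned}
e^{\alpha\mathbf{T}}(x, y) &= (\sin\alpha\,\mathbf{T} + \cos\alpha\,\mathbf{1})(x, y) = \sin\alpha\,\mathbf{T}(x, y) + \cos\alpha\,\mathbf{1}(x, y) \\
&= (\sin\alpha)(-y, x) + (\cos\alpha)(x, y) \\
&= (-y\sin\alpha,\; x\sin\alpha) + (x\cos\alpha,\; y\cos\alpha) \\
&= (x\cos\alpha - y\sin\alpha,\; x\sin\alpha + y\cos\alpha).
\end{aligned}$$

The reader will recognize the final expression as a rotation in the $xy$-plane
through an angle $\alpha$. Thus, we can think of $e^{\alpha\mathbf{T}}$ as a rotation operator of angle
$\alpha$ about the $z$-axis. In this context $\mathbf{T}$ is called the generator of the rotation.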
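
As a numerical illustration of Example 4.1.6, the sketch below sums the exponential series term by term and compares the result with the rotation formula. It is not part of the original text; the truncation at 40 terms and the names `T` and `exp_alpha_T` are assumptions made for the illustration.

```python
import math

def T(v):
    """Generator of Example 4.1.6: T(x, y) = (-y, x)."""
    x, y = v
    return (-y, x)

def exp_alpha_T(alpha, v, terms=40):
    """Partial sum of sum_n (alpha T)^n / n! applied to the vector v."""
    total = (0.0, 0.0)
    term = v  # (alpha T)^0 v / 0!
    for n in range(terms):
        total = (total[0] + term[0], total[1] + term[1])
        tx, ty = T(term)
        # next term: (alpha T)^(n+1) v / (n+1)!
        term = (alpha * tx / (n + 1), alpha * ty / (n + 1))
    return total

alpha, (x, y) = 1.2, (2.0, -1.0)
series = exp_alpha_T(alpha, (x, y))
rotation = (x * math.cos(alpha) - y * math.sin(alpha),
            x * math.sin(alpha) + y * math.cos(alpha))
print(all(abs(a - b) < 1e-12 for a, b in zip(series, rotation)))
```

For moderate values of $\alpha$ the discarded tail of the series is far below double precision, so the two results agree to rounding error.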




4.1.3 Commutators 


The result of multiplication of two operators depends on the order in which
the operators appear. This means that if $\mathbf{T}, \mathbf{U} \in \mathcal{L}(V)$, then $\mathbf{TU} \in \mathcal{L}(V)$ and
$\mathbf{UT} \in \mathcal{L}(V)$; however, in general $\mathbf{UT} \neq \mathbf{TU}$. When this is the case, we say that
$\mathbf{U}$ and $\mathbf{T}$ do not commute. The extent to which two operators fail to commute
is given in the following definition.


Definition 4.1.7 The commutator $[\mathbf{U}, \mathbf{T}]$ of the two operators $\mathbf{U}$ and $\mathbf{T}$ in
$\mathcal{L}(V)$ is another operator in $\mathcal{L}(V)$, defined as

$$[\mathbf{U}, \mathbf{T}] \equiv \mathbf{UT} - \mathbf{TU}.$$

An immediate consequence of this definition is the following:


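Proposition 4.1.8 For $\mathbf{S}, \mathbf{T}, \mathbf{U} \in \mathcal{L}(V)$ and $\alpha, \beta \in \mathbb{C}$ (or $\mathbb{R}$), we have

$$\begin{aligned}
[\mathbf{U}, \mathbf{T}] &= -[\mathbf{T}, \mathbf{U}], && \text{antisymmetry} \\
[\alpha\mathbf{U}, \beta\mathbf{T}] &= \alpha\beta[\mathbf{U}, \mathbf{T}], && \text{linearity} \\
[\mathbf{S}, \mathbf{T} + \mathbf{U}] &= [\mathbf{S}, \mathbf{T}] + [\mathbf{S}, \mathbf{U}], && \text{linearity in the right entry} \\
[\mathbf{S} + \mathbf{T}, \mathbf{U}] &= [\mathbf{S}, \mathbf{U}] + [\mathbf{T}, \mathbf{U}], && \text{linearity in the left entry} \\
[\mathbf{ST}, \mathbf{U}] &= \mathbf{S}[\mathbf{T}, \mathbf{U}] + [\mathbf{S}, \mathbf{U}]\mathbf{T}, && \text{right derivation property} \\
[\mathbf{S}, \mathbf{TU}] &= [\mathbf{S}, \mathbf{T}]\mathbf{U} + \mathbf{T}[\mathbf{S}, \mathbf{U}], && \text{left derivation property} \\
[[\mathbf{S}, \mathbf{T}], \mathbf{U}] + [[\mathbf{U}, \mathbf{S}], \mathbf{T}] + [[\mathbf{T}, \mathbf{U}], \mathbf{S}] &= \mathbf{0}, && \text{Jacobi identity}
\end{aligned}$$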




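Proof In almost all cases the proof follows immediately from the definition.
The only minor exceptions are the derivation properties. We prove the left
derivation property:

$$\begin{aligned}
[\mathbf{S}, \mathbf{TU}] &= \mathbf{S}(\mathbf{TU}) - (\mathbf{TU})\mathbf{S} = \mathbf{STU} - \mathbf{TUS} + \underbrace{\mathbf{TSU} - \mathbf{TSU}}_{=\,\mathbf{0}} \\
&= (\mathbf{ST} - \mathbf{TS})\mathbf{U} + \mathbf{T}(\mathbf{SU} - \mathbf{US}) = [\mathbf{S}, \mathbf{T}]\mathbf{U} + \mathbf{T}[\mathbf{S}, \mathbf{U}].
\end{aligned}$$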


The right derivation property is proved in exactly the same way. 
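
The derivation properties and the Jacobi identity can be checked on concrete matrices. The following is a small sketch using integer $2 \times 2$ matrices, so that all arithmetic is exact; the sample matrices `S`, `T`, `U` and the helper names are illustrative assumptions, not taken from the text.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def comm(a, b):
    """Commutator [a, b] = ab - ba."""
    return sub(matmul(a, b), matmul(b, a))

S = [[1, 2], [0, 3]]
T = [[0, 1], [4, -1]]
U = [[2, -1], [5, 0]]

# [S, TU] = [S, T]U + T[S, U]
left_ok = comm(S, matmul(T, U)) == add(matmul(comm(S, T), U), matmul(T, comm(S, U)))
# [ST, U] = S[T, U] + [S, U]T
right_ok = comm(matmul(S, T), U) == add(matmul(S, comm(T, U)), matmul(comm(S, U), T))
# [[S, T], U] + [[U, S], T] + [[T, U], S] = 0
jacobi = add(add(comm(comm(S, T), U), comm(comm(U, S), T)), comm(comm(T, U), S))
jacobi_ok = jacobi == [[0, 0], [0, 0]]
print(left_ok, right_ok, jacobi_ok)
```

A check on sample matrices is of course not a proof; it only illustrates the identities established above.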


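A useful consequence of the definition and Proposition 4.1.8 is

$$[\mathbf{A}, \mathbf{A}^m] = \mathbf{0} \quad\text{for } m = 0, \pm 1, \pm 2, \ldots$$

In particular, $[\mathbf{A}, \mathbf{1}] = \mathbf{0}$ and $[\mathbf{A}, \mathbf{A}^{-1}] = \mathbf{0}$.
An example of the commutators of operators is that of $\mathbf{D}$ and $\mathbf{T}$ defined
in Example 2.3.5. The reader is urged to verify that

$$[\mathbf{D}, \mathbf{T}] = \mathbf{1}. \tag{4.7}$$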
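
Equation (4.7) can be illustrated on polynomials. The sketch below assumes that, as in Example 2.3.5 (which lies outside this section), $\mathbf{D}$ is differentiation and $\mathbf{T}$ is multiplication by the variable; polynomials are stored as coefficient lists, and the helper names are hypothetical.

```python
def D(p):
    """Differentiate a polynomial given as coefficients [c0, c1, c2, ...]."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

def T(p):
    """Multiply by the variable: every coefficient moves up one degree."""
    return [0] + list(p)

def sub(p, q):
    """Difference p - q of two coefficient lists of possibly unequal length."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a - b for a, b in zip(p, q)]

def trim(p):
    """Drop trailing zero coefficients (keeping at least one entry)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def commutator_DT(p):
    """Apply [D, T] = DT - TD to the polynomial p."""
    return trim(sub(D(T(p)), T(D(p))))

p = [5, -2, 0, 7]            # 5 - 2x + 7x^3
print(commutator_DT(p) == p)  # [D, T] acts as the unit operator on p
```

Under this assumption the check is just the product rule: $(xp)' - xp' = p$.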


4.2 Derivatives of Operators 


Up to this point we have been discussing the algebraic properties of op- 
erators, static objects that obey certain algebraic rules and fulfill the static 
needs of some applications. However, physical quantities are dynamic, and 
if we want operators to represent physical quantities, we must allow them to 
change with time. This dynamism is best illustrated in quantum mechanics, 
where physical observables are represented by operators. 

Let us consider a mapping $\mathbf{H} : \mathbb{R} \to \mathrm{End}(V)$, which² takes in a real number
and gives out a linear operator on the vector space $V$. We denote the
image of $t \in \mathbb{R}$ by $\mathbf{H}(t)$, which acts on the underlying vector space $V$.
The physical meaning of this is that as $t$ (usually time) varies, its image
$\mathbf{H}(t)$ also varies. Therefore, for different values of $t$, we have different
operators. In particular, in general $[\mathbf{H}(t), \mathbf{H}(t')] \neq \mathbf{0}$ for $t \neq t'$. A concrete example is
an operator that is a linear combination of the operators $\mathbf{D}$ and $\mathbf{T}$ introduced
in Example 2.3.5, with time-dependent scalars. To be specific, let
$\mathbf{H}(t) = \mathbf{D}\cos\omega t + \mathbf{T}\sin\omega t$, where $\omega$ is a constant. As time passes, $\mathbf{H}(t)$
changes its identity from $\mathbf{D}$ to $\mathbf{T}$ and back to $\mathbf{D}$. Most of the time it has a
hybrid identity! Since $\mathbf{D}$ and $\mathbf{T}$ do not commute [see Eq. (4.7)], values of
$\mathbf{H}(t)$ for different times do not necessarily commute.

Of particular interest are operators that can be written as $\exp(\mathbf{H}(t))$,
where $\mathbf{H}(t)$ is a “simple” operator; i.e., the dependence of $\mathbf{H}(t)$ on $t$ is simpler
than the corresponding dependence of $\exp(\mathbf{H}(t))$. We have already encountered
such a situation in Example 4.1.6, where it was shown that the
operation of rotation around the $z$-axis could be written as $\exp(\alpha\mathbf{T})$, and the
action of $\mathbf{T}$ on $(x, y)$ was a great deal simpler than the corresponding action
of $\exp(\alpha\mathbf{T})$.


² Strictly speaking, the domain of $\mathbf{H}$ must be an interval $[a, b]$ of the real line, because
$\mathbf{H}$ may not be defined for all $\mathbb{R}$. However, for our purposes, such a fine distinction is not
necessary.

Such a state of affairs is very common in physics. In fact, it can be shown 
that many operators of physical interest can be written as a product of simpler
operators, each being of the form $\exp(\alpha\mathbf{T})$. For example, we know from
Euler’s theorem in mechanics that an arbitrary rotation in three dimensions 
can be written as a product of three simpler rotations, each being a rotation 
through a so-called Euler angle about an axis. 


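Definition 4.2.1 For the mapping $\mathbf{H} : \mathbb{R} \to \mathrm{End}(V)$, we define the derivative
as

$$\frac{d\mathbf{H}}{dt} = \lim_{\Delta t \to 0} \frac{\mathbf{H}(t + \Delta t) - \mathbf{H}(t)}{\Delta t}.$$

This derivative also belongs to $\mathrm{End}(V)$.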


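As long as we keep track of the order, practically all the rules of differentiation
apply to operators. For example,

$$\frac{d}{dt}(\mathbf{UT}) = \frac{d\mathbf{U}}{dt}\mathbf{T} + \mathbf{U}\frac{d\mathbf{T}}{dt}.$$

We are not allowed to change the order of multiplication on the RHS, not
even when both operators being multiplied are the same on the LHS. For
instance, if we let $\mathbf{U} = \mathbf{T} = \mathbf{H}$ in the preceding equation, we obtain

$$\frac{d}{dt}(\mathbf{H}^2) = \frac{d\mathbf{H}}{dt}\mathbf{H} + \mathbf{H}\frac{d\mathbf{H}}{dt}.$$

This is not, in general, equal to $2\mathbf{H}\,\dfrac{d\mathbf{H}}{dt}$.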
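
The importance of operator ordering can be seen in a small exact computation. The sketch below uses a hypothetical family $\mathbf{H}(t) = \mathbf{A} + t\mathbf{B}$ with two non-commuting $2 \times 2$ matrices (chosen for illustration, not taken from the text), so that $d\mathbf{H}/dt = \mathbf{B}$. Because $\mathbf{H}(t)^2$ is quadratic in $t$, a central difference evaluated with exact fractions reproduces its derivative exactly.

```python
from fractions import Fraction

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

A = [[0, 1], [0, 0]]
B = [[0, 0], [1, 0]]   # A and B do not commute

def H(t):
    """Hypothetical operator family H(t) = A + t B, so dH/dt = B."""
    return add(A, scale(t, B))

def d_H_squared(t, h=Fraction(1, 10)):
    """Central difference of H(t)^2; exact because H(t)^2 is quadratic in t."""
    plus, minus = H(t + h), H(t - h)
    diff = add(matmul(plus, plus), scale(-1, matmul(minus, minus)))
    return scale(1 / (2 * h), diff)

t = Fraction(3)
ordered = add(matmul(B, H(t)), matmul(H(t), B))   # (dH/dt) H + H (dH/dt)
naive = scale(2, matmul(H(t), B))                  # 2 H (dH/dt)
print(d_H_squared(t) == ordered, d_H_squared(t) == naive)
```

The ordered expression matches the derivative, while the naive expression $2\mathbf{H}\,d\mathbf{H}/dt$ does not.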


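Example 4.2.2 Let us find the derivative of $\exp(t\mathbf{H})$, where $\mathbf{H}$ is independent
of $t$. Using Definition 4.2.1, we have

$$\frac{d}{dt}\exp(t\mathbf{H}) = \lim_{\Delta t \to 0} \frac{\exp[(t + \Delta t)\mathbf{H}] - \exp(t\mathbf{H})}{\Delta t}.$$

However, for infinitesimal $\Delta t$ we have

$$\begin{aligned}
\exp[(t + \Delta t)\mathbf{H}] - \exp(t\mathbf{H}) &= e^{t\mathbf{H}}e^{\Delta t\mathbf{H}} - e^{t\mathbf{H}} \\
&= e^{t\mathbf{H}}(\mathbf{1} + \mathbf{H}\Delta t) - e^{t\mathbf{H}} = e^{t\mathbf{H}}\mathbf{H}\Delta t.
\end{aligned}$$

Therefore,

$$\frac{d}{dt}\exp(t\mathbf{H}) = \lim_{\Delta t \to 0} \frac{e^{t\mathbf{H}}\mathbf{H}\Delta t}{\Delta t} = e^{t\mathbf{H}}\mathbf{H}.$$

Since $\mathbf{H}$ and $e^{t\mathbf{H}}$ commute,³ we also have

$$\frac{d}{dt}\exp(t\mathbf{H}) = \mathbf{H}e^{t\mathbf{H}}.$$

Note that in deriving the equation for the derivative of $e^{t\mathbf{H}}$, we have used
the relation $e^{t\mathbf{H}}e^{\Delta t\mathbf{H}} = e^{(t + \Delta t)\mathbf{H}}$. This may seem trivial, but it will be shown
later that in general, $e^{\mathbf{S} + \mathbf{T}} \neq e^{\mathbf{S}}e^{\mathbf{T}}$.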


Now let us evaluate the derivative of a more general time-dependent operator,
$\exp[\mathbf{H}(t)]$:

$$\frac{d}{dt}\exp[\mathbf{H}(t)] = \lim_{\Delta t \to 0} \frac{\exp[\mathbf{H}(t + \Delta t)] - \exp[\mathbf{H}(t)]}{\Delta t}.$$

If $\mathbf{H}(t)$ possesses a derivative, we have, to the first order in $\Delta t$,

$$\mathbf{H}(t + \Delta t) = \mathbf{H}(t) + \Delta t\,\frac{d\mathbf{H}}{dt},$$

and we can write $\exp[\mathbf{H}(t + \Delta t)] = \exp[\mathbf{H}(t) + \Delta t\,d\mathbf{H}/dt]$. It is very tempting
to factor out the $\exp[\mathbf{H}(t)]$ and expand the remaining part. However, as
we will see presently, this is not possible in general. As preparation, consider
the following example, which concerns the integration of an operator.


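Example 4.2.3 The Schrödinger equation

$$i\frac{\partial}{\partial t}|\psi(t)\rangle = \mathbf{H}|\psi(t)\rangle$$

can be turned into an operator differential equation as follows. Define the
so-called evolution operator $\mathbf{U}(t)$ by $|\psi(t)\rangle = \mathbf{U}(t)|\psi(0)\rangle$, and substitute
in the Schrödinger equation to obtain

$$i\frac{\partial}{\partial t}\mathbf{U}(t)|\psi(0)\rangle = \mathbf{H}\mathbf{U}(t)|\psi(0)\rangle.$$

Ignoring the arbitrary vector $|\psi(0)\rangle$ results in $d\mathbf{U}/dt = -i\mathbf{H}\mathbf{U}(t)$, a differential
equation in $\mathbf{U}(t)$, where $\mathbf{H}$ is not dependent on $t$. We can find a solution
to such an equation by repeated differentiation followed by Taylor series
expansion. Thus,

$$\begin{aligned}
\frac{d^2\mathbf{U}}{dt^2} &= -i\mathbf{H}\frac{d\mathbf{U}}{dt} = -i\mathbf{H}[-i\mathbf{H}\mathbf{U}(t)] = (-i\mathbf{H})^2\mathbf{U}(t), \\
\frac{d^3\mathbf{U}}{dt^3} &= \frac{d}{dt}\bigl[(-i\mathbf{H})^2\mathbf{U}(t)\bigr] = (-i\mathbf{H})^2\frac{d\mathbf{U}}{dt} = (-i\mathbf{H})^3\mathbf{U}(t).
\end{aligned}$$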

³ This is a consequence of a more general result that if two operators commute, any pair
of functions of those operators also commute (see Problem 4.14).
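In general $d^n\mathbf{U}/dt^n = (-i\mathbf{H})^n\mathbf{U}(t)$. Assuming that $\mathbf{U}(t)$ is well-defined at
$t = 0$, the above relations say that all derivatives of $\mathbf{U}(t)$ are also well-defined
at $t = 0$. Therefore, we can expand $\mathbf{U}(t)$ around $t = 0$ to obtain

$$\begin{aligned}
\mathbf{U}(t) &= \sum_{n=0}^{\infty} \frac{t^n}{n!}\left(\frac{d^n\mathbf{U}}{dt^n}\right)_{t=0} = \sum_{n=0}^{\infty} \frac{t^n}{n!}(-i\mathbf{H})^n\mathbf{U}(0) \\
&= \left(\sum_{n=0}^{\infty} \frac{(-it\mathbf{H})^n}{n!}\right)\mathbf{U}(0) = e^{-it\mathbf{H}}\mathbf{U}(0).
\end{aligned}$$

But $\mathbf{U}(0) = \mathbf{1}$ by the definition of $\mathbf{U}(t)$. Hence, $\mathbf{U}(t) = e^{-it\mathbf{H}}$ and

$$|\psi(t)\rangle = e^{-it\mathbf{H}}|\psi(0)\rangle.$$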


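Let us see under what conditions we have $\exp(\mathbf{S} + \mathbf{T}) = \exp(\mathbf{S})\exp(\mathbf{T})$.
We consider only the case where the commutator of the two operators commutes
with both of them:

$$[\mathbf{T}, [\mathbf{S}, \mathbf{T}]] = \mathbf{0} = [\mathbf{S}, [\mathbf{S}, \mathbf{T}]].$$

Now consider the operator $\mathbf{U}(t) = e^{t\mathbf{S}}e^{t\mathbf{T}}e^{-t(\mathbf{S} + \mathbf{T})}$ and differentiate it using
the result of Example 4.2.2 and the product rule for differentiation:

$$\begin{aligned}
\frac{d\mathbf{U}}{dt} &= \mathbf{S}e^{t\mathbf{S}}e^{t\mathbf{T}}e^{-t(\mathbf{S} + \mathbf{T})} + e^{t\mathbf{S}}\mathbf{T}e^{t\mathbf{T}}e^{-t(\mathbf{S} + \mathbf{T})} - e^{t\mathbf{S}}e^{t\mathbf{T}}(\mathbf{S} + \mathbf{T})e^{-t(\mathbf{S} + \mathbf{T})} \\
&= \mathbf{S}e^{t\mathbf{S}}e^{t\mathbf{T}}e^{-t(\mathbf{S} + \mathbf{T})} - e^{t\mathbf{S}}e^{t\mathbf{T}}\mathbf{S}e^{-t(\mathbf{S} + \mathbf{T})}.
\end{aligned} \tag{4.8}$$

The three factors of $\mathbf{U}(t)$ are present in all terms; however, they are not
always next to one another. We can switch the operators if we introduce a
commutator. For instance, $e^{t\mathbf{T}}\mathbf{S} = \mathbf{S}e^{t\mathbf{T}} + [e^{t\mathbf{T}}, \mathbf{S}]$.

It is left as a problem for the reader to show that if $[\mathbf{S}, \mathbf{T}]$ commutes with
$\mathbf{S}$ and $\mathbf{T}$, then $[e^{t\mathbf{T}}, \mathbf{S}] = -t[\mathbf{S}, \mathbf{T}]e^{t\mathbf{T}}$, and therefore, $e^{t\mathbf{T}}\mathbf{S} = \mathbf{S}e^{t\mathbf{T}} - t[\mathbf{S}, \mathbf{T}]e^{t\mathbf{T}}$.
Substituting this in Eq. (4.8) and noting that $e^{t\mathbf{S}}\mathbf{S} = \mathbf{S}e^{t\mathbf{S}}$ yields $d\mathbf{U}/dt =
t[\mathbf{S}, \mathbf{T}]\mathbf{U}(t)$. The solution to this equation is

$$\mathbf{U}(t) = \exp\Bigl(\frac{t^2}{2}[\mathbf{S}, \mathbf{T}]\Bigr) \;\Rightarrow\; e^{t\mathbf{S}}e^{t\mathbf{T}}e^{-t(\mathbf{S} + \mathbf{T})} = \exp\Bigl(\frac{t^2}{2}[\mathbf{S}, \mathbf{T}]\Bigr)$$

because $\mathbf{U}(0) = \mathbf{1}$. We thus have the following: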

because U(0) = 1. We thus have the following: 


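Proposition 4.2.4 Let $\mathbf{S}, \mathbf{T} \in \mathcal{L}(V)$. If $[\mathbf{S}, [\mathbf{S}, \mathbf{T}]] = \mathbf{0} = [\mathbf{T}, [\mathbf{S}, \mathbf{T}]]$, then the
Baker-Campbell-Hausdorff formula holds:

$$e^{t\mathbf{S}}e^{t\mathbf{T}}e^{-(t^2/2)[\mathbf{S}, \mathbf{T}]} = e^{t(\mathbf{S} + \mathbf{T})}. \tag{4.9}$$

In particular, $e^{t\mathbf{S}}e^{t\mathbf{T}} = e^{t(\mathbf{S} + \mathbf{T})}$ if and only if $[\mathbf{S}, \mathbf{T}] = \mathbf{0}$.


If $t = 1$, Eq. (4.9) reduces to

$$e^{\mathbf{S}}e^{\mathbf{T}}e^{-(1/2)[\mathbf{S}, \mathbf{T}]} = e^{\mathbf{S} + \mathbf{T}}. \tag{4.10}$$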
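
Equation (4.9) can be checked exactly on a simple case. The sketch below uses two strictly upper-triangular $3 \times 3$ matrices (an illustrative choice, not from the text) whose commutator commutes with both of them. Since these matrices are nilpotent, the exponential series terminates and rational arithmetic gives exact results.

```python
from fractions import Fraction

N = 3

def identity():
    return [[Fraction(int(i == j)) for j in range(N)] for i in range(N)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def scale(c, a):
    return [[c * x for x in row] for row in a]

def comm(a, b):
    return add(matmul(a, b), scale(-1, matmul(b, a)))

def expm(a, terms=10):
    """Taylor series of exp(a); exact here because a is nilpotent."""
    result, term = identity(), identity()
    for k in range(1, terms):
        term = scale(Fraction(1, k), matmul(term, a))
        result = add(result, term)
    return result

S = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
T = [[0, 0, 0], [0, 0, 1], [0, 0, 0]]
C = comm(S, T)                      # commutes with both S and T
t = Fraction(2, 3)

lhs = matmul(matmul(expm(scale(t, S)), expm(scale(t, T))),
             expm(scale(-t * t / 2, C)))
rhs = expm(scale(t, add(S, T)))
print(lhs == rhs)
print(matmul(expm(scale(t, S)), expm(scale(t, T))) == rhs)
```

The first comparison confirms (4.9) for this example; the second shows that dropping the commutator factor breaks the equality, since $[\mathbf{S}, \mathbf{T}] \neq \mathbf{0}$ here.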




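Now assume that both $\mathbf{H}(t)$ and its derivative commute with $[\mathbf{H}, d\mathbf{H}/dt]$.
Letting $\mathbf{S} = \mathbf{H}(t)$ and $\mathbf{T} = \Delta t\,d\mathbf{H}/dt$ in (4.10), we obtain

$$e^{\mathbf{H}(t + \Delta t)} = e^{\mathbf{H}(t) + \Delta t\,d\mathbf{H}/dt} = e^{\mathbf{H}(t)}e^{\Delta t(d\mathbf{H}/dt)}e^{-[\mathbf{H}(t),\,\Delta t\,d\mathbf{H}/dt]/2}.$$

For infinitesimal $\Delta t$, this yields

$$\begin{aligned}
e^{\mathbf{H}(t + \Delta t)} &= e^{\mathbf{H}(t)}\Bigl(\mathbf{1} + \Delta t\frac{d\mathbf{H}}{dt}\Bigr)\Bigl(\mathbf{1} - \frac{1}{2}\Delta t\Bigl[\mathbf{H}(t), \frac{d\mathbf{H}}{dt}\Bigr]\Bigr) \\
&= e^{\mathbf{H}(t)}\Bigl\{\mathbf{1} + \Delta t\frac{d\mathbf{H}}{dt} - \frac{1}{2}\Delta t\Bigl[\mathbf{H}(t), \frac{d\mathbf{H}}{dt}\Bigr]\Bigr\},
\end{aligned}$$

and we have

$$\frac{d}{dt}e^{\mathbf{H}(t)} = e^{\mathbf{H}}\frac{d\mathbf{H}}{dt} - \frac{1}{2}e^{\mathbf{H}}\Bigl[\mathbf{H}, \frac{d\mathbf{H}}{dt}\Bigr].$$

We can also write

$$e^{\mathbf{H}(t + \Delta t)} = e^{[\mathbf{H}(t) + \Delta t\,d\mathbf{H}/dt]} = e^{[\Delta t\,d\mathbf{H}/dt + \mathbf{H}(t)]} = e^{\Delta t\,d\mathbf{H}/dt}e^{\mathbf{H}(t)}e^{-[\Delta t\,d\mathbf{H}/dt,\,\mathbf{H}(t)]/2},$$

which yields

$$\frac{d}{dt}e^{\mathbf{H}(t)} = \frac{d\mathbf{H}}{dt}e^{\mathbf{H}} + \frac{1}{2}e^{\mathbf{H}}\Bigl[\mathbf{H}, \frac{d\mathbf{H}}{dt}\Bigr].$$

Adding the above two expressions and dividing by 2 yields the following
symmetric expression for the derivative:

$$\frac{d}{dt}e^{\mathbf{H}(t)} = \frac{1}{2}\Bigl(\frac{d\mathbf{H}}{dt}e^{\mathbf{H}} + e^{\mathbf{H}}\frac{d\mathbf{H}}{dt}\Bigr) = \frac{1}{2}\Bigl\{\frac{d\mathbf{H}}{dt}, e^{\mathbf{H}}\Bigr\},$$

where $\{\mathbf{S}, \mathbf{T}\} \equiv \mathbf{ST} + \mathbf{TS}$ is called the anticommutator of the operators $\mathbf{S}$
and $\mathbf{T}$. We, therefore, have the following proposition.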


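Proposition 4.2.5 Let $\mathbf{H} : \mathbb{R} \to \mathcal{L}(V)$ and assume that $\mathbf{H}$ and its derivative
commute with $[\mathbf{H}, d\mathbf{H}/dt]$. Then

$$\frac{d}{dt}e^{\mathbf{H}(t)} = \frac{1}{2}\Bigl\{\frac{d\mathbf{H}}{dt}, e^{\mathbf{H}}\Bigr\}.$$

In particular, if $[\mathbf{H}, d\mathbf{H}/dt] = \mathbf{0}$, then

$$\frac{d}{dt}e^{\mathbf{H}(t)} = \frac{d\mathbf{H}}{dt}e^{\mathbf{H}} = e^{\mathbf{H}}\frac{d\mathbf{H}}{dt}.$$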

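A frequently encountered operator is $\mathbf{F}(t) = e^{t\mathbf{A}}\mathbf{B}e^{-t\mathbf{A}}$, where $\mathbf{A}$ and $\mathbf{B}$
are $t$-independent. It is straightforward to show that

$$\frac{d\mathbf{F}}{dt} = [\mathbf{A}, \mathbf{F}(t)] \quad\text{and}\quad \frac{d}{dt}[\mathbf{A}, \mathbf{F}(t)] = \Bigl[\mathbf{A}, \frac{d\mathbf{F}}{dt}\Bigr].$$

Using these results, we can write

$$\frac{d^2\mathbf{F}}{dt^2} = \frac{d}{dt}[\mathbf{A}, \mathbf{F}(t)] = [\mathbf{A}, [\mathbf{A}, \mathbf{F}(t)]] \equiv \mathbf{A}^2[\mathbf{F}(t)],$$

and in general, $d^n\mathbf{F}/dt^n = \mathbf{A}^n[\mathbf{F}(t)]$, where $\mathbf{A}^n[\mathbf{F}(t)]$ is defined inductively
as $\mathbf{A}^n[\mathbf{F}(t)] = [\mathbf{A}, \mathbf{A}^{n-1}[\mathbf{F}(t)]]$, with $\mathbf{A}^0[\mathbf{F}(t)] = \mathbf{F}(t)$. For example,

$$\mathbf{A}^3[\mathbf{F}(t)] = [\mathbf{A}, \mathbf{A}^2[\mathbf{F}(t)]] = [\mathbf{A}, [\mathbf{A}, \mathbf{A}[\mathbf{F}(t)]]] = [\mathbf{A}, [\mathbf{A}, [\mathbf{A}, \mathbf{F}(t)]]].$$

Evaluating $\mathbf{F}(t)$ and all its derivatives at $t = 0$ and substituting in the
Taylor expansion about $t = 0$, we get

$$\mathbf{F}(t) = \sum_{n=0}^{\infty} \frac{t^n}{n!}\left.\frac{d^n\mathbf{F}}{dt^n}\right|_{t=0} = \sum_{n=0}^{\infty} \frac{t^n}{n!}\mathbf{A}^n[\mathbf{F}(0)] = \sum_{n=0}^{\infty} \frac{t^n}{n!}\mathbf{A}^n[\mathbf{B}].$$

That is,

$$e^{t\mathbf{A}}\mathbf{B}e^{-t\mathbf{A}} = \sum_{n=0}^{\infty} \frac{t^n}{n!}\mathbf{A}^n[\mathbf{B}] \equiv \mathbf{B} + t[\mathbf{A}, \mathbf{B}] + \frac{t^2}{2!}[\mathbf{A}, [\mathbf{A}, \mathbf{B}]] + \cdots.$$

Sometimes this is written symbolically as

$$e^{t\mathbf{A}}\mathbf{B}e^{-t\mathbf{A}} = \left(\sum_{n=0}^{\infty} \frac{t^n}{n!}\mathbf{A}^n\right)[\mathbf{B}] \equiv \bigl(\exp(t\mathbf{A})\bigr)[\mathbf{B}],$$

where the RHS is merely an abbreviation of the infinite sum in the middle.
For $t = 1$ we obtain a widely used formula:

$$e^{\mathbf{A}}\mathbf{B}e^{-\mathbf{A}} = \bigl(\exp\mathbf{A}\bigr)[\mathbf{B}] = \left(\sum_{n=0}^{\infty} \frac{1}{n!}\mathbf{A}^n\right)[\mathbf{B}] \equiv \mathbf{B} + [\mathbf{A}, \mathbf{B}] + \frac{1}{2!}[\mathbf{A}, [\mathbf{A}, \mathbf{B}]] + \cdots.$$

If $\mathbf{A}$ commutes with $[\mathbf{A}, \mathbf{B}]$, then the infinite series truncates at the second
term, and we have

$$e^{t\mathbf{A}}\mathbf{B}e^{-t\mathbf{A}} = \mathbf{B} + t[\mathbf{A}, \mathbf{B}].$$

For instance, if $\mathbf{A}$ and $\mathbf{B}$ are replaced by $\mathbf{D}$ and $\mathbf{T}$ of Example 2.3.5 and
Eq. (4.7), we get

$$e^{t\mathbf{D}}\mathbf{T}e^{-t\mathbf{D}} = \mathbf{T} + t[\mathbf{D}, \mathbf{T}] = \mathbf{T} + t\mathbf{1}.$$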
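
The relation $e^{t\mathbf{D}}\mathbf{T}e^{-t\mathbf{D}} = \mathbf{T} + t\mathbf{1}$ can be tested on polynomials. The sketch below again assumes that $\mathbf{D}$ is differentiation and $\mathbf{T}$ is multiplication by the variable (as in Example 2.3.5, outside this section). On a polynomial the series for $e^{t\mathbf{D}}$ terminates, so exact rational arithmetic suffices; the helper names are hypothetical.

```python
from fractions import Fraction

def D(p):
    """Differentiate coefficients [c0, c1, ...]; constants map to []."""
    return [k * c for k, c in enumerate(p)][1:]

def T(p):
    """Multiply by the variable."""
    return [Fraction(0)] + list(p)

def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def scale(c, p):
    return [c * x for x in p]

def exp_tD(t, p):
    """Apply sum_n (t D)^n / n! to p; the sum stops once D^n p vanishes."""
    result, term, n = [], list(p), 0
    while term:
        result = add(result, term)
        n += 1
        term = scale(Fraction(t) / n, D(term))
    return result

p = [2, -3, 0, 5]                       # 2 - 3x + 5x^3
t = Fraction(1, 2)
lhs = exp_tD(t, T(exp_tD(-t, p)))       # e^{tD} T e^{-tD} applied to p
rhs = add(T(p), scale(t, p))            # (T + t 1) applied to p
print(lhs == rhs)
```

Under the stated assumption, $e^{t\mathbf{D}}$ shifts the argument of a polynomial, $p(x) \mapsto p(x + t)$, which is why the conjugated operator multiplies by $x + t$.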


The RHS shows that the operator $\mathbf{T}$ has been translated by an amount $t$
(more precisely, by $t$ times the unit operator). We therefore call $\exp(t\mathbf{D})$ the
translation operator of $\mathbf{T}$ by $t$, and we call $\mathbf{D}$ the generator of translation.
With a little modification T and D become, respectively, the position and 
momentum operators in quantum mechanics. Thus, 


Box 4.2.6 Momentum is the generator of translation in quantum me- 
chanics. 


But more of this later! 


4.3 Conjugation of Operators 


We have discussed the notion of the dual of a vector in conjunction with
inner products. We now incorporate linear operators into this notion. Let
$|b\rangle, |c\rangle \in V$ and assume that $|c\rangle = \mathbf{T}|b\rangle$. We know that there are linear
functionals in the dual space $V^*$ that are associated with $(|b\rangle)^\dagger = \langle b|$ and
$(|c\rangle)^\dagger = \langle c|$. Is there a linear operator belonging to $\mathcal{L}(V^*)$ that somehow
corresponds to $\mathbf{T}$? In other words, can we find a linear operator that relates
$\langle b|$ and $\langle c|$ just as $\mathbf{T}$ relates $|b\rangle$ and $|c\rangle$? The answer comes in the following
definition.


Definition 4.3.1 Let $\mathbf{T} \in \mathcal{L}(V)$ and $|a\rangle, |b\rangle \in V$. The adjoint, or hermitian
conjugate, of $\mathbf{T}$ is denoted by $\mathbf{T}^\dagger$ and defined by⁴

$$\langle a|\mathbf{T}|b\rangle^* = \langle b|\mathbf{T}^\dagger|a\rangle. \tag{4.11}$$

The LHS of Eq. (4.11) can be written as $\langle a|c\rangle^*$ or $\langle c|a\rangle$, in which case
we can identify

$$\langle c| = \langle b|\mathbf{T}^\dagger \;\Rightarrow\; (\mathbf{T}|b\rangle)^\dagger = \langle b|\mathbf{T}^\dagger. \tag{4.12}$$

This equation is sometimes used as the definition of the hermitian conjugate.
From Eq. (4.11), the reader may easily verify that $\mathbf{1}^\dagger = \mathbf{1}$. Thus, using the
unit operator for $\mathbf{T}$, (4.12) justifies Eq. (2.26).

Some of the properties of conjugation are listed in the following theorem, 
whose proof is left as an exercise. 


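Theorem 4.3.2 Let $\mathbf{U}, \mathbf{T} \in \mathrm{End}(V)$ and $\alpha \in \mathbb{C}$. Then
1. $(\mathbf{U} + \mathbf{T})^\dagger = \mathbf{U}^\dagger + \mathbf{T}^\dagger$.
2. $(\mathbf{UT})^\dagger = \mathbf{T}^\dagger\mathbf{U}^\dagger$.
3. $(\alpha\mathbf{T})^\dagger = \alpha^*\mathbf{T}^\dagger$.
4. $\bigl((\mathbf{T})^\dagger\bigr)^\dagger = \mathbf{T}$.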

The last identity holds for finite-dimensional vector spaces; it does not 
apply to infinite-dimensional vector spaces in general. 

In previous examples dealing with linear operators $\mathbf{T} : \mathbb{R}^n \to \mathbb{R}^n$, an element
of $\mathbb{R}^n$ was denoted by a row vector, such as $(x, y)$ for $\mathbb{R}^2$ and $(x, y, z)$
for $\mathbb{R}^3$. There was no confusion, because we were operating only in $V$. However,
since elements of both $V$ and $V^*$ are required when discussing $\mathbf{T}$, $\mathbf{T}^*$,
and $\mathbf{T}^\dagger$, it is helpful to make a distinction between them. We therefore resort
to the convention introduced in Example 2.2.3 by which


Box 4.3.3 Kets are represented as column vectors and bras as row 
vectors. 


⁴ With the notion of adjoint already introduced in Definition 2.4.3, we should probably
not use the same name for $\mathbf{T}^\dagger$. However, the adjoint as defined in Definition 2.4.3 is rarely
used in physics. Furthermore, both the notation and the context will make it clear which
adjoint one is talking about. Therefore, there is no risk of confusion.
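Example 4.3.4 Let us find the hermitian conjugate of the operator $\mathbf{T} :
\mathbb{C}^3 \to \mathbb{C}^3$ given by

$$\mathbf{T}\begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \alpha_3\end{pmatrix} = \begin{pmatrix}\alpha_1 - i\alpha_2 + \alpha_3 \\ i\alpha_1 - \alpha_3 \\ \alpha_1 - \alpha_2 + i\alpha_3\end{pmatrix}.$$

Introduce

$$|a\rangle = \begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \alpha_3\end{pmatrix} \quad\text{and}\quad |b\rangle = \begin{pmatrix}\beta_1 \\ \beta_2 \\ \beta_3\end{pmatrix}$$

with dual vectors $\langle a| = (\alpha_1^*\ \alpha_2^*\ \alpha_3^*)$ and $\langle b| = (\beta_1^*\ \beta_2^*\ \beta_3^*)$, respectively. We
use Eq. (4.11) to find $\mathbf{T}^\dagger$:

$$\begin{aligned}
\langle b|\mathbf{T}^\dagger|a\rangle = \langle a|\mathbf{T}|b\rangle^* &= \left[(\alpha_1^*\ \alpha_2^*\ \alpha_3^*)\,\mathbf{T}\begin{pmatrix}\beta_1 \\ \beta_2 \\ \beta_3\end{pmatrix}\right]^*
= \left[(\alpha_1^*\ \alpha_2^*\ \alpha_3^*)\begin{pmatrix}\beta_1 - i\beta_2 + \beta_3 \\ i\beta_1 - \beta_3 \\ \beta_1 - \beta_2 + i\beta_3\end{pmatrix}\right]^* \\
&= \bigl[\alpha_1^*\beta_1 - i\alpha_1^*\beta_2 + \alpha_1^*\beta_3 + i\alpha_2^*\beta_1 - \alpha_2^*\beta_3 + \alpha_3^*\beta_1 - \alpha_3^*\beta_2 + i\alpha_3^*\beta_3\bigr]^*.
\end{aligned}$$

Taking the complex conjugate of all the numbers inside the square brackets,
we find

$$\langle b|\mathbf{T}^\dagger|a\rangle = \underbrace{(\beta_1^*\ \beta_2^*\ \beta_3^*)}_{=\,\langle b|}\begin{pmatrix}\alpha_1 - i\alpha_2 + \alpha_3 \\ i\alpha_1 - \alpha_3 \\ \alpha_1 - \alpha_2 - i\alpha_3\end{pmatrix}.$$

Therefore, we obtain

$$\mathbf{T}^\dagger\begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \alpha_3\end{pmatrix} = \begin{pmatrix}\alpha_1 - i\alpha_2 + \alpha_3 \\ i\alpha_1 - \alpha_3 \\ \alpha_1 - \alpha_2 - i\alpha_3\end{pmatrix}.$$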

4.3.1 Hermitian Operators 


The process of conjugation of linear operators looks much like conjugation 
of complex numbers. Equation (4.11) alludes to this fact, and Theorem 4.3.2 
provides further evidence. It is therefore natural to look for operators that 
are counterparts of real numbers. One can define complex conjugation for 
operators and thereby construct real operators. However, these real operators 
will not be interesting because—as it turns out—they completely ignore the 
complex character of the vector space. The following alternative definition
makes use of hermitian conjugation, and the result will have much wider
application than is allowed by a mere complex conjugation.


Definition 4.3.5 A linear operator $\mathbf{H} \in \mathcal{L}(V)$ is called hermitian, or self-adjoint,
if $\mathbf{H}^\dagger = \mathbf{H}$. Similarly, $\mathbf{A} \in \mathcal{L}(V)$ is called anti-hermitian if $\mathbf{A}^\dagger =
-\mathbf{A}$.


Historical Notes 

Charles Hermite (1822-1901), one of the most eminent French mathematicians of the 
nineteenth century, was particularly distinguished for the clean elegance and high artistic 
quality of his work. As a student, he courted disaster by neglecting his routine assigned 
work to study the classic masters of mathematics; and though he nearly failed his exam- 
inations, he became a first-rate creative mathematician while still in his early twenties. 
In 1870 he was appointed to a professorship at the Sorbonne, where he trained a whole 
generation of well-known French mathematicians, including Picard, Borel, and Poincaré. 
The character of his mind is suggested by a remark of Poincaré: “Talk with M. Hermite. 
He never evokes a concrete image, yet you soon perceive that the most abstract entities 
are to him like living creatures.” He disliked geometry, but was strongly attracted to num- 
ber theory and analysis, and his favorite subject was elliptic functions, where these two 
fields touch in many remarkable ways. Earlier in the century the Norwegian genius Abel 
had proved that the general equation of the fifth degree cannot be solved by functions 
involving only rational operations and root extractions. One of Hermite’s most surprising 
achievements (in 1858) was to show that this equation can be solved by elliptic functions. 
His 1873 proof of the transcendence of e was another high point of his career. If he had
been willing to dig even deeper into this vein, he could probably have disposed of π as
well, but apparently he had enough of a good thing. As he wrote to a friend, “I shall risk
nothing on an attempt to prove the transcendence of the number π. If others undertake
this enterprise, no one will be happier than I at their success, but believe me, my dear 
friend, it will not fail to cost them some efforts.” As it turned out, Lindemann’s proof 
nine years later rested on extending Hermite’s method. 

Several of his purely mathematical discoveries had unexpected applications many years 
later to mathematical physics. For example, the Hermitian forms and matrices that he 
invented in connection with certain problems of number theory turned out to be crucial 
for Heisenberg’s 1925 formulation of quantum mechanics, and Hermite polynomials (see 
Chap. 8) are useful in solving Schrödinger’s wave equation.


The following observations strengthen the above conjecture that conjuga- 
tion of complex numbers and hermitian conjugation of operators are some- 
how related. 


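Definition 4.3.6 The expectation value $\langle\mathbf{T}\rangle_a$ of an operator $\mathbf{T}$ in the “state”
$|a\rangle$ is a complex number defined by $\langle\mathbf{T}\rangle_a = \langle a|\mathbf{T}|a\rangle$.


The complex conjugate of the expectation value is⁶

$$\langle\mathbf{T}\rangle^* = \langle a|\mathbf{T}|a\rangle^* = \langle a|\mathbf{T}^\dagger|a\rangle.$$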


⁵ Transcendental numbers are those that are not roots of polynomials with integer
coefficients.


⁶ When no risk of confusion exists, it is common to drop the subscript “a” and write $\langle\mathbf{T}\rangle$
for the expectation value of $\mathbf{T}$.
In words, $\mathbf{T}^\dagger$, the hermitian conjugate of $\mathbf{T}$, has an expectation value that
is the complex conjugate of the latter’s expectation value. In particular, if $\mathbf{T}$
is hermitian (equal to its hermitian conjugate), its expectation value will
be real.

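What is the analogue of the known fact that a complex number is the sum
of a real number and a pure imaginary one? The decomposition

$$\mathbf{T} = \tfrac{1}{2}(\mathbf{T} + \mathbf{T}^\dagger) + \tfrac{1}{2}(\mathbf{T} - \mathbf{T}^\dagger) \equiv \mathbf{X} + \mathbf{A}$$

shows that any operator can be written as a sum of a hermitian operator
$\mathbf{X} = \tfrac{1}{2}(\mathbf{T} + \mathbf{T}^\dagger)$ and an anti-hermitian operator $\mathbf{A} = \tfrac{1}{2}(\mathbf{T} - \mathbf{T}^\dagger)$.

We can go even further, because any anti-hermitian operator $\mathbf{A}$ can be
written as $\mathbf{A} = i(-i\mathbf{A})$ in which $-i\mathbf{A}$ is hermitian: $(-i\mathbf{A})^\dagger = (-i)^*\mathbf{A}^\dagger =
i(-\mathbf{A}) = -i\mathbf{A}$. Denoting $-i\mathbf{A}$ by $\mathbf{Y}$, we write $\mathbf{T} = \mathbf{X} + i\mathbf{Y}$, where both $\mathbf{X}$ and
$\mathbf{Y}$ are hermitian. This is the analogue of the decomposition $z = x + iy$ in
which both $x$ and $y$ are real.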
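
The decomposition $\mathbf{T} = \mathbf{X} + i\mathbf{Y}$ can be illustrated on a sample matrix. The sketch below builds $\mathbf{X}$, $\mathbf{A}$, and $\mathbf{Y}$ for an arbitrarily chosen $2 \times 2$ complex matrix (an assumption made for the illustration) and checks their hermiticity properties up to rounding.

```python
def dagger(m):
    """Conjugate transpose of a square matrix."""
    n = len(m)
    return [[complex(m[j][i]).conjugate() for j in range(n)] for i in range(n)]

def combine(c1, a, c2, b):
    """Entrywise linear combination c1*a + c2*b."""
    return [[c1 * x + c2 * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def close(a, b, tol=1e-12):
    return all(abs(x - y) <= tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

T = [[1 + 2j, 3], [4j, -2 - 1j]]
X = combine(0.5, T, 0.5, dagger(T))     # hermitian part
A = combine(0.5, T, -0.5, dagger(T))    # anti-hermitian part
Y = combine(-1j, A, 0, A)               # Y = -iA

print(close(dagger(X), X))                    # X is hermitian
print(close(dagger(A), combine(-1, A, 0, A)))  # A is anti-hermitian
print(close(dagger(Y), Y))                    # Y is hermitian
print(close(combine(1, X, 1j, Y), T))         # T = X + iY
```

The operators $\mathbf{X}$ and $\mathbf{Y}$ play the roles of the real and imaginary parts of a complex number.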

Clearly, we should expect some departures from a perfect correspon- 
dence. This is due to a lack of commutativity among operators. For instance, 
although the product of two real numbers is real, the product of two hermi- 
tian operators is not, in general, hermitian: 


$$(\mathbf{TU})^\dagger = \mathbf{U}^\dagger\mathbf{T}^\dagger = \mathbf{UT} \neq \mathbf{TU}.$$


We have seen the relation between expectation values and conjugation prop- 
erties of operators. The following theorem completely characterizes hermi- 
tian operators in terms of their expectation values: 


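Theorem 4.3.7 A linear map $\mathbf{H}$ on a complex inner product space is hermitian
if and only if $\langle a|\mathbf{H}|a\rangle$ is real for all $|a\rangle$.


Proof We have already pointed out that a hermitian operator has real expectation
values. Conversely, assume that $\langle a|\mathbf{H}|a\rangle$ is real for all $|a\rangle$. Then

$$\langle a|\mathbf{H}|a\rangle = \langle a|\mathbf{H}|a\rangle^* = \langle a|\mathbf{H}^\dagger|a\rangle \iff \langle a|\mathbf{H} - \mathbf{H}^\dagger|a\rangle = 0 \quad \forall\,|a\rangle.$$

By Theorem 2.3.8 we must have $\mathbf{H} - \mathbf{H}^\dagger = \mathbf{0}$.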


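Example 4.3.8 In this example, we illustrate the result of the above theorem
with $2 \times 2$ matrices. The matrix $\mathbf{H} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$ is hermitian⁷ and acts
on $\mathbb{C}^2$. Let us take an arbitrary vector $|a\rangle = \begin{pmatrix}\alpha_1 \\ \alpha_2\end{pmatrix}$ and evaluate $\langle a|\mathbf{H}|a\rangle$. We
have

$$\mathbf{H}|a\rangle = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}\begin{pmatrix}\alpha_1 \\ \alpha_2\end{pmatrix} = \begin{pmatrix}-i\alpha_2 \\ i\alpha_1\end{pmatrix}.$$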

⁷ We assume that the reader has a casual familiarity with hermitian matrices. Think of
an $n \times n$ matrix as a linear operator that acts on column vectors whose elements are
components of vectors defined in the standard basis of $\mathbb{C}^n$ or $\mathbb{R}^n$. A hermitian matrix then
becomes a hermitian operator.




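Therefore,

$$\begin{aligned}
\langle a|\mathbf{H}|a\rangle &= (\alpha_1^*\ \alpha_2^*)\begin{pmatrix}-i\alpha_2 \\ i\alpha_1\end{pmatrix} = -i\alpha_1^*\alpha_2 + i\alpha_2^*\alpha_1 \\
&= i\alpha_2^*\alpha_1 + (i\alpha_2^*\alpha_1)^* = 2\,\mathrm{Re}(i\alpha_2^*\alpha_1),
\end{aligned}$$

and $\langle a|\mathbf{H}|a\rangle$ is real.

For the most general $2 \times 2$ hermitian matrix $\mathbf{H} = \begin{pmatrix}\alpha & \beta \\ \beta^* & \gamma\end{pmatrix}$, where $\alpha$ and $\gamma$
are real, we have

$$\mathbf{H}|a\rangle = \begin{pmatrix}\alpha & \beta \\ \beta^* & \gamma\end{pmatrix}\begin{pmatrix}\alpha_1 \\ \alpha_2\end{pmatrix} = \begin{pmatrix}\alpha\alpha_1 + \beta\alpha_2 \\ \beta^*\alpha_1 + \gamma\alpha_2\end{pmatrix}$$

and

$$\begin{aligned}
\langle a|\mathbf{H}|a\rangle &= (\alpha_1^*\ \alpha_2^*)\begin{pmatrix}\alpha\alpha_1 + \beta\alpha_2 \\ \beta^*\alpha_1 + \gamma\alpha_2\end{pmatrix} = \alpha_1^*(\alpha\alpha_1 + \beta\alpha_2) + \alpha_2^*(\beta^*\alpha_1 + \gamma\alpha_2) \\
&= \alpha|\alpha_1|^2 + \alpha_1^*\beta\alpha_2 + \alpha_2^*\beta^*\alpha_1 + \gamma|\alpha_2|^2 \\
&= \alpha|\alpha_1|^2 + \gamma|\alpha_2|^2 + 2\,\mathrm{Re}(\alpha_1^*\beta\alpha_2).
\end{aligned}$$

Again $\langle a|\mathbf{H}|a\rangle$ is real.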


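Definition 4.3.9 An operator $\mathbf{A}$ on an inner product space is called positive
(written $\mathbf{A} \geq 0$) if $\langle a|\mathbf{A}|a\rangle \geq 0$ for all $|a\rangle \neq |0\rangle$. By Theorem 4.3.7 $\mathbf{A}$ is necessarily
hermitian. We say $\mathbf{A}$ is strictly positive or positive definite (written
$\mathbf{A} > 0$) if $\langle a|\mathbf{A}|a\rangle > 0$ for all $|a\rangle \neq |0\rangle$.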


Theorem 4.3.10 A strictly positive operator T is invertible. 


Proof By Proposition 2.3.14, it is sufficient to prove that $\ker\mathbf{T} = \{|0\rangle\}$. If
$|0\rangle \neq |a\rangle \in \ker\mathbf{T}$, then $\langle a|\mathbf{T}|a\rangle = 0$, contradicting the fact that $\mathbf{T}$ is strictly
positive.


Example 4.3.11 An example of a positive operator is the square of a hermitian
operator.⁸ We note that for any hermitian operator $\mathbf{H}$ and any vector
$|a\rangle$, we have $\langle a|\mathbf{H}^2|a\rangle = \langle a|\mathbf{H}^\dagger\mathbf{H}|a\rangle = \langle\mathbf{H}a|\mathbf{H}a\rangle \geq 0$ because of the positive
definiteness of the inner product.


From the discussion of the example above, we conclude that the square 
of an invertible hermitian operator is positive definite. 


⁸ This is further evidence that hermitian operators are analogues of real numbers: The
square of any real number is nonnegative.
4.3.2 Unitary Operators


The reader may be familiar with two- and three-dimensional rigid rotations
and the fact that they preserve distances and the scalar product.⁹ Can this be
generalized to complex inner product spaces? Let $|a\rangle, |b\rangle \in V$, and let $\mathbf{U}$ be
an operator on $V$ that preserves the scalar product; that is, given $|b'\rangle = \mathbf{U}|b\rangle$
and $|a'\rangle = \mathbf{U}|a\rangle$, then $\langle a'|b'\rangle = \langle a|b\rangle$. This yields

$$\langle a'|b'\rangle = (\langle a|\mathbf{U}^\dagger)(\mathbf{U}|b\rangle) = \langle a|\mathbf{U}^\dagger\mathbf{U}|b\rangle = \langle a|b\rangle = \langle a|\mathbf{1}|b\rangle.$$

Since this is true for arbitrary $|a\rangle$ and $|b\rangle$, we obtain $\mathbf{U}^\dagger\mathbf{U} = \mathbf{1}$. In the next
chapter, when we introduce the concept of the determinant of operators, we
shall see that this relation implies that $\mathbf{U}$ and $\mathbf{U}^\dagger$ are both invertible,¹⁰ with
each one being the inverse of the other.


Definition 4.3.12 Let $V$ be a finite-dimensional inner product space. An
operator $\mathbf{U}$ is called a unitary operator if $\mathbf{U}^\dagger = \mathbf{U}^{-1}$. Unitary operators
preserve the inner product of $V$.


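Example 4.3.13 The linear transformation $\mathbf{T} : \mathbb{C}^3 \to \mathbb{C}^3$ given by

$$\mathbf{T}\begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \alpha_3\end{pmatrix} = \begin{pmatrix}(\alpha_1 - i\alpha_2)/\sqrt{2} \\ (\alpha_1 + i\alpha_2 - 2\alpha_3)/\sqrt{6} \\ \{\alpha_1 - \alpha_2 + \alpha_3 + i(\alpha_1 + \alpha_2 + \alpha_3)\}/\sqrt{6}\end{pmatrix}$$

is unitary. In fact, let

$$|a\rangle = \begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \alpha_3\end{pmatrix} \quad\text{and}\quad |b\rangle = \begin{pmatrix}\beta_1 \\ \beta_2 \\ \beta_3\end{pmatrix}$$

with dual vectors $\langle a| = (\alpha_1^*\ \alpha_2^*\ \alpha_3^*)$ and $\langle b| = (\beta_1^*\ \beta_2^*\ \beta_3^*)$, respectively. We
use Eq. (4.11) and the procedure of Example 4.3.4 to find $\mathbf{T}^\dagger$. The result is

$$\mathbf{T}^\dagger\begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \alpha_3\end{pmatrix} = \begin{pmatrix}\dfrac{\alpha_1}{\sqrt{2}} + \dfrac{\alpha_2}{\sqrt{6}} + \dfrac{\alpha_3(1 - i)}{\sqrt{6}} \\[2ex] \dfrac{i\alpha_1}{\sqrt{2}} - \dfrac{i\alpha_2}{\sqrt{6}} - \dfrac{\alpha_3(1 + i)}{\sqrt{6}} \\[2ex] -\dfrac{2\alpha_2}{\sqrt{6}} + \dfrac{\alpha_3(1 - i)}{\sqrt{6}}\end{pmatrix},$$

and we can verify that

$$\mathbf{T}\mathbf{T}^\dagger\begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \alpha_3\end{pmatrix} = \begin{pmatrix}\alpha_1 \\ \alpha_2 \\ \alpha_3\end{pmatrix}.$$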


⁹ We have also encountered isometries, which are more general than unitary operators.
The word “unitary” is usually reserved for isometries on sesquilinear (hermitian) inner
product spaces.


¹⁰ This implication holds only for finite-dimensional vector spaces.
Thus $\mathbf{T}\mathbf{T}^\dagger = \mathbf{1}$. Similarly, we can show that $\mathbf{T}^\dagger\mathbf{T} = \mathbf{1}$ and therefore that $\mathbf{T}$ is
unitary.
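
The unitarity of the operator in Example 4.3.13 can also be confirmed numerically by writing it as a matrix in the standard basis. The sketch below is illustrative only; the sample vectors and helper names are assumptions, and equalities hold up to floating-point rounding.

```python
import math

s2, s6 = math.sqrt(2), math.sqrt(6)
# Matrix of T from Example 4.3.13 in the standard basis of C^3
T = [[1 / s2, -1j / s2, 0],
     [1 / s6, 1j / s6, -2 / s6],
     [(1 + 1j) / s6, (-1 + 1j) / s6, (1 + 1j) / s6]]

def dagger(m):
    return [[complex(m[j][i]).conjugate() for j in range(3)] for i in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply(m, v):
    return [sum(row[k] * v[k] for k in range(3)) for row in m]

def inner(a, b):
    """<a|b>, with the components of the bra complex-conjugated."""
    return sum(complex(x).conjugate() * y for x, y in zip(a, b))

def is_identity(m, tol=1e-12):
    return all(abs(m[i][j] - (1 if i == j else 0)) <= tol
               for i in range(3) for j in range(3))

a = [1 + 2j, -1j, 3]
b = [2, 1 - 1j, -1 + 4j]
print(is_identity(matmul(T, dagger(T))), is_identity(matmul(dagger(T), T)))
# A unitary operator preserves the inner product
print(abs(inner(apply(T, a), apply(T, b)) - inner(a, b)) < 1e-12)
```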


4.4  Idempotents 


We have already considered the decomposition of a vector space into sub- 
spaces in Sect. 2.1.3. We have also pointed out the significance of subspaces 
resulting from the fact that physics frequently takes place not inside the 
whole vector space, but in one of its subspaces. For instance, the study of 
projectile motion teaches us that it is very convenient to “project” the mo- 
tion onto the horizontal and vertical axes and to study these projections sep- 
arately. It is, therefore, appropriate to ask how we can go from a full space 
to one of its subspaces in the context of linear operators. 

Let us first consider a simple example. A point in the plane is designated
by the coordinates $(x, y)$. A subspace of the plane is the $x$-axis. Is there
a linear operator,¹¹ say $\mathbf{P}_x$, that acts on such a point and somehow sends
it into that subspace? Of course, there are many operators from $\mathbb{R}^2$ to $\mathbb{R}$.
However, we are looking for a specific one. We want $\mathbf{P}_x$ to project the point
onto the $x$-axis. Such an operator has to act on $(x, y)$ and produce $(x, 0)$:
$\mathbf{P}_x(x, y) = (x, 0)$. Therefore, if the point already lies on the $x$-axis, $\mathbf{P}_x$ does
not change it. In particular, if we apply $\mathbf{P}_x$ twice, we get the same result as if
we apply it only once. And this is true for any point in the plane. Therefore,
our operator must have the property $\mathbf{P}_x^2 = \mathbf{P}_x$, i.e., it must be an idempotent
of the algebra $\mathrm{End}(V)$.

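Suppose that $V$ is the direct sum of $r$ of its subspaces:

$$V = U_1 \oplus \cdots \oplus U_r = \bigoplus_{i=1}^{r} U_i.$$

For any $|v\rangle \in V$, define $\mathbf{P}_j$ by $\mathbf{P}_j|v\rangle = |v_j\rangle$, where $|v_j\rangle$ is the component of
$|v\rangle$ in $U_j$. It is easy (and instructive) to prove that $\mathbf{P}_j$ is a linear operator;
i.e., that $\mathbf{P}_j \in \mathrm{End}(V)$. Moreover, since $\mathbf{P}_j|v_j\rangle = |v_j\rangle$ for all $|v_j\rangle \in U_j$, we
have

$$\mathbf{P}_j^2|v\rangle = \mathbf{P}_j\mathbf{P}_j|v\rangle = \mathbf{P}_j|v_j\rangle = |v_j\rangle = \mathbf{P}_j|v\rangle$$

for all $|v\rangle \in V$. It follows that $\mathbf{P}_j^2 = \mathbf{P}_j$, i.e., that $\mathbf{P}_j$ is an idempotent.
Next note that, for $j \neq k$,

$$\mathbf{P}_j\mathbf{P}_k|v\rangle = \mathbf{P}_j|v_k\rangle = |0\rangle,$$

because $|v_k\rangle$ has no component in $U_j$. Since this is true for all $j \neq k$ and all
$|v\rangle \in V$, we have

$$\mathbf{P}_j\mathbf{P}_k = \mathbf{P}_k\mathbf{P}_j = \mathbf{0},$$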


¹¹ We want this operator to preserve the vector-space structure of the plane and the axis.
i.e., that the idempotents are orthogonal. Furthermore, since

$$\mathbf{1}|v\rangle = |v\rangle = \sum_{j=1}^{r}|v_j\rangle = \sum_{j=1}^{r}\mathbf{P}_j|v\rangle = \Bigl(\sum_{j=1}^{r}\mathbf{P}_j\Bigr)|v\rangle$$

for all $|v\rangle \in V$, we must have $\sum_{j=1}^{r}\mathbf{P}_j = \mathbf{1}$. We summarize the foregoing
observation as


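Proposition 4.4.1 Let $V$ be the direct sum of $\{U_i\}_{i=1}^{r}$ and let $\mathbf{P}_j \in \mathcal{L}(V)$ be
defined by $\mathbf{P}_j|v\rangle = |v_j\rangle$ with $|v_j\rangle$ the component of $|v\rangle$ in $U_j$. Then $\{\mathbf{P}_i\}_{i=1}^{r}$
is a complete set of orthogonal idempotents, i.e.,

$$\mathbf{P}_i\mathbf{P}_j = \delta_{ij}\mathbf{P}_i \ \text{(no sum)} \quad\text{and}\quad \sum_{j=1}^{r}\mathbf{P}_j = \mathbf{1}.$$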


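If $\mathbf{T} \in \mathrm{End}(V)$ and $\{\mathbf{P}_i\}_{i=1}^{r}$ is a complete set of orthogonal idempotents
corresponding to $\{U_i\}_{i=1}^{r}$, then $\mathbf{T}$ can be written as a sum of operators
which restrict to the subspaces. More precisely, multiply the sum in Proposition
4.4.1 by $\mathbf{T}$ on the right to get

$$\mathbf{T} = \sum_{j=1}^{r}\mathbf{P}_j\mathbf{T} \equiv \sum_{j=1}^{r}\mathbf{T}_j, \qquad \mathbf{T}_j \equiv \mathbf{P}_j\mathbf{T}, \tag{4.13}$$

and note that $\mathbf{T}_j \in \mathrm{End}(U_j)$.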


4.4.1 Projection Operators 


When the vector space carries an inner product, it is useful to demand her- 
miticity for the idempotents: 


Definition 4.4.2 A hermitian idempotent of End(V) is called a projection 
operator. 


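Consider two projection operators $\mathbf{P}_1$ and $\mathbf{P}_2$. We want to investigate
conditions under which $\mathbf{P}_1 + \mathbf{P}_2$ becomes a projection operator. By definition,

\[ \mathbf{P}_1 + \mathbf{P}_2 = (\mathbf{P}_1 + \mathbf{P}_2)^2 = \mathbf{P}_1^2 + \mathbf{P}_1\mathbf{P}_2 + \mathbf{P}_2\mathbf{P}_1 + \mathbf{P}_2^2 = \mathbf{P}_1 + \mathbf{P}_1\mathbf{P}_2 + \mathbf{P}_2\mathbf{P}_1 + \mathbf{P}_2. \]

So $\mathbf{P}_1 + \mathbf{P}_2$ is a projection operator if and only if

\[ \mathbf{P}_1\mathbf{P}_2 + \mathbf{P}_2\mathbf{P}_1 = \mathbf{0}. \tag{4.14} \]

Multiply this on the left by $\mathbf{P}_1$ to get

\[ \mathbf{P}_1^2\mathbf{P}_2 + \mathbf{P}_1\mathbf{P}_2\mathbf{P}_1 = \mathbf{0} \ \Rightarrow\ \mathbf{P}_1\mathbf{P}_2 + \mathbf{P}_1\mathbf{P}_2\mathbf{P}_1 = \mathbf{0}. \]

Now multiply the same equation on the right by $\mathbf{P}_1$ to get

\[ \mathbf{P}_1\mathbf{P}_2\mathbf{P}_1 + \mathbf{P}_2\mathbf{P}_1^2 = \mathbf{0} \ \Rightarrow\ \mathbf{P}_1\mathbf{P}_2\mathbf{P}_1 + \mathbf{P}_2\mathbf{P}_1 = \mathbf{0}. \]

These last two equations yield

\[ \mathbf{P}_1\mathbf{P}_2 - \mathbf{P}_2\mathbf{P}_1 = \mathbf{0}. \tag{4.15} \]

The solution to Eqs. (4.14) and (4.15) is $\mathbf{P}_1\mathbf{P}_2 = \mathbf{P}_2\mathbf{P}_1 = \mathbf{0}$. We therefore
have the following result.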


Proposition 4.4.3 Let $\mathbf{P}_1, \mathbf{P}_2 \in \mathcal{L}(V)$ be projection operators. Then $\mathbf{P}_1 + \mathbf{P}_2$
is a projection operator if and only if $\mathbf{P}_1$ and $\mathbf{P}_2$ are orthogonal.


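More generally, if there is a set $\{\mathbf{P}_i\}_{i=1}^{m}$ of projection operators satisfying

\[ \mathbf{P}_i\mathbf{P}_j = \begin{cases} \mathbf{P}_i & \text{if } i = j, \\ \mathbf{0} & \text{if } i \neq j, \end{cases} \]

then $\mathbf{P} = \sum_{i=1}^{m}\mathbf{P}_i$ is also a projection operator.
Given a normal vector $|e\rangle$, one can show easily that $\mathbf{P} = |e\rangle\langle e|$ is a
projection operator:

• $\mathbf{P}$ is hermitian: $\mathbf{P}^\dagger = (|e\rangle\langle e|)^\dagger = (\langle e|)^\dagger(|e\rangle)^\dagger = |e\rangle\langle e|$.
• $\mathbf{P}$ equals its square: $\mathbf{P}^2 = (|e\rangle\langle e|)(|e\rangle\langle e|) = |e\rangle\underbrace{\langle e|e\rangle}_{=1}\langle e| = |e\rangle\langle e|$.

Let $|y\rangle$ be any nonzero vector in an inner product space $V$. The normal
vector $|e_y\rangle$ along $|y\rangle$ is $|e_y\rangle = |y\rangle/\sqrt{\langle y|y\rangle}$. From this, we construct the
projection operator $\mathbf{P}_y$ along $|y\rangle$:

\[ \mathbf{P}_y = |e_y\rangle\langle e_y| = \frac{|y\rangle}{\sqrt{\langle y|y\rangle}}\,\frac{\langle y|}{\sqrt{\langle y|y\rangle}} = \frac{|y\rangle\langle y|}{\langle y|y\rangle}. \]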


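Given any vector $|x\rangle$, $\mathbf{P}_y|x\rangle$ is the component of $|x\rangle$ along $|y\rangle$. Hence,

\[ |x\rangle - \mathbf{P}_y|x\rangle \quad\text{or}\quad (\mathbf{1} - \mathbf{P}_y)|x\rangle \]

is the component of $|x\rangle$ perpendicular to $|y\rangle$. The reflection of $|x\rangle$ in $|y\rangle$ is
therefore (see Fig. 4.2)

\[ \mathbf{P}_y|x\rangle - (\mathbf{1} - \mathbf{P}_y)|x\rangle = 2\mathbf{P}_y|x\rangle - |x\rangle = (2\mathbf{P}_y - \mathbf{1})|x\rangle = 2\frac{\langle y|x\rangle}{\langle y|y\rangle}|y\rangle - |x\rangle. \]

As shown in Fig. 4.2, from a two- or three-dimensional geometric point of
view, it is clear that the negative of this last vector is the reflection in the
plane perpendicular to $|y\rangle$.$^{12}$ We generalize this to any vector space: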


Definition 4.4.4 For any nonzero vector $|y\rangle$ the projection operator $\mathbf{P}_y$
along $|y\rangle$ and the reflection operator $\mathbf{R}_y$ in the plane perpendicular to $|y\rangle$


$^{12}$ One can note more directly (also shown in Fig. 4.2) that in three-dimensional
geometry, if one adds to $|x\rangle$ twice the negative of its projection on $|y\rangle$, one gets the reflection
of $|x\rangle$ in the plane perpendicular to $|y\rangle$.
Fig. 4.2 The vectors $|x\rangle$ and $|y\rangle$ and the reflections of $|x\rangle$ in $|y\rangle$ and in the plane
perpendicular to $|y\rangle$. The figure also marks the vectors $\mathbf{P}_y|x\rangle$, $2\mathbf{P}_y|x\rangle$, $(\mathbf{1}-\mathbf{P}_y)|x\rangle$, and $-(\mathbf{1}-\mathbf{P}_y)|x\rangle$.


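are given by

\[ \mathbf{P}_y = \frac{|y\rangle\langle y|}{\langle y|y\rangle} \quad\text{and}\quad \mathbf{R}_y = \mathbf{1} - 2\mathbf{P}_y = \mathbf{1} - 2\frac{|y\rangle\langle y|}{\langle y|y\rangle}. \]

For any other vector $|x\rangle$, the component $|x\rangle_y$ of $|x\rangle$ along $|y\rangle$ and its
reflection $|x\rangle_{r,y}$ in the plane perpendicular to $|y\rangle$ are given by

\[ |x\rangle_y = \mathbf{P}_y|x\rangle = \frac{\langle y|x\rangle}{\langle y|y\rangle}|y\rangle \quad\text{and}\quad |x\rangle_{r,y} = \mathbf{R}_y|x\rangle = |x\rangle - 2\frac{\langle y|x\rangle}{\langle y|y\rangle}|y\rangle. \]

The relations $|y\rangle_y = |y\rangle$ and $|y\rangle_{r,y} = -|y\rangle$ confirm our intuitive
geometrical expectation.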
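The formulas of Definition 4.4.4 can be tried out numerically. The following is a minimal sketch for real vectors with integer entries only (so no complex conjugation is needed); the function names `projection`, `reflection`, and `apply`, and the sample vectors, are assumptions made for illustration. Exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction


def dot(u, v):
    """Real inner product <u|v>."""
    return sum(a * b for a, b in zip(u, v))


def projection(y):
    """Matrix of P_y = |y><y| / <y|y> for a nonzero integer vector y."""
    n2 = dot(y, y)
    return [[Fraction(yi * yj, n2) for yj in y] for yi in y]


def reflection(y):
    """Matrix of R_y = 1 - 2 P_y (reflection in the plane perpendicular to y)."""
    p = projection(y)
    n = len(y)
    return [[(1 if i == j else 0) - 2 * p[i][j] for j in range(n)]
            for i in range(n)]


def apply(m, x):
    """Action of the matrix m on the column vector x."""
    return [dot(row, x) for row in m]


y = [1, 2, 2]          # sample vector, chosen arbitrarily
x = [3, 0, 0]
x_along = apply(projection(y), x)        # component of x along y
x_reflected = apply(reflection(y), x)    # reflection of x in the plane perpendicular to y
print(x_along, x_reflected)
```

One can check with the same helpers that `apply(projection(y), y)` returns `y` and `apply(reflection(y), y)` returns `-y`, in line with the relations stated after the definition.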


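Example 4.4.5 Let $V$ be a one-dimensional vector space. Let $|a\rangle$ be any
nonzero vector in $V$. Any other vector $|x\rangle$ can be written as $|x\rangle = \alpha|a\rangle$ for
some number $\alpha$. Then

\[ \mathbf{P}_a|x\rangle = \frac{|a\rangle\langle a|}{\langle a|a\rangle}|x\rangle = \frac{\langle a|x\rangle}{\langle a|a\rangle}|a\rangle = \frac{\alpha\langle a|a\rangle}{\langle a|a\rangle}|a\rangle = \alpha|a\rangle = |x\rangle. \]

Since this is true for all $|a\rangle$ and $|x\rangle$, we conclude that $\mathbf{P}_y = \mathbf{1}$ for any $|y\rangle$ in
a one-dimensional vector space. The reflection operator then follows from
$\mathbf{R}_y = \mathbf{1} - 2\mathbf{P}_y$. Therefore,

\[ \mathbf{P}_y = \mathbf{1} \quad\text{and}\quad \mathbf{R}_y = -\mathbf{1} \]

in a one-dimensional vector space such as the complex vector space $\mathbb{C}$ and
the real vector space $\mathbb{R}$.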


We can take an orthonormal basis $B = \{|e_i\rangle\}_{i=1}^{N}$ and construct a set of
projection operators $\{\mathbf{P}_i = |e_i\rangle\langle e_i|\}_{i=1}^{N}$. The operators $\mathbf{P}_i$ are mutually
orthogonal. Thus, their sum $\sum_{i=1}^{N}\mathbf{P}_i$ is also a projection operator.

Proposition 4.4.6 Let $B = \{|e_i\rangle\}_{i=1}^{N}$ be an orthonormal basis for $V_N$. Then
the set $\{\mathbf{P}_i = |e_i\rangle\langle e_i|\}_{i=1}^{N}$ consists of mutually orthogonal projection
operators, and

\[ \sum_{i=1}^{N}\mathbf{P}_i = \sum_{i=1}^{N}|e_i\rangle\langle e_i| = \mathbf{1}. \]

This relation is called the completeness relation.


Proof The proof is left as Problem 4.26. 


If we choose only the first $m < N$ vectors, then the projection operator
$\mathbf{P}^{(m)} \equiv \sum_{i=1}^{m}|e_i\rangle\langle e_i|$ projects arbitrary vectors into the subspace spanned
by the first $m$ basis vectors $\{|e_i\rangle\}_{i=1}^{m}$. In other words, when $\mathbf{P}^{(m)}$ acts on any
vector $|a\rangle \in V$, the result will be a linear combination of only the first $m$
vectors. The simple proof of this fact is left as an exercise. These points are
illustrated in the following example.


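Example 4.4.7 Consider three orthonormal vectors $\{|e_i\rangle\}_{i=1}^{3} \in \mathbb{R}^3$ given by

\[ |e_1\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \qquad |e_2\rangle = \frac{1}{\sqrt{6}}\begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}, \qquad |e_3\rangle = \frac{1}{\sqrt{3}}\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}. \]

The projection operators associated with each of these can be obtained by
noting that $\langle e_i|$ is a row vector. Therefore,

\[ \mathbf{P}_1 = |e_1\rangle\langle e_1| = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}\begin{pmatrix} 1 & 1 & 0 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}. \]

Similarly,

\[ \mathbf{P}_2 = \frac{1}{6}\begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}\begin{pmatrix} 1 & -1 & 2 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 1 & -1 & 2 \\ -1 & 1 & -2 \\ 2 & -2 & 4 \end{pmatrix} \]

and

\[ \mathbf{P}_3 = \frac{1}{3}\begin{pmatrix} -1 \\ 1 \\ 1 \end{pmatrix}\begin{pmatrix} -1 & 1 & 1 \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 1 & -1 & -1 \\ -1 & 1 & 1 \\ -1 & 1 & 1 \end{pmatrix}. \]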

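Note that $\mathbf{P}_i$ projects onto the line along $|e_i\rangle$. This can be tested by
letting $\mathbf{P}_i$ act on an arbitrary vector and showing that the resulting vector is
perpendicular to the other two vectors. For example, let $\mathbf{P}_2$ act on an
arbitrary column vector:

\[ |a\rangle \equiv \mathbf{P}_2\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{1}{6}\begin{pmatrix} 1 & -1 & 2 \\ -1 & 1 & -2 \\ 2 & -2 & 4 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{1}{6}\begin{pmatrix} x - y + 2z \\ -x + y - 2z \\ 2x - 2y + 4z \end{pmatrix}. \]

We verify that $|a\rangle$ is perpendicular to both $|e_1\rangle$ and $|e_3\rangle$:

\[ \langle e_1|a\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & 1 & 0 \end{pmatrix}\frac{1}{6}\begin{pmatrix} x - y + 2z \\ -x + y - 2z \\ 2x - 2y + 4z \end{pmatrix} = 0. \]

Similarly, $\langle e_3|a\rangle = 0$. So indeed, $|a\rangle$ is along $|e_2\rangle$.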
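We can find the operator that projects onto the plane formed by $|e_1\rangle$ and
$|e_2\rangle$. This is

\[ \mathbf{P}_1 + \mathbf{P}_2 = \frac{1}{3}\begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & -1 \\ 1 & -1 & 2 \end{pmatrix}. \]

When this operator acts on an arbitrary column vector, it produces a vector
lying in the plane of $|e_1\rangle$ and $|e_2\rangle$, or perpendicular to $|e_3\rangle$:

\[ |b\rangle \equiv (\mathbf{P}_1 + \mathbf{P}_2)\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 2 & 1 & 1 \\ 1 & 2 & -1 \\ 1 & -1 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{1}{3}\begin{pmatrix} 2x + y + z \\ x + 2y - z \\ x - y + 2z \end{pmatrix}. \]

It is easy to show that $\langle e_3|b\rangle = 0$. The operators that project onto the other
two planes are obtained similarly. Finally, we verify easily that

\[ \mathbf{P}_1 + \mathbf{P}_2 + \mathbf{P}_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} = \mathbf{1}, \]

i.e., that the completeness relation holds.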
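The matrices of Example 4.4.7 can be re-derived mechanically. The sketch below builds each projector from the unnormalized direction via $|y\rangle\langle y|/\langle y|y\rangle$, which gives the same operator as $|e_i\rangle\langle e_i|$; the helper names (`projector`, `matadd`, `apply`) and the sample vector `(1, 2, 3)` are assumptions introduced only for this check.

```python
from fractions import Fraction


def projector(y):
    """|y><y| / <y|y> for a nonzero real vector y with integer entries."""
    n2 = sum(c * c for c in y)
    return [[Fraction(a * b, n2) for b in y] for a in y]


def matadd(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def apply(m, x):
    """Action of the matrix m on the column vector x."""
    return [sum(r * c for r, c in zip(row, x)) for row in m]


# Unnormalized directions of |e1>, |e2>, |e3> from the example.
d1, d2, d3 = [1, 1, 0], [1, -1, 2], [-1, 1, 1]
P1, P2, P3 = projector(d1), projector(d2), projector(d3)

plane_12 = matadd(P1, P2)                 # projects onto the plane of |e1>, |e2>
total = matadd(plane_12, P3)              # should be the identity
a = apply(P2, [1, 2, 3])                  # P2 acting on a sample vector

print(plane_12)
print(total)
print(sum(p * q for p, q in zip(d1, a)), sum(p * q for p, q in zip(d3, a)))
```

The last line prints the inner products of the projected vector with the directions of $|e_1\rangle$ and $|e_3\rangle$, which should both vanish.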


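Example 4.4.8 We want to find the most general projection and reflection
operators in a real two-dimensional vector space $V$. Without loss of
generality, we assume that $V = \mathbb{R}^2$, and consider a vector

\[ |y\rangle = \begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix}. \]

Then

\[ \mathbf{P}_y = \frac{|y\rangle\langle y|}{\langle y|y\rangle} = \frac{1}{\eta_1^2 + \eta_2^2}\begin{pmatrix} \eta_1 \\ \eta_2 \end{pmatrix}\begin{pmatrix} \eta_1 & \eta_2 \end{pmatrix} = \frac{1}{\eta_1^2 + \eta_2^2}\begin{pmatrix} \eta_1^2 & \eta_1\eta_2 \\ \eta_1\eta_2 & \eta_2^2 \end{pmatrix}. \]

Let

\[ \frac{\eta_1^2}{\eta_1^2 + \eta_2^2} \equiv \cos^2\alpha \ \Rightarrow\ \frac{\eta_1}{\sqrt{\eta_1^2 + \eta_2^2}} = \pm\cos\alpha, \]

and

\[ \frac{\eta_2^2}{\eta_1^2 + \eta_2^2} = \sin^2\alpha \ \Rightarrow\ \frac{\eta_2}{\sqrt{\eta_1^2 + \eta_2^2}} = \pm\sin\alpha. \]

If the product $\eta_1\eta_2$ is negative, we can define a new angle which is the
negative of the old angle. This will change the sign of $\eta_1\eta_2$ and make it
positive without changing the signs of $\eta_1^2$ and $\eta_2^2$. Thus the most general
projection operator in $\mathbb{R}^2$ is

\[ \mathbf{P}_y = \begin{pmatrix} \cos^2\alpha & \sin\alpha\cos\alpha \\ \sin\alpha\cos\alpha & \sin^2\alpha \end{pmatrix}. \tag{4.16} \]

Now that we have the projection operator along $|y\rangle$, we can construct the
reflection operator in a plane perpendicular to $|y\rangle$:

\[ \mathbf{R}_y = \mathbf{1} - 2\mathbf{P}_y = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} - 2\begin{pmatrix} \cos^2\alpha & \sin\alpha\cos\alpha \\ \sin\alpha\cos\alpha & \sin^2\alpha \end{pmatrix} \]

\[ = \begin{pmatrix} 1 - 2\cos^2\alpha & -2\sin\alpha\cos\alpha \\ -2\sin\alpha\cos\alpha & 1 - 2\sin^2\alpha \end{pmatrix} = \begin{pmatrix} -\cos 2\alpha & -\sin 2\alpha \\ -\sin 2\alpha & \cos 2\alpha \end{pmatrix}. \]

Defining $\phi = -2\alpha$, we see that a general reflection is of the form

\[ \mathbf{R}_\phi = \begin{pmatrix} -\cos\phi & \sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}. \]

It is interesting to note that

\[ \mathbf{R}_{\phi_1}\mathbf{R}_{\phi_2} = \begin{pmatrix} \cos(\phi_2 - \phi_1) & -\sin(\phi_2 - \phi_1) \\ \sin(\phi_2 - \phi_1) & \cos(\phi_2 - \phi_1) \end{pmatrix}. \]

This matrix describes a rotation of angle $\phi_2 - \phi_1$ in $\mathbb{R}^2$ (see Problem 5.9 in
Chap. 5), which is clearly an isometry. We have just shown that the product
of two reflections is an isometry. It turns out that this is a general property
of isometries: they can always be expressed as a product of reflections (see
Theorem 26.5.17).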
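The product formula for two reflections in $\mathbb{R}^2$ is easy to spot-check in floating point. In this sketch the angle values and the helper names (`reflection`, `rotation`, `close`) are arbitrary assumptions; the matrices themselves are those of Example 4.4.8.

```python
import math


def reflection(phi):
    """General reflection R_phi of Example 4.4.8."""
    return [[-math.cos(phi), math.sin(phi)],
            [math.sin(phi), math.cos(phi)]]


def rotation(theta):
    """Counterclockwise rotation by theta in the plane."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta), math.cos(theta)]]


def matmul(a, b):
    """Product of two 2x2 (or larger) matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def close(a, b, tol=1e-12):
    """Entrywise comparison within a tolerance."""
    return all(abs(x - y) < tol
               for ra, rb in zip(a, b) for x, y in zip(ra, rb))


phi1, phi2 = 0.3, 1.1   # sample angles, chosen arbitrarily
product = matmul(reflection(phi1), reflection(phi2))
is_rotation = close(product, rotation(phi2 - phi1))
is_involution = close(matmul(reflection(phi1), reflection(phi1)), rotation(0.0))
print(is_rotation, is_involution)
```

The second check confirms that a single reflection applied twice gives the identity, as expected of $\mathbf{R}_y = \mathbf{1} - 2\mathbf{P}_y$ with $\mathbf{P}_y^2 = \mathbf{P}_y$.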


4.5 Representation of Algebras 


The operator algebra, i.e., the algebra $\mathcal{L}(V)$ of the endomorphisms of a
vector space, plays a significant role in physical applications as demonstrated so
far in this chapter. These (abstract) operators take on a concrete (numerical)
look once they are identified as matrices, the topic of Chap. 5. This
suggests making an identification of any given algebra with a (sub)algebra of
$\mathcal{L}(V)$, which subsequently could be identified as a collection of numbers
(what physicists are after) constituting the rows and columns of matrices.
The vague notion of “identification” is made precise by the concept of
homomorphism of algebras.

For the following definition, it is convenient to introduce some notation.
Let both $\mathbb{F}$ and $\mathbb{K}$ denote either $\mathbb{R}$ or $\mathbb{C}$ with the condition that $\mathbb{F} \subseteq \mathbb{K}$. So,
for instance, when $\mathbb{F} = \mathbb{R}$, then $\mathbb{K}$ can be either $\mathbb{R}$ or $\mathbb{C}$; but when $\mathbb{F} = \mathbb{C}$,
then $\mathbb{K}$ can be only $\mathbb{C}$. If $V$ is a vector space over $\mathbb{K}$, we denote the algebra
of its endomorphisms by $\mathcal{L}(V)$ or $\mathrm{End}_{\mathbb{K}}(V)$. When there is no danger of
confusion, we remove the subscript $\mathbb{K}$.


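Definition 4.5.1 Let $\mathcal{A}$ be an associative algebra over $\mathbb{F}$ with identity
$\mathbf{1}_{\mathcal{A}}$. A $\mathbb{K}$-representation of $\mathcal{A}$ in a vector space $V$ over $\mathbb{K}$ is a
homomorphism $\rho : \mathcal{A} \to \mathrm{End}_{\mathbb{K}}(V)$ such that $\rho(\mathbf{1}_{\mathcal{A}}) = \mathbf{1}$, where $\mathbf{1}$ is the unit
operator in $\mathrm{End}_{\mathbb{K}}(V)$. The representation $\rho$ is said to be faithful if it
is injective.


Proposition 4.5.2 A nontrivial representation of a simple algebra is faithful.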


Proof Since any representation is a homomorphism, the proof follows im- 
mediately from Proposition 3.2.14. 


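Example 4.5.3 Let $\mathcal{A} = \mathbb{H}$, the (real) algebra of the quaternions, i.e.,
elements of the form

\[ \mathbf{q} = q_0\mathbf{e}_0 + q_1\mathbf{e}_1 + q_2\mathbf{e}_2 + q_3\mathbf{e}_3, \qquad q_i \in \mathbb{R} \]

as given in Example 3.1.16. Let $V = \mathbb{R}^4$. Consider $\rho : \mathbb{H} \to \mathrm{End}_{\mathbb{R}}(\mathbb{R}^4)$, the
real representation of quaternions given by

\[ \mathbf{T}_q|x\rangle \equiv \mathbf{T}_q\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} q_0x_1 - q_1x_2 - q_2x_3 - q_3x_4 \\ q_1x_1 + q_0x_2 - q_3x_3 + q_2x_4 \\ q_2x_1 + q_3x_2 + q_0x_3 - q_1x_4 \\ q_3x_1 - q_2x_2 + q_1x_3 + q_0x_4 \end{pmatrix} = \begin{pmatrix} q_0 & -q_1 & -q_2 & -q_3 \\ q_1 & q_0 & -q_3 & q_2 \\ q_2 & q_3 & q_0 & -q_1 \\ q_3 & -q_2 & q_1 & q_0 \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}, \]

where $\mathbf{T}_q \equiv \rho(\mathbf{q}) \in \mathcal{L}(\mathbb{R}^4)$, and for convenience, we have written the
element of $\mathbb{R}^4$ as a column vector and introduced the matrix of $q$'s. With this
matrix, we can simply write

\[ \rho(\mathbf{q}) = \begin{pmatrix} q_0 & -q_1 & -q_2 & -q_3 \\ q_1 & q_0 & -q_3 & q_2 \\ q_2 & q_3 & q_0 & -q_1 \\ q_3 & -q_2 & q_1 & q_0 \end{pmatrix}. \tag{4.17} \]

Using this matrix, it is straightforward, but slightly tedious, to show directly
that $\rho(\mathbf{q}\mathbf{p}) = \rho(\mathbf{q})\rho(\mathbf{p})$. Hence, $\rho$ is indeed a representation of $\mathbb{H}$. However,
instead, we calculate the matrices corresponding to the basis vectors of $\mathbb{H}$.
Since $q_0 = 1$ and $q_1 = q_2 = q_3 = 0$ for $\mathbf{e}_0$, we get $\rho(\mathbf{e}_0) = \mathbf{1}$, as we should,
and as is evident from (4.17). Similarly, we can calculate the matrices of the
other basis vectors. The results are given below:

\[ \rho(\mathbf{e}_0) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad \rho(\mathbf{e}_1) = \begin{pmatrix} 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \]

\[ \rho(\mathbf{e}_2) = \begin{pmatrix} 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix}, \qquad \rho(\mathbf{e}_3) = \begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix}. \]

It is now easier to check that $\rho(\mathbf{e}_i\mathbf{e}_j) = \rho(\mathbf{e}_i)\rho(\mathbf{e}_j)$ for $i, j = 0, 1, 2, 3$, and
hence, by Proposition 3.1.19, that $\rho$ is indeed a representation.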
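The check $\rho(\mathbf{e}_i\mathbf{e}_j) = \rho(\mathbf{e}_i)\rho(\mathbf{e}_j)$ can also be delegated to a few lines of code. The sketch below assumes that the quaternion product of Example 3.1.16 is the usual Hamilton product ($\mathbf{e}_1\mathbf{e}_2 = \mathbf{e}_3$ and cyclic, $\mathbf{e}_i^2 = -\mathbf{e}_0$ for $i = 1, 2, 3$); the function names `qmul`, `rho`, and `matmul` are hypothetical.

```python
def qmul(q, p):
    """Hamilton product of quaternions given as tuples (q0, q1, q2, q3).

    Assumes the standard rules e1*e2 = e3, e2*e3 = e1, e3*e1 = e2, ei*ei = -e0.
    """
    q0, q1, q2, q3 = q
    p0, p1, p2, p3 = p
    return (q0 * p0 - q1 * p1 - q2 * p2 - q3 * p3,
            q0 * p1 + q1 * p0 + q2 * p3 - q3 * p2,
            q0 * p2 - q1 * p3 + q2 * p0 + q3 * p1,
            q0 * p3 + q1 * p2 - q2 * p1 + q3 * p0)


def rho(q):
    """The 4x4 real matrix of Eq. (4.17)."""
    q0, q1, q2, q3 = q
    return [[q0, -q1, -q2, -q3],
            [q1, q0, -q3, q2],
            [q2, q3, q0, -q1],
            [q3, -q2, q1, q0]]


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


basis = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]

# rho(e_i e_j) = rho(e_i) rho(e_j) on all sixteen pairs of basis elements.
on_basis = all(rho(qmul(a, b)) == matmul(rho(a), rho(b))
               for a in basis for b in basis)

# The same identity on two arbitrarily chosen integer quaternions.
q, p = (1, 2, -3, 4), (0, -1, 5, 2)
on_sample = rho(qmul(q, p)) == matmul(rho(q), rho(p))
print(on_basis, on_sample)
```

Under the stated assumption, the matrix (4.17) is just the matrix of left multiplication by $\mathbf{q}$ in the basis $\{\mathbf{e}_0, \mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3\}$, which is why the homomorphism property reduces to associativity.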


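Definition 4.5.4 A subspace $W$ of $V$ is called stable or invariant under
a representation $\rho : \mathcal{A} \to \mathrm{End}_{\mathbb{C}}(V) \equiv \mathrm{End}(V)$ if $\rho(\mathbf{a})|w\rangle$ is in $W$ for all
$|w\rangle \in W$ and all $\mathbf{a} \in \mathcal{A}$. A representation $\rho$ is called irreducible if the only
stable subspaces are $W = V$ and $W = \{|0\rangle\}$.


Problem 4.34 shows that if $\rho$ is surjective, then it is irreducible.


Proposition 4.5.5 Let $\rho : \mathcal{A} \to \mathrm{End}(V)$ be a representation and $|v\rangle$ an
arbitrary nonzero vector in $V$. Then

\[ W \equiv \rho(\mathcal{A})|v\rangle = \{|w\rangle \in V \mid |w\rangle = \rho(\mathbf{a})|v\rangle \text{ for some } \mathbf{a} \in \mathcal{A}\} \]

is a stable subspace of $V$. In particular, $\rho$ is irreducible if and only if
$\rho(\mathcal{A})|v\rangle = V$.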


Proof The straightforward but instructive proof is the content of Prob- 
lem 4.37. 


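Isomorphic vector spaces are indistinguishable. So, too, can be their
corresponding representations. More precisely,


Definition 4.5.6 Suppose $\mathbf{T} : V_1 \cong V_2$ is an isomorphism. Two
representations $\rho_1 : \mathcal{A} \to \mathrm{End}(V_1)$ and $\rho_2 : \mathcal{A} \to \mathrm{End}(V_2)$ are called
equivalent if

\[ \mathbf{T} \circ \rho_1(\mathbf{a}) = \rho_2(\mathbf{a}) \circ \mathbf{T} \quad\text{for all } \mathbf{a} \in \mathcal{A}. \]

We write $\rho_1 \sim \rho_2$ to indicate the equivalence of the two representations.
Sometimes the condition of equivalence is written as

\[ \rho_1(\mathbf{a}) = \mathbf{T}^{-1} \circ \rho_2(\mathbf{a}) \circ \mathbf{T} \quad\text{or}\quad \rho_2(\mathbf{a}) = \mathbf{T} \circ \rho_1(\mathbf{a}) \circ \mathbf{T}^{-1}. \tag{4.18} \]

Just as we can combine two vector spaces to get a new vector space, we
can combine two representations to obtain a new representation.


Definition 4.5.7 Let $\rho$ and $\eta$ be representations of $\mathcal{A}$ in $U$ and $V$,
respectively. We define representations $\rho \oplus \eta$, called the direct sum, and $\rho \otimes \eta$,
called the tensor product, of $\rho$ and $\eta$, respectively in $U \oplus V$ and $U \otimes V$ by

\[ (\rho \oplus \eta)(\mathbf{a}) = \rho(\mathbf{a}) \oplus \eta(\mathbf{a}), \qquad \mathbf{a} \in \mathcal{A}, \]
\[ (\rho \otimes \eta)(\mathbf{a}) = \rho(\mathbf{a}) \otimes \eta(\mathbf{a}), \qquad \mathbf{a} \in \mathcal{A}. \]

It should be obvious that if $\rho_1 \sim \rho_2$ and $\eta_1 \sim \eta_2$, then $\rho_1 \oplus \eta_1 \sim \rho_2 \oplus \eta_2$
and $\rho_1 \otimes \eta_1 \sim \rho_2 \otimes \eta_2$.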

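Since an algebra $\mathcal{A}$ is also a vector space, it is possible to come up with
representations of the form $\rho : \mathcal{A} \to \mathrm{End}(\mathcal{A})$. When there is no danger of
confusion, we designate members of $\mathcal{A}$ as a ket when they are considered
simply as vectors, but use bold face type when the same member participates
in an algebra multiplication. Thus $|a\rangle \in \mathcal{A}$ when the member is considered
as a vector and $\mathbf{a} \in \mathcal{A}$ when the same member is one of the factors in a
product.


Definition 4.5.8 The regular representation of $\mathcal{A}$ in $\mathcal{A}$ is the
representation $\rho_L : \mathcal{A} \to \mathrm{End}(\mathcal{A})$ given by $\rho_L(\mathbf{a})|b\rangle = \mathbf{a}\mathbf{b}$.


It is trivial to show that this is indeed a representation, i.e., that $\rho_L(\mathbf{a}\mathbf{b}) =
\rho_L(\mathbf{a})\rho_L(\mathbf{b})$.

If $\mathcal{A}$ is unital, then $\rho_L(\mathbf{a}) = \rho_L(\mathbf{a}')$ implies that $\rho_L(\mathbf{a})|1\rangle = \rho_L(\mathbf{a}')|1\rangle$ or
that $\mathbf{a}\mathbf{1} = \mathbf{a}'\mathbf{1}$, namely that $\mathbf{a} = \mathbf{a}'$, indicating that $\rho_L$ is injective, and the
representation faithful, or $\rho_L(\mathcal{A}) \cong \mathcal{A}$.

$\rho_L$ is simply the left-multiplication of $\mathcal{A}$. What about the
right-multiplication? If we set $\rho_R(\mathbf{a})|b\rangle = \mathbf{b}\mathbf{a}$, then

\[ \rho_R(\mathbf{a}\mathbf{b})|c\rangle = \mathbf{c}\mathbf{a}\mathbf{b} = (\mathbf{c}\mathbf{a})\mathbf{b} = \bigl(\rho_R(\mathbf{a})|c\rangle\bigr)\mathbf{b} = \rho_R(\mathbf{b})\bigl(\rho_R(\mathbf{a})|c\rangle\bigr) = \bigl(\rho_R(\mathbf{b})\rho_R(\mathbf{a})\bigr)|c\rangle. \]

Hence, $\rho_R(\mathbf{a}\mathbf{b}) = \rho_R(\mathbf{b})\rho_R(\mathbf{a})$. Again if $\mathcal{A}$ is unital, then $\rho_R$ is faithful
and $\rho_R(\mathcal{A}) \cong \mathcal{A}^{\mathrm{op}}$, where $\mathcal{A}^{\mathrm{op}}$ is the algebra opposite to $\mathcal{A}$ given in
Definition 3.1.8.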
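The order reversal $\rho_R(\mathbf{a}\mathbf{b}) = \rho_R(\mathbf{b})\rho_R(\mathbf{a})$ can be seen concretely in a noncommutative algebra. The sketch below uses the quaternions as an assumed test case, with the standard Hamilton product taken as the multiplication rule; `matrix_of`, `rho_L`, and `rho_R` are hypothetical helper names.

```python
def qmul(q, p):
    """Hamilton product of quaternions (assumed multiplication rule)."""
    q0, q1, q2, q3 = q
    p0, p1, p2, p3 = p
    return (q0 * p0 - q1 * p1 - q2 * p2 - q3 * p3,
            q0 * p1 + q1 * p0 + q2 * p3 - q3 * p2,
            q0 * p2 - q1 * p3 + q2 * p0 + q3 * p1,
            q0 * p3 + q1 * p2 - q2 * p1 + q3 * p0)


BASIS = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]


def matrix_of(linear_map):
    """Matrix whose i-th column is the image of the i-th basis element."""
    cols = [linear_map(e) for e in BASIS]
    return [[cols[j][i] for j in range(4)] for i in range(4)]


def rho_L(a):
    """Left regular representation: b -> ab."""
    return matrix_of(lambda b: qmul(a, b))


def rho_R(a):
    """Right multiplication: b -> ba."""
    return matrix_of(lambda b: qmul(b, a))


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


e1, e2 = BASIS[1], BASIS[2]
left_is_homomorphism = rho_L(qmul(e1, e2)) == matmul(rho_L(e1), rho_L(e2))
right_reverses_order = rho_R(qmul(e1, e2)) == matmul(rho_R(e2), rho_R(e1))
right_keeps_order = rho_R(qmul(e1, e2)) == matmul(rho_R(e1), rho_R(e2))
print(left_is_homomorphism, right_reverses_order, right_keeps_order)
```

For this pair of elements the last comparison fails, which is the sense in which $\rho_R$ represents the opposite algebra rather than $\mathcal{A}$ itself.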


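Theorem 4.5.9 Let $\mathcal{L}$ be a minimal left ideal of an algebra $\mathcal{A}$. Then the
representation $\rho^{(\mathcal{L})} : \mathcal{A} \to \mathrm{End}(\mathcal{L})$, the regular representation of $\mathcal{A}$ in $\mathcal{L}$,
given by

\[ \rho^{(\mathcal{L})}(\mathbf{a})|x\rangle \equiv \rho_L(\mathbf{a})|x\rangle = \mathbf{a}\mathbf{x} \quad\text{for } \mathbf{a} \in \mathcal{A} \text{ and } |x\rangle \equiv \mathbf{x} \in \mathcal{L}, \]

is irreducible.

Proof We note that $\rho^{(\mathcal{L})}(\mathcal{A})|x\rangle = \mathcal{A}\mathbf{x}$. Since $\mathcal{L}$ is minimal, $\mathcal{A}\mathbf{x} = \mathcal{L}$ by
Theorem 3.2.6, and $\rho^{(\mathcal{L})}(\mathcal{A})|x\rangle = \mathcal{L}$. By Proposition 4.5.5, $\rho^{(\mathcal{L})}$ is
irreducible.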


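Theorem 4.5.10 All irreducible representations of a simple algebra
$\mathcal{A}$ are faithful and equivalent to $\rho^{(\mathcal{L})}$, the regular representation of $\mathcal{A}$
in the minimal left ideal $\mathcal{L}$.


Proof The faithfulness is a consequence of Proposition 4.5.2. Let $\rho : \mathcal{A} \to
\mathrm{End}(V)$ be an irreducible representation. For $\mathbf{x} \in \mathcal{L}$ and a vector $|e\rangle \in V$, let
$\rho(\mathbf{x})|e\rangle = |v\rangle$. Then

\[ \rho(\mathcal{L})|e\rangle = \rho(\mathcal{A}\mathbf{x})|e\rangle = \rho(\mathcal{A})\rho(\mathbf{x})|e\rangle = \rho(\mathcal{A})|v\rangle = V. \tag{4.19} \]

The first equality follows from Theorem 3.2.6, and the last equality from
Proposition 4.5.5.

Now consider a linear map $\mathbf{T} : \mathcal{L} \to V$ given by $\mathbf{T}(\mathbf{y}) = \rho(\mathbf{y})|e\rangle$, and note
that by Eq. (4.19),

\[ \mathbf{T}(\mathcal{L}) = \rho(\mathcal{L})|e\rangle = V. \]

Therefore, $\mathbf{T}$ is surjective. Now let $\mathbf{z}$ be a nonzero member of $\mathcal{L}$. If $\mathbf{z} \in \ker\mathbf{T}$,
then by Theorem 3.2.6, $\mathcal{L} = \mathcal{A}\mathbf{z}$ and

\[ \mathbf{T}(\mathcal{L}) = \rho(\mathcal{L})|e\rangle = \rho(\mathcal{A}\mathbf{z})|e\rangle = \rho(\mathcal{A})\rho(\mathbf{z})|e\rangle = \rho(\mathcal{A})\mathbf{T}(\mathbf{z}) = \{0\}, \]

which contradicts the previous equation. Therefore, $\ker\mathbf{T} = \{0\}$ and $\mathbf{T}$ is
injective, hence, bijective.

To complete the proof, we have to show that

\[ \mathbf{T} \circ \rho^{(\mathcal{L})}(\mathbf{a}) = \rho(\mathbf{a}) \circ \mathbf{T} \quad\text{for all } \mathbf{a} \in \mathcal{A}. \]

If $\mathbf{y} \in \mathcal{L}$, then the right-hand side gives

\[ \rho(\mathbf{a}) \circ \mathbf{T}(\mathbf{y}) = \rho(\mathbf{a})\rho(\mathbf{y})|e\rangle = \rho(\mathbf{a}\mathbf{y})|e\rangle = \mathbf{T}(\mathbf{a}\mathbf{y}), \]

while the left-hand side yields

\[ \bigl(\mathbf{T} \circ \rho^{(\mathcal{L})}(\mathbf{a})\bigr)\mathbf{y} = \mathbf{T}\bigl(\rho^{(\mathcal{L})}(\mathbf{a})\mathbf{y}\bigr) = \mathbf{T}(\mathbf{a}\mathbf{y}). \]

This completes the proof.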


A consequence of this theorem is 


Corollary 4.5.11 All minimal left ideals of a simple algebra are isomorphic.


Proof If $\mathcal{L}'$ is another minimal left ideal of $\mathcal{A}$, then let $V = \mathcal{L}'$ in Theorem 4.5.10.
Then $\mathbf{T}$ of the theorem establishes an isomorphism between $\mathcal{L}$ and $\mathcal{L}'$.
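Theorem 4.5.12 Two irreducible representations of a semi-simple algebra
are equivalent if and only if they have the same kernel.


Proof Recall from Theorem 3.5.25 that a semi-simple algebra $\mathcal{A}$ is the
direct sum of simple algebras, each component being an ideal of $\mathcal{A}$. Let

\[ \mathcal{A} = \mathcal{I}_1 \oplus \mathcal{I}_2 \oplus \cdots \oplus \mathcal{I}_r = \bigoplus_{i=1}^{r}\mathcal{I}_i \]

and $\rho : \mathcal{A} \to \mathrm{End}(V)$ be an irreducible representation. Assume there is $0 \neq
\mathbf{x}_p \in \mathcal{I}_p$ for some $p$ and $|e\rangle \in V$ such that $|v\rangle \equiv \rho(\mathbf{x}_p)|e\rangle \neq |0\rangle$. Then since
$\rho$ is irreducible, by Proposition 4.5.5

\[ V = \rho(\mathcal{A})|v\rangle = \rho(\mathcal{A})\rho(\mathbf{x}_p)|e\rangle = \rho(\mathcal{A}\mathbf{x}_p)|e\rangle \subseteq \rho(\mathcal{I}_p)|e\rangle. \]

But obviously, $\rho(\mathcal{I}_p)|e\rangle \subseteq V$. Hence,

\[ \rho(\mathcal{I}_p)|e\rangle = V, \tag{4.20} \]

which also indicates that any $|x\rangle \in V$ can be written as $|x\rangle = \rho(\mathbf{y}_p)|e\rangle$ for
some $\mathbf{y}_p \in \mathcal{I}_p$. Now since $\mathcal{I}_p\mathcal{I}_k = \mathcal{I}_k\mathcal{I}_p = \{0\}$ for $k \neq p$, we have, for any $\mathbf{z}_k \in \mathcal{I}_k$,

\[ \rho(\mathbf{z}_k)|x\rangle = \rho(\mathbf{z}_k)\rho(\mathbf{y}_p)|e\rangle = \rho(\mathbf{z}_k\mathbf{y}_p)|e\rangle = \rho(\mathbf{0})|e\rangle = |0\rangle \]

for all $|x\rangle \in V$. It follows that $\rho(\mathbf{z}_k)$ is the zero operator, i.e., $\mathbf{z}_k \in \ker\rho$ for
all $k \neq p$, or

\[ \bigoplus_{\substack{i=1 \\ i \neq p}}^{r}\mathcal{I}_i \subseteq \ker\rho. \]

Now let $\rho|_{\mathcal{I}_p} : \mathcal{I}_p \to \mathrm{End}(V)$ be the restriction of $\rho$ to $\mathcal{I}_p$, i.e., a
representation of $\mathcal{I}_p$ in $V$. Then $\mathbf{T} : \mathcal{I}_p \to V$ given by $\mathbf{T}(\mathbf{z}_p) = \rho(\mathbf{z}_p)|e\rangle$ is an
isomorphism by the proof of Theorem 4.5.10. Hence, $\rho(\mathbf{z}_p) = \mathbf{0}$ implies that
$\mathbf{z}_p = \mathbf{0}$, i.e., that $\mathcal{I}_p \cap \ker\rho = \{0\}$. This yields

\[ \bigoplus_{\substack{i=1 \\ i \neq p}}^{r}\mathcal{I}_i = \ker\rho. \]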

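Let $\rho_1 : \mathcal{A} \to \mathrm{End}(V_1)$ and $\rho_2 : \mathcal{A} \to \mathrm{End}(V_2)$ be two irreducible
representations of a semi-simple algebra $\mathcal{A}$. Assume further that $\rho_1$ and $\rho_2$ have
the same kernel; i.e., that for some $1 \le p \le r$,

\[ \ker\rho_1 = \ker\rho_2 = \bigoplus_{\substack{i=1 \\ i \neq p}}^{r}\mathcal{I}_i. \]

Then as shown above there are isomorphisms $\mathbf{T}_1 : \mathcal{I}_p \to V_1$ and $\mathbf{T}_2 : \mathcal{I}_p \to
V_2$ given by

\[ \mathbf{T}_1(\mathbf{z}_p) = \rho_1(\mathbf{z}_p)|e_1\rangle \quad\text{and}\quad \mathbf{T}_2(\mathbf{z}_p) = \rho_2(\mathbf{z}_p)|e_2\rangle \]

with

\[ \rho_1(\mathcal{I}_p)|e_1\rangle = V_1 \quad\text{and}\quad \rho_2(\mathcal{I}_p)|e_2\rangle = V_2 \tag{4.21} \]

as in Eq. (4.20). The composite map $\mathbf{S} \equiv \mathbf{T}_2 \circ \mathbf{T}_1^{-1}$ maps $V_1$ isomorphically
onto $V_2$. We now show that

\[ \mathbf{S} \circ \rho_1(\mathbf{a}) = \rho_2(\mathbf{a}) \circ \mathbf{S} \quad\text{for all } \mathbf{a} \in \mathcal{A}, \]

and hence that $\rho_1 \sim \rho_2$. Applying the right-hand side of this equation on a
$|v_1\rangle \in V_1$, and noting that by (4.21) $|v_1\rangle = \rho_1(\mathbf{z}_p)|e_1\rangle$ for some $\mathbf{z}_p \in \mathcal{I}_p$, we
get

\[ \rho_2(\mathbf{a}) \circ \mathbf{S}|v_1\rangle = \rho_2(\mathbf{a}) \circ (\mathbf{T}_2 \circ \mathbf{T}_1^{-1})\rho_1(\mathbf{z}_p)|e_1\rangle = \rho_2(\mathbf{a}) \circ \mathbf{T}_2\bigl(\mathbf{T}_1^{-1}\rho_1(\mathbf{z}_p)|e_1\rangle\bigr) \]
\[ = \rho_2(\mathbf{a}) \circ \mathbf{T}_2(\mathbf{z}_p) = \rho_2(\mathbf{a})\rho_2(\mathbf{z}_p)|e_2\rangle = \rho_2(\mathbf{a}\mathbf{z}_p)|e_2\rangle, \]

while the left-hand side gives

\[ \mathbf{S} \circ \rho_1(\mathbf{a})|v_1\rangle = (\mathbf{T}_2 \circ \mathbf{T}_1^{-1})\rho_1(\mathbf{a})\rho_1(\mathbf{z}_p)|e_1\rangle = (\mathbf{T}_2 \circ \mathbf{T}_1^{-1})\rho_1(\mathbf{a}\mathbf{z}_p)|e_1\rangle \]
\[ = \mathbf{T}_2\bigl(\mathbf{T}_1^{-1}\rho_1(\mathbf{a}\mathbf{z}_p)|e_1\rangle\bigr) = \mathbf{T}_2(\mathbf{a}\mathbf{z}_p) = \rho_2(\mathbf{a}\mathbf{z}_p)|e_2\rangle. \]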

We have shown that if two irreducible representations of a semi-simple al- 
gebra have the same kernel, then they are equivalent. The converse is much 
easier to prove (see Problem 4.36). 


4.6 Problems 


4.1 Consider a linear operator T on a finite-dimensional vector space V. 


(a) Show that there exists a polynomial $p$ such that $p(\mathbf{T}) = \mathbf{0}$. Hint: Take
a basis $B = \{|a_i\rangle\}_{i=1}^{N}$ and consider the vectors $\{\mathbf{T}^k|a_1\rangle\}_{k=0}^{M}$ for large
enough $M$ and conclude that there exists a polynomial $p_1(\mathbf{T})$ such
that $p_1(\mathbf{T})|a_1\rangle = 0$. Do the same for $|a_2\rangle$, etc. Now take the product of
all such polynomials.

(b) From (a) conclude that for large enough $n$, $\mathbf{T}^n$ can be written as a linear
combination of smaller powers of $\mathbf{T}$.

(c) Now conclude that any infinite series in $\mathbf{T}$ collapses to a polynomial
in $\mathbf{T}$.


4.2 Use mathematical induction to show that $[\mathbf{A}, \mathbf{A}^m] = \mathbf{0}$.


4.3 For D and T defined in Example 2.3.5: 


(a) Show that [D, T] = 1. 
(b) Calculate the linear transformations D°T? and T°D?. 


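4.4 Consider three linear operators $\mathbf{L}_1$, $\mathbf{L}_2$, and $\mathbf{L}_3$ satisfying the
commutation relations $[\mathbf{L}_1, \mathbf{L}_2] = i\mathbf{L}_3$, $[\mathbf{L}_3, \mathbf{L}_1] = i\mathbf{L}_2$, $[\mathbf{L}_2, \mathbf{L}_3] = i\mathbf{L}_1$, and define the
new operators $\mathbf{L}_\pm = \mathbf{L}_1 \pm i\mathbf{L}_2$.

(a) Show that the operator $\mathbf{L}^2 \equiv \mathbf{L}_1^2 + \mathbf{L}_2^2 + \mathbf{L}_3^2$ commutes with $\mathbf{L}_k$, $k =
1, 2, 3$.

(b) Show that the set $\{\mathbf{L}_+, \mathbf{L}_-, \mathbf{L}_3\}$ is closed under commutation, i.e., the
commutator of any two of them can be written as a linear combination
of the set. Determine these commutators.

(c) Write $\mathbf{L}^2$ in terms of $\mathbf{L}_+$, $\mathbf{L}_-$, and $\mathbf{L}_3$.


4.5 Prove the rest of Proposition 4.1.8.

4.6 Show that if $[[\mathbf{A}, \mathbf{B}], \mathbf{A}] = \mathbf{0}$, then for every positive integer $k$,

\[ [\mathbf{A}^k, \mathbf{B}] = k\mathbf{A}^{k-1}[\mathbf{A}, \mathbf{B}]. \]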


Hint: First prove the relation for low values of k; then use mathematical 
induction. 


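4.7 Show that for $\mathbf{D}$ and $\mathbf{T}$ defined in Example 2.3.5,

\[ [\mathbf{D}^k, \mathbf{T}] = k\mathbf{D}^{k-1} \quad\text{and}\quad [\mathbf{T}^k, \mathbf{D}] = -k\mathbf{T}^{k-1}. \]


4.8 Evaluate the derivative of $\mathbf{H}^{-1}(t)$ in terms of the derivative of $\mathbf{H}(t)$ by
differentiating their product.


4.9 Show that for any $\alpha, \beta \in \mathbb{R}$ and any $\mathbf{H} \in \mathrm{End}(V)$, we have

\[ e^{\alpha\mathbf{H}}e^{\beta\mathbf{H}} = e^{(\alpha+\beta)\mathbf{H}}. \]

4.10 Show that $(\mathbf{U} + \mathbf{T})(\mathbf{U} - \mathbf{T}) = \mathbf{U}^2 - \mathbf{T}^2$ if and only if $[\mathbf{U}, \mathbf{T}] = \mathbf{0}$.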
4.11 Prove that if A and B are hermitian, then i[A, B] is also hermitian. 


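4.12 Find the solution to the operator differential equation

\[ \frac{d\mathbf{U}}{dt} = t\mathbf{H}\mathbf{U}(t). \]

Hint: Make the change of variable $y = t^2$ and use the result of
Example 4.2.3.


4.13 Verify that

\[ \frac{d}{dt}\mathbf{H}^3 = \Bigl(\frac{d\mathbf{H}}{dt}\Bigr)\mathbf{H}^2 + \mathbf{H}\Bigl(\frac{d\mathbf{H}}{dt}\Bigr)\mathbf{H} + \mathbf{H}^2\Bigl(\frac{d\mathbf{H}}{dt}\Bigr). \]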


4.14 Show that if A and B commute, and f and g are arbitrary functions, 
then f(A) and g(B) also commute. 


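4.15 Assuming that $[[\mathbf{S}, \mathbf{T}], \mathbf{T}] = \mathbf{0} = [[\mathbf{S}, \mathbf{T}], \mathbf{S}]$, show that

\[ [\mathbf{S}, \exp(t\mathbf{T})] = t[\mathbf{S}, \mathbf{T}]\exp(t\mathbf{T}). \]

Hint: Expand the exponential and use Problem 4.6.

4.16 Prove that

\[ \exp(\mathbf{H}_1 + \mathbf{H}_2 + \mathbf{H}_3) = \exp(\mathbf{H}_1)\exp(\mathbf{H}_2)\exp(\mathbf{H}_3) \times \exp\Bigl\{-\frac{1}{2}\bigl([\mathbf{H}_1, \mathbf{H}_2] + [\mathbf{H}_1, \mathbf{H}_3] + [\mathbf{H}_2, \mathbf{H}_3]\bigr)\Bigr\} \]

provided that $\mathbf{H}_1$, $\mathbf{H}_2$, and $\mathbf{H}_3$ commute with all the commutators. What is
the generalization to $\mathbf{H}_1 + \mathbf{H}_2 + \cdots + \mathbf{H}_n$?

4.17 Denoting the derivative of $\mathbf{A}(t)$ by $\dot{\mathbf{A}}$, show that

\[ \frac{d}{dt}[\mathbf{A}, \mathbf{B}] = [\dot{\mathbf{A}}, \mathbf{B}] + [\mathbf{A}, \dot{\mathbf{B}}]. \]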

4.18 Prove Theorem 4.3.2. Hint: Use Eq. (4.11) and Theorem 2.3.7. 

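4.19 Let $\mathbf{A}(t) \equiv \exp(t\mathbf{H})\mathbf{A}_0\exp(-t\mathbf{H})$, where $\mathbf{H}$ and $\mathbf{A}_0$ are constant
operators. Show that $d\mathbf{A}/dt = [\mathbf{H}, \mathbf{A}(t)]$. What happens when $\mathbf{H}$ commutes with
$\mathbf{A}(t)$?


4.20 Let $|f\rangle, |g\rangle \in \mathcal{C}(a, b)$ with the additional property that

\[ f(a) = g(a) = f(b) = g(b) = 0. \]

Show that for such functions, the derivative operator $\mathbf{D}$ is anti-hermitian.
The inner product is defined as usual:

\[ \langle f|g\rangle \equiv \int_a^b f^*(t)g(t)\,dt. \]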


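4.21 In this problem, you will go through the steps of proving the rigorous
statement of the Heisenberg uncertainty principle. Denote the expectation
(average) value of an operator $\mathbf{A}$ in a state $|\Psi\rangle$ by $A_{\mathrm{avg}}$. Thus, $A_{\mathrm{avg}} = \langle\mathbf{A}\rangle =
\langle\Psi|\mathbf{A}|\Psi\rangle$. The uncertainty (deviation from the mean) in the normalized state
$|\Psi\rangle$ of the operator $\mathbf{A}$ is given by

\[ \Delta A = \sqrt{\langle(\mathbf{A} - A_{\mathrm{avg}})^2\rangle} = \sqrt{\langle\Psi|(\mathbf{A} - A_{\mathrm{avg}}\mathbf{1})^2|\Psi\rangle}. \]

(a) Show that for any two hermitian operators $\mathbf{A}$ and $\mathbf{B}$, we have

\[ |\langle\Psi|\mathbf{A}\mathbf{B}|\Psi\rangle|^2 \le \langle\Psi|\mathbf{A}^2|\Psi\rangle\langle\Psi|\mathbf{B}^2|\Psi\rangle. \]

Hint: Apply the Schwarz inequality to an appropriate pair of vectors.

(b) Using the above and the triangle inequality for complex numbers,
show that

\[ |\langle\Psi|[\mathbf{A}, \mathbf{B}]|\Psi\rangle|^2 \le 4\langle\Psi|\mathbf{A}^2|\Psi\rangle\langle\Psi|\mathbf{B}^2|\Psi\rangle. \]

(c) Define the operators $\mathbf{A}' = \mathbf{A} - \alpha\mathbf{1}$, $\mathbf{B}' = \mathbf{B} - \beta\mathbf{1}$, where $\alpha$ and $\beta$ are
real numbers. Show that $\mathbf{A}'$ and $\mathbf{B}'$ are hermitian and $[\mathbf{A}', \mathbf{B}'] = [\mathbf{A}, \mathbf{B}]$.

(d) Now use all the results above to show the celebrated uncertainty
relation

\[ (\Delta A)(\Delta B) \ge \frac{1}{2}\bigl|\langle\Psi|[\mathbf{A}, \mathbf{B}]|\Psi\rangle\bigr|. \]

What does this reduce to for position operator $\mathbf{x}$ and momentum
operator $\mathbf{p}$ if $[\mathbf{x}, \mathbf{p}] = i\hbar$?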


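4.22 Show that $\mathbf{U} = \exp\mathbf{A}$ is unitary if $\mathbf{A}$ is anti-hermitian. Furthermore, if
$\mathbf{A}$ commutes with $\mathbf{A}^\dagger$, then $\exp\mathbf{A}$ is unitary. Hint: Use Proposition 4.2.4 on

\[ \mathbf{U}\mathbf{U}^\dagger = \mathbf{1} \quad\text{and}\quad \mathbf{U}^\dagger\mathbf{U} = \mathbf{1}. \]


4.23 Find $\mathbf{T}^\dagger$ for each of the following linear operators.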
(a) T:R?— R? given by 


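(b) $\mathbf{T} : \mathbb{R}^3 \to \mathbb{R}^3$ given by

\[ \mathbf{T}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x + 2y - z \\ 3x - y + 2z \\ -x + 2y + 3z \end{pmatrix}. \]

(c) $\mathbf{T} : \mathbb{R}^2 \to \mathbb{R}^2$ given by

\[ \mathbf{T}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x\cos\theta - y\sin\theta \\ x\sin\theta + y\cos\theta \end{pmatrix}, \]

where $\theta$ is a real number. What is $\mathbf{T}^\dagger\mathbf{T}$?

(d) $\mathbf{T} : \mathbb{C}^2 \to \mathbb{C}^2$ given by

\[ \mathbf{T}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} \alpha_1 - i\alpha_2 \\ i\alpha_1 + \alpha_2 \end{pmatrix}. \]

(e) $\mathbf{T} : \mathbb{C}^3 \to \mathbb{C}^3$ given by

\[ \mathbf{T}\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} \alpha_1 + i\alpha_2 - 2i\alpha_3 \\ -2i\alpha_1 + \alpha_2 + i\alpha_3 \\ i\alpha_1 - 2i\alpha_2 + \alpha_3 \end{pmatrix}. \]


4.24 Show that if $\mathbf{P}$ is a (hermitian) projection operator, so are $\mathbf{1} - \mathbf{P}$ and
$\mathbf{U}^\dagger\mathbf{P}\mathbf{U}$ for any unitary operator $\mathbf{U}$.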


4.25 For the vector


(a) find the associated projection matrix, $\mathsf{P}_a$.
(b) Verify that $\mathsf{P}_a$ does project an arbitrary vector in $\mathbb{C}^4$ along $|a\rangle$.
(c) Verify directly that the matrix $\mathsf{1} - \mathsf{P}_a$ is also a projection operator.


4.26 Prove Proposition 4.4.6 


4.27 Let $|a_1\rangle \equiv \mathbf{a}_1 = (1, 1, -1)$ and $|a_2\rangle \equiv \mathbf{a}_2 = (-2, 1, -1)$.


(a) Construct (in the form of a matrix) the projection operators P; and P2 
that project onto the directions of |a,) and |az), respectively. Verify 
that they are indeed projection operators. 

(b) Construct (in the form of a matrix) the operator P = P; + P2 and verify 
directly that it is a projection operator. 

(c) Let P act on an arbitrary vector (x, y, z). What is the dot product of 
the resulting vector with the vector a; x a2? What can you say about 
P and your conclusion in (b)? 


4.28 Let $\mathbf{P}^{(m)} = \sum_{i=1}^{m}|e_i\rangle\langle e_i|$ be a projection operator constructed out of
the first $m$ orthonormal vectors of the basis $B = \{|e_i\rangle\}_{i=1}^{N}$ of $V$. Show that
$\mathbf{P}^{(m)}$ projects into the subspace spanned by the first $m$ vectors in $B$.


4.29 What is the length of the projection of the vector $(3, 4, -4)$ onto a line
whose parametric equation is $x = 2t + 1$, $y = -t + 3$, $z = t - 1$? Hint: Find
a unit vector in the direction of the line and construct its projection operator.


4.30 The parametric equation of a line L in a coordinate system with origin 
O is 

x=2t+1, y=t+l, z=—2t+2. 
A point P has coordinates (3, —2, 1). 


(a) Using the projection operators, find the length of the projection of $OP$
on the line $L$.

(b) Find the vector whose beginning is P and ends perpendicularly on L. 

(c) From this vector calculate the distance from P to L. 


4.31 Let the operator U : C* > C? be given by 


Find $\mathbf{U}^\dagger$ and test if $\mathbf{U}$ is unitary.


4.32 Show that the product of two unitary operators is always unitary, but 
the product of two hermitian operators is hermitian if and only if they com- 
mute. 


4.33 Let $\mathbf{S}$ be an operator that is both unitary and hermitian. Show that


(a) $\mathbf{S}$ is involutive (i.e., $\mathbf{S}^2 = \mathbf{1}$), and
(b) $\mathbf{S} = \mathbf{P}^+ - \mathbf{P}^-$, where $\mathbf{P}^+$ and $\mathbf{P}^-$ are hermitian projection operators.


4.34 Show that if a representation $\rho : \mathcal{A} \to \mathcal{L}(V)$ is surjective, then it is
irreducible. Hint: The operator $|a\rangle\langle a|$ is in $\mathcal{L}(V)$ for any $|a\rangle \in V$.


4.35 Show that $\rho(\mathbf{e}_i\mathbf{e}_j) = \rho(\mathbf{e}_i)\rho(\mathbf{e}_j)$ for $i, j = 0, 1, 2, 3$ in Example 4.5.3.


4.36 Show that any two equivalent representations of any algebra have the
same kernel.


4.37 To prove Proposition 4.5.5, first show that $\rho(\mathcal{A})|v\rangle$ is a subspace. Then
prove that $\rho(\mathcal{A})W \subseteq W$. For the “only if” part of an irreducible
representation, take $|v\rangle$ to be in any subspace of $V$.


Matrices 


So far, our theoretical investigation has been dealing mostly with abstract 
vectors and abstract operators. As we have seen in examples and problems, 
concrete representations of vectors and operators are necessary in most ap- 
plications. Such representations are obtained by choosing a basis and ex- 
pressing all operations in terms of components of vectors and matrix repre- 
sentations of operators. 


5.1 Representing Vectors and Operators 


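Let us choose a basis $B_V = \{|a_i\rangle\}_{i=1}^{N}$ of a vector space $V_N$, and express an
arbitrary vector $|x\rangle$ in this basis: $|x\rangle = \sum_{i=1}^{N}\xi_i|a_i\rangle$. We write

\[ \mathsf{x} = \begin{pmatrix} \xi_1 \\ \xi_2 \\ \vdots \\ \xi_N \end{pmatrix} \tag{5.1} \]

and say that the column vector $\mathsf{x}$ represents $|x\rangle$ in $B_V$. We can also have a
linear transformation $\mathbf{A} \in \mathcal{L}(V_N, W_M)$ act on the basis vectors in $B_V$ to give
vectors in the $M$-dimensional vector space $W_M$: $|w_k\rangle = \mathbf{A}|a_k\rangle$. The latter
can be written as a linear combination of basis vectors $B_W = \{|b_j\rangle\}_{j=1}^{M}$ in
$W_M$:

\[ |w_1\rangle = \sum_{j=1}^{M}\alpha_{j1}|b_j\rangle, \quad |w_2\rangle = \sum_{j=1}^{M}\alpha_{j2}|b_j\rangle, \quad \dots, \quad |w_N\rangle = \sum_{j=1}^{M}\alpha_{jN}|b_j\rangle. \]

Note that the components have an extra subscript to denote which of the $N$
vectors $\{|w_i\rangle\}_{i=1}^{N}$ they are representing. The components can be arranged in
a column as before to give a representation of the corresponding vectors:

\[ \mathsf{w}_1 = \begin{pmatrix} \alpha_{11} \\ \alpha_{21} \\ \vdots \\ \alpha_{M1} \end{pmatrix}, \quad \mathsf{w}_2 = \begin{pmatrix} \alpha_{12} \\ \alpha_{22} \\ \vdots \\ \alpha_{M2} \end{pmatrix}, \quad \dots, \quad \mathsf{w}_N = \begin{pmatrix} \alpha_{1N} \\ \alpha_{2N} \\ \vdots \\ \alpha_{MN} \end{pmatrix}. \]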
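The operator itself is determined by the collection of all these vectors, i.e.,
by a matrix. We write this as

\[ \mathsf{A} = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1N} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2N} \\ \vdots & \vdots & & \vdots \\ \alpha_{M1} & \alpha_{M2} & \cdots & \alpha_{MN} \end{pmatrix} \tag{5.2} \]

and call $\mathsf{A}$ the matrix representing $\mathbf{A}$ in bases $B_V$ and $B_W$. This statement
is also summarized symbolically as

\[ \mathbf{A}|a_i\rangle = \sum_{j=1}^{M}\alpha_{ji}|b_j\rangle, \qquad i = 1, 2, \dots, N. \tag{5.3} \]

We thus have the following rule:


Box 5.1.1 To find the matrix $\mathsf{A}$ representing $\mathbf{A}$ in bases $B_V =
\{|a_i\rangle\}_{i=1}^{N}$ and $B_W = \{|b_j\rangle\}_{j=1}^{M}$, express $\mathbf{A}|a_i\rangle$ as a linear combination
of the vectors in $B_W$. The components form the $i$th column of $\mathsf{A}$.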


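Now consider the vector $|y\rangle = \mathbf{A}|x\rangle$ in $W_M$. This vector can be written
in two ways: On the one hand, $|y\rangle = \sum_{j=1}^{M}\eta_j|b_j\rangle$. On the other hand,

\[ |y\rangle = \mathbf{A}|x\rangle = \mathbf{A}\sum_{i=1}^{N}\xi_i|a_i\rangle = \sum_{i=1}^{N}\xi_i\mathbf{A}|a_i\rangle = \sum_{i=1}^{N}\xi_i\Bigl(\sum_{j=1}^{M}\alpha_{ji}|b_j\rangle\Bigr) = \sum_{j=1}^{M}\Bigl(\sum_{i=1}^{N}\alpha_{ji}\xi_i\Bigr)|b_j\rangle. \]

Since $|y\rangle$ has a unique set of components in the basis $B_W$, we conclude that

\[ \eta_j = \sum_{i=1}^{N}\alpha_{ji}\xi_i, \qquad j = 1, 2, \dots, M. \tag{5.4} \]

This is written as

\[ \begin{pmatrix} \eta_1 \\ \eta_2 \\ \vdots \\ \eta_M \end{pmatrix} = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \cdots & \alpha_{1N} \\ \alpha_{21} & \alpha_{22} & \cdots & \alpha_{2N} \\ \vdots & \vdots & & \vdots \\ \alpha_{M1} & \alpha_{M2} & \cdots & \alpha_{MN} \end{pmatrix}\begin{pmatrix} \xi_1 \\ \xi_2 \\ \vdots \\ \xi_N \end{pmatrix} \ \Rightarrow\ \mathsf{y} = \mathsf{A}\mathsf{x}, \tag{5.5} \]

in which the usual matrix multiplication rule is understood. This matrix
equation is the representation of the operator equation $|y\rangle = \mathbf{A}|x\rangle$ in the
bases $B_V$ and $B_W$.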

The construction above indicates that, once the bases are fixed in the
two vector spaces, to every operator there corresponds a unique matrix.
This uniqueness is the result of the uniqueness of the components of vectors
in a basis. On the other hand, given an $M \times N$ matrix $\mathsf{A}$ with elements $\alpha_{ij}$,
one can construct a unique linear operator $\mathbf{T}_{\mathsf{A}}$ defined by its action on the
basis vectors (see Box 2.3.6): $\mathbf{T}_{\mathsf{A}}|a_i\rangle \equiv \sum_{j=1}^{M}\alpha_{ji}|b_j\rangle$. Thus, there is a
one-to-one correspondence between operators and matrices. This correspondence
is in fact a linear isomorphism:


Proposition 5.1.2 The two vector spaces $\mathcal{L}(V_N, W_M)$ and $\mathcal{M}^{M \times N}$ are
isomorphic. An explicit isomorphism is established only when a basis is chosen
for each vector space, in which case, an operator is identified with its matrix
representation.


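Example 5.1.3 In this example, we construct a matrix representation of the
complex structure $\mathbf{J}$ on a real vector space $V$ introduced in Sect. 2.4. There
are two common representations, each corresponding to a different ordering
of the vectors in the basis $\{|e_i\rangle, \mathbf{J}|e_i\rangle\}_{i=1}^{m}$ of $V$. One ordering is to let $\mathbf{J}|e_i\rangle$
come right after $|e_i\rangle$. The other is to collect all the $\mathbf{J}|e_i\rangle$ after the $|e_i\rangle$ in the
same order. We consider the first ordering in this example, and leave the
other for the reader to construct.

In the first ordering, for each $|e_i\rangle$, we let $|e_{i+1}\rangle \equiv \mathbf{J}|e_i\rangle$. Starting with
$|e_1\rangle$, we have

\[ \mathbf{J}|e_1\rangle = |e_2\rangle = 0 \cdot |e_1\rangle + 1 \cdot |e_2\rangle + 0 \cdot |e_3\rangle + \cdots + 0 \cdot |e_{2m}\rangle, \]
\[ \mathbf{J}|e_2\rangle = \mathbf{J}^2|e_1\rangle = -|e_1\rangle = -1 \cdot |e_1\rangle + 0 \cdot |e_2\rangle + 0 \cdot |e_3\rangle + \cdots + 0 \cdot |e_{2m}\rangle. \]

These two equations give the first two columns as

\[ \begin{pmatrix} 0 & -1 \\ 1 & 0 \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \end{pmatrix}. \]

For the third and fourth basis vectors, we get

\[ \mathbf{J}|e_3\rangle \equiv |e_4\rangle = 0 \cdot |e_1\rangle + 0 \cdot |e_2\rangle + 0 \cdot |e_3\rangle + 1 \cdot |e_4\rangle + \cdots + 0 \cdot |e_{2m}\rangle, \]
\[ \mathbf{J}|e_4\rangle = \mathbf{J}^2|e_3\rangle = -|e_3\rangle = 0 \cdot |e_1\rangle + 0 \cdot |e_2\rangle - 1 \cdot |e_3\rangle + \cdots + 0 \cdot |e_{2m}\rangle, \]

giving rise to the following third and fourth columns:

\[ \begin{pmatrix} 0 & 0 \\ 0 & 0 \\ 0 & -1 \\ 1 & 0 \\ \vdots & \vdots \\ 0 & 0 \end{pmatrix}. \]

It should now be clear that the matrix representation of $\mathbf{J}$ is of the form

\[ \mathsf{J} = \begin{pmatrix} \mathsf{R}_1 & \mathsf{0} & \cdots & \mathsf{0} \\ \mathsf{0} & \mathsf{R}_2 & \cdots & \mathsf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathsf{0} & \mathsf{0} & \cdots & \mathsf{R}_m \end{pmatrix}, \]

where the zeros are the $2 \times 2$ zero matrices and $\mathsf{R}_k = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ for all $k$.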
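The block-diagonal form can be generated for any $m$ and checked against the defining property $\mathbf{J}^2 = -\mathbf{1}$ of a complex structure. The function name `complex_structure` and the choice $m = 3$ below are assumptions made for illustration; the ordering of the basis is the first one described in the example.

```python
def complex_structure(m):
    """Matrix of J in the ordering e_1, J e_1, e_2, J e_2, ... (size 2m x 2m)."""
    n = 2 * m
    J = [[0] * n for _ in range(n)]
    for k in range(m):
        i = 2 * k
        J[i + 1][i] = 1     # column i:     J maps the i-th vector to the (i+1)-th
        J[i][i + 1] = -1    # column i + 1: J maps the (i+1)-th vector to minus the i-th
    return J


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


J = complex_structure(3)
J_squared = matmul(J, J)
minus_one = [[-1 if i == j else 0 for j in range(6)] for i in range(6)]
print(J_squared == minus_one)
```

Each $2 \times 2$ diagonal block of `J` is the matrix $\mathsf{R}_k$ of the example, so squaring reduces to squaring a single block.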


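Notation 5.1.4 Let $\mathbf{A} \in \mathcal{L}(V_N, W_M)$. Choose bases $B_V$ for $V$ and $B_W$
for $W$. We denote the matrix representing $\mathbf{A}$ in these bases by $\mathsf{M}_{B_V}^{B_W}(\mathbf{A})$,
where

\[ \mathsf{M}_{B_V}^{B_W} : \mathcal{L}(V_N, W_M) \to \mathcal{M}^{M \times N} \]

is the basis-dependent linear isomorphism. When $V = W$, we leave out the
subscripts and superscripts of $\mathsf{M}$, keeping in mind that all matrices are
representations in a single basis.


Given the linear transformations $\mathbf{A} : V_N \to W_M$ and $\mathbf{B} : W_M \to U_K$, we
can form the composite linear transformation $\mathbf{B} \circ \mathbf{A} : V_N \to U_K$. We can also
choose bases $B_V = \{|a_i\rangle\}_{i=1}^{N}$, $B_W = \{|b_i\rangle\}_{i=1}^{M}$, $B_U = \{|c_i\rangle\}_{i=1}^{K}$ for $V$, $W$, and
$U$, respectively. Then $\mathbf{A}$, $\mathbf{B}$, and $\mathbf{B} \circ \mathbf{A}$ will be represented by an $M \times N$, a
$K \times M$, and a $K \times N$ matrix, respectively, and we have

\[ \mathsf{M}_{B_V}^{B_U}(\mathbf{B} \circ \mathbf{A}) = \mathsf{M}_{B_W}^{B_U}(\mathbf{B})\,\mathsf{M}_{B_V}^{B_W}(\mathbf{A}), \tag{5.6} \]

where on the right-hand side the product is defined as the usual product of
matrices. If $V = W = U$, we write (5.6) as

\[ \mathsf{M}(\mathbf{B} \circ \mathbf{A}) = \mathsf{M}(\mathbf{B})\mathsf{M}(\mathbf{A}). \tag{5.7} \]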
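Equation (5.6) can be illustrated with two small maps written as ordinary functions. Everything specific in this sketch is an assumption: the maps `A` ($\mathbb{R}^3 \to \mathbb{R}^2$) and `B` ($\mathbb{R}^2 \to \mathbb{R}^3$) are invented for the purpose, the standard bases are used in all three spaces, and `matrix_of` is a hypothetical helper that applies the rule of Box 5.1.1 (the image of the $i$th basis vector gives the $i$th column).

```python
def matrix_of(op, n):
    """Matrix of a linear map in the standard bases: column i is op(e_i)."""
    cols = [op([1 if k == i else 0 for k in range(n)]) for i in range(n)]
    return [[col[r] for col in cols] for r in range(len(cols[0]))]


def matmul(a, b):
    """Product of a (K x M) and an (M x N) matrix given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def A(v):
    """Sample linear map R^3 -> R^2 (an assumption, not from the text)."""
    x, y, z = v
    return [x + 2 * y, y - z]


def B(w):
    """Sample linear map R^2 -> R^3 (an assumption, not from the text)."""
    s, t = w
    return [s, s + t, 3 * t]


M_A = matrix_of(A, 3)                        # 2 x 3
M_B = matrix_of(B, 2)                        # 3 x 2
M_BA = matrix_of(lambda v: B(A(v)), 3)       # 3 x 3, matrix of the composite
print(M_BA == matmul(M_B, M_A))
```

The shapes follow the pattern stated in the text: an $M \times N$ matrix for $\mathbf{A}$, a $K \times M$ matrix for $\mathbf{B}$, and a $K \times N$ matrix for $\mathbf{B} \circ \mathbf{A}$, here with $N = 3$, $M = 2$, $K = 3$.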


Matrices are determined entirely by their elements. For this reason a ma- 
trix A whose elements are 11, @12,... is Sometimes denoted by (a;;). Sim- 
ilarly, the elements of this matrix are denoted by (A);;. So, on the one hand, 
we have (a@;) = A, and on the other hand (A);; = ;;. In the context of this 
notation, therefore, we can write 


(A+ B)ij = (A)ij + (B)ij > (aij + Bij) = (@ij) + Bij), 
(VA)ij =v Aij > vy ij) = Vai), 
(0);; =9, 
(ij = 4ij- 


A matrix, as a representation of a linear operator, is well-defined only in reference to a specific basis. A collection of rows and columns of numbers by itself has no operational meaning. When we manipulate matrices and attach meaning to them, we make an unannounced assumption regarding the basis: We have the standard basis of $\mathbb{C}^n$ (or $\mathbb{R}^n$) in mind. The following example should clarify this subtlety.




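Example 5.1.5 Let us find the matrix representation of the linear operator $\mathbf{A} \in \mathcal{L}(\mathbb{R}^3)$, given by

$$\mathbf{A}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x - y + 2z \\ 3x - z \\ 2y + z \end{pmatrix} \qquad (5.8)$$

in the basis

$$B = \left\{ |a_1\rangle = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix},\; |a_2\rangle = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix},\; |a_3\rangle = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \right\}.$$

There is a tendency to associate the matrix

$$\begin{pmatrix} 1 & -1 & 2 \\ 3 & 0 & -1 \\ 0 & 2 & 1 \end{pmatrix}$$

with the operator $\mathbf{A}$. The following discussion will show that this is false. To obtain the first column of the matrix representing $\mathbf{A}$, we note that

$$\mathbf{A}|a_1\rangle = \mathbf{A}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 3 \\ 2 \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} + \frac{5}{2}\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = \frac{1}{2}|a_1\rangle - \frac{1}{2}|a_2\rangle + \frac{5}{2}|a_3\rangle.$$

So, by Box 5.1.1, the first column of the matrix is

$$\begin{pmatrix} \tfrac{1}{2} \\ -\tfrac{1}{2} \\ \tfrac{5}{2} \end{pmatrix}.$$

The other two columns are obtained from

$$\mathbf{A}|a_2\rangle = \mathbf{A}\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} = 2\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} + 0\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix},$$

$$\mathbf{A}|a_3\rangle = \mathbf{A}\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ -1 \\ 3 \end{pmatrix} = -\frac{3}{2}\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + \frac{5}{2}\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} + \frac{1}{2}\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix},$$

giving the second and the third columns, respectively. The whole matrix is then

$$\mathsf{A} = \begin{pmatrix} \tfrac{1}{2} & 2 & -\tfrac{3}{2} \\ -\tfrac{1}{2} & 1 & \tfrac{5}{2} \\ \tfrac{5}{2} & 0 & \tfrac{1}{2} \end{pmatrix}.$$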
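As long as all vectors are represented by columns whose entries are expansion coefficients of the vectors in $B$, the operator $\mathbf{A}$ and the matrix $\mathsf{A}$ are indistinguishable. However, the action of $\mathsf{A}$ on the column vector $\begin{pmatrix} x \\ y \\ z \end{pmatrix}$ will not yield the RHS of Eq. (5.8)! Although this is not usually emphasized, the column vector on the LHS of Eq. (5.8) is really the vector

$$x\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + y\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + z\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix},$$

which is an expansion in terms of the standard basis of $\mathbb{R}^3$ rather than in terms of $B$.

We can expand $\mathbf{A}\begin{pmatrix} x \\ y \\ z \end{pmatrix}$ in terms of $B$, yielding

$$\mathbf{A}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} x - y + 2z \\ 3x - z \\ 2y + z \end{pmatrix} = \left(2x - \tfrac{3}{2}y\right)\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + \left(-x + \tfrac{1}{2}y + 2z\right)\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} + \left(x + \tfrac{3}{2}y - z\right)\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}.$$

This says that in the basis $B$ this vector has the representation

$$\left(\mathbf{A}\begin{pmatrix} x \\ y \\ z \end{pmatrix}\right)_B = \begin{pmatrix} 2x - \tfrac{3}{2}y \\ -x + \tfrac{1}{2}y + 2z \\ x + \tfrac{3}{2}y - z \end{pmatrix}. \qquad (5.9)$$

Similarly, $\begin{pmatrix} x \\ y \\ z \end{pmatrix}$ is represented by

$$\begin{pmatrix} x \\ y \\ z \end{pmatrix}_B = \begin{pmatrix} \tfrac{1}{2}x + \tfrac{1}{2}y - \tfrac{1}{2}z \\ \tfrac{1}{2}x - \tfrac{1}{2}y + \tfrac{1}{2}z \\ -\tfrac{1}{2}x + \tfrac{1}{2}y + \tfrac{1}{2}z \end{pmatrix}. \qquad (5.10)$$

Applying $\mathsf{A}$ to the RHS of (5.10) yields the RHS of (5.9), as it should.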
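The computation in Example 5.1.5 can be reproduced mechanically. The following is a minimal sketch, assuming exact rational arithmetic via Python's `fractions` module; the helper `coords_in_basis` is a hypothetical name introduced here, and it simply solves the linear system that expresses a vector in the basis $B$.

```python
from fractions import Fraction as F

def A(v):
    """The operator of Eq. (5.8), acting on standard-basis components."""
    x, y, z = v
    return [x - y + 2 * z, 3 * x - z, 2 * y + z]

basis = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]      # |a1>, |a2>, |a3>

def coords_in_basis(v, basis):
    """Solve c1*a1 + c2*a2 + c3*a3 = v by Gauss-Jordan elimination."""
    n = len(basis)
    # Augmented matrix: columns are the basis vectors, last column is v.
    M = [[F(basis[j][i]) for j in range(n)] + [F(v[i])] for i in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [x / M[col][col] for x in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

# Column k of the matrix holds the B-components of A|a_k>.
columns = [coords_in_basis(A(a), basis) for a in basis]
matrix = [list(row) for row in zip(*columns)]

# Consistency check of (5.9) and (5.10) on one sample vector.
v = [1, 2, 3]
lhs = coords_in_basis(A(v), basis)
rhs = [sum(m * c for m, c in zip(row, coords_in_basis(v, basis))) for row in matrix]
print(lhs == rhs)
```

The sample vector `[1, 2, 3]` is arbitrary; any other choice should pass the same check.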


5.2 Operations on Matrices


There are two basic operations that one can perform on a matrix to obtain a new one; these are transposition and complex conjugation. The transpose of an $M \times N$ matrix $\mathsf{A}$ is an $N \times M$ matrix $\mathsf{A}^t$ obtained by interchanging the rows and columns of $\mathsf{A}$:

$$(\mathsf{A}^t)_{ij} = (\mathsf{A})_{ji}, \quad\text{or}\quad (\alpha_{ij})^t = (\alpha_{ji}). \qquad (5.11)$$


The following theorem, whose proof follows immediately from the definition of transpose, summarizes the important properties of the operation of transposition.


Theorem 5.2.1 Let $\mathsf{A}$ and $\mathsf{B}$ be two matrices for which the operation of addition and/or multiplication are defined. Then

(a) $(\mathsf{A} + \mathsf{B})^t = \mathsf{A}^t + \mathsf{B}^t$,
(b) $(\mathsf{A}\mathsf{B})^t = \mathsf{B}^t\mathsf{A}^t$,
(c) $(\mathsf{A}^t)^t = \mathsf{A}$.


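Let $\mathbf{T} \in \mathcal{L}(V, W)$ and $B_V = \{|a_i\rangle\}_{i=1}^N$ and $B_W = \{|b_i\rangle\}_{i=1}^M$ be bases in $V$ and $W$. Then

$$\mathbf{T}|a_i\rangle = \sum_{j=1}^M (\mathsf{T})_{ji}|b_j\rangle,$$

where $\mathsf{T} = \mathsf{M}^{B_W}_{B_V}(\mathbf{T})$. Let $\mathbf{T}^* \in \mathcal{L}(W^*, V^*)$ be the pull-back of $\mathbf{T}$ and $B_V^* = \{\theta_k\}_{k=1}^N$ and $B_W^* = \{\phi_l\}_{l=1}^M$ bases dual to $B_V$ and $B_W$. Then

$$\mathbf{T}^*\phi_l = \sum_{k=1}^N (\mathsf{T}^*)_{kl}\,\theta_k.$$

Apply both sides of this equation to $|a_i\rangle$ to get

$$\text{LHS} = (\mathbf{T}^*\phi_l)|a_i\rangle = \phi_l(\mathbf{T}|a_i\rangle) = \phi_l\Bigl(\sum_{j=1}^M (\mathsf{T})_{ji}|b_j\rangle\Bigr) = \sum_{j=1}^M (\mathsf{T})_{ji}\underbrace{\phi_l(|b_j\rangle)}_{=\delta_{lj}} = (\mathsf{T})_{li}$$

and

$$\text{RHS} = \sum_{k=1}^N (\mathsf{T}^*)_{kl}\,\theta_k|a_i\rangle = \sum_{k=1}^N (\mathsf{T}^*)_{kl}\,\delta_{ki} = (\mathsf{T}^*)_{il}.$$

Comparing the last two equations, we have


Proposition 5.2.2 Let $\mathbf{T} \in \mathcal{L}(V, W)$ and $B_V$ and $B_W$ be bases in $V$ and $W$. Let $\mathbf{T}^*$, $B_V^*$, and $B_W^*$ be duals to $\mathbf{T}$, $B_V$, and $B_W$, respectively. Let $\mathsf{T} = \mathsf{M}^{B_W}_{B_V}(\mathbf{T})$ and $\mathsf{T}^* = \mathsf{M}^{B_V^*}_{B_W^*}(\mathbf{T}^*)$. Then $\mathsf{T}^* = \mathsf{T}^t$.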


Of special interest is a matrix that is equal to either its transpose or the 
negative of its transpose. Such matrices occur frequently in physics. 
Definition 5.2.3 A matrix $\mathsf{S}$ is symmetric if $\mathsf{S}^t = \mathsf{S}$. Similarly, a matrix $\mathsf{A}$ is antisymmetric if $\mathsf{A}^t = -\mathsf{A}$.


Any matrix $\mathsf{A}$ can be written as $\mathsf{A} = \frac{1}{2}(\mathsf{A} + \mathsf{A}^t) + \frac{1}{2}(\mathsf{A} - \mathsf{A}^t)$, where the first term is symmetric and the second is antisymmetric.
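As a quick illustration of this decomposition, the sketch below splits an arbitrary square matrix into its symmetric and antisymmetric parts. The sample matrix `M` is a made-up example, and exact fractions are used only to avoid rounding when halving.

```python
from fractions import Fraction as F

def transpose(M):
    return [list(row) for row in zip(*M)]

def sym_antisym(M):
    """Return (S, K) with S = (M + M^t)/2 symmetric and K = (M - M^t)/2 antisymmetric."""
    Mt = transpose(M)
    n = len(M)
    S = [[F(M[i][j] + Mt[i][j], 2) for j in range(n)] for i in range(n)]
    K = [[F(M[i][j] - Mt[i][j], 2) for j in range(n)] for i in range(n)]
    return S, K

M = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]      # arbitrary sample matrix
S, K = sym_antisym(M)

# The two parts add back up to M, entry by entry.
recombined = [[S[i][j] + K[i][j] for j in range(3)] for i in range(3)]
print(recombined == M)
```

The diagonal of `K` vanishes, in line with the remark on antisymmetric matrices that follows.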

The elements of a symmetric matrix $\mathsf{A}$ satisfy the relation $\alpha_{ij} = (\mathsf{A}^t)_{ij} = (\mathsf{A})_{ji} = \alpha_{ji}$; i.e., the matrix is symmetric under reflection through the main diagonal. On the other hand, for an antisymmetric matrix we have $\alpha_{ij} = -\alpha_{ji}$. In particular, the diagonal elements of an antisymmetric matrix are all zero.

A (real) matrix satisfying $\mathsf{A}^t\mathsf{A} = \mathsf{A}\mathsf{A}^t = \mathsf{1}$ is called orthogonal.

Complex conjugation is an operation under which all elements of a matrix are complex conjugated. Denoting the complex conjugate of $\mathsf{A}$ by $\mathsf{A}^*$, we have $(\mathsf{A}^*)_{ij} = (\mathsf{A})^*_{ij}$, or $(\alpha_{ij})^* = (\alpha^*_{ij})$. A matrix is real if and only if $\mathsf{A}^* = \mathsf{A}$. Clearly, $(\mathsf{A}^*)^* = \mathsf{A}$.

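Under the combined operation of complex conjugation and transposition, the rows and columns of a matrix are interchanged and all of its elements are complex conjugated. This combined operation is called the adjoint operation, or hermitian conjugation, and is denoted by $\dagger$, as with operators. Thus, we have

$$\mathsf{A}^\dagger = (\mathsf{A}^t)^* = (\mathsf{A}^*)^t, \qquad (\mathsf{A}^\dagger)_{ij} = (\mathsf{A})^*_{ji} \quad\text{or}\quad (\alpha_{ij})^\dagger = (\alpha^*_{ji}). \qquad (5.12)$$

Two types of matrices are important enough to warrant a separate definition.


Definition 5.2.4 A hermitian matrix $\mathsf{H}$ satisfies $\mathsf{H}^\dagger = \mathsf{H}$, or, in terms of elements, $\eta^*_{ij} = \eta_{ji}$. A unitary matrix $\mathsf{U}$ satisfies $\mathsf{U}^\dagger\mathsf{U} = \mathsf{U}\mathsf{U}^\dagger = \mathsf{1}$, or, in terms of elements, $\sum_{k=1}^N \mu_{ik}\mu^*_{jk} = \sum_{k=1}^N \mu^*_{ki}\mu_{kj} = \delta_{ij}$.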


Remarks It follows immediately from this definition that 


1. The diagonal elements of a hermitian matrix are real.

2. The kth column of a hermitian matrix is the complex conjugate of its kth row, and vice versa.

3. A real hermitian matrix is symmetric.

4. The rows of an $N \times N$ unitary matrix, when considered as vectors in $\mathbb{C}^N$, form an orthonormal set, as do the columns.

5. A real unitary matrix is orthogonal.


It is sometimes possible (and desirable) to transform a matrix into a form 
in which all of its off-diagonal elements are zero. Such a matrix is called a 
diagonal matrix. 


Box 5.2.5 A diagonal matrix whose diagonal elements are $\{\lambda_k\}_{k=1}^n$ is denoted by $\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$.




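Example 5.2.6 In this example, we derive a useful identity for functions of a diagonal matrix. Let $\mathsf{D} = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ be a diagonal matrix, and $f(x)$ a function that has a Taylor series expansion $f(x) = \sum_{k=0}^\infty a_k x^k$. The same function of $\mathsf{D}$ can be written as

$$f(\mathsf{D}) = \sum_{k=0}^\infty a_k \mathsf{D}^k = \sum_{k=0}^\infty a_k \bigl[\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)\bigr]^k = \sum_{k=0}^\infty a_k \operatorname{diag}(\lambda_1^k, \lambda_2^k, \ldots, \lambda_n^k)$$
$$= \operatorname{diag}\Bigl(\sum_{k=0}^\infty a_k\lambda_1^k, \sum_{k=0}^\infty a_k\lambda_2^k, \ldots, \sum_{k=0}^\infty a_k\lambda_n^k\Bigr) = \operatorname{diag}\bigl(f(\lambda_1), f(\lambda_2), \ldots, f(\lambda_n)\bigr).$$

In words, the function of a diagonal matrix is equal to a diagonal matrix whose entries are the same function of the corresponding entries of the original matrix. In the above derivation, we used the following obvious properties of diagonal matrices:

$$a\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) = \operatorname{diag}(a\lambda_1, a\lambda_2, \ldots, a\lambda_n),$$
$$\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) + \operatorname{diag}(\omega_1, \omega_2, \ldots, \omega_n) = \operatorname{diag}(\lambda_1 + \omega_1, \ldots, \lambda_n + \omega_n),$$
$$\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) \cdot \operatorname{diag}(\omega_1, \omega_2, \ldots, \omega_n) = \operatorname{diag}(\lambda_1\omega_1, \ldots, \lambda_n\omega_n).$$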
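The identity of Example 5.2.6 can be illustrated numerically with $f(x) = e^x$. The sketch below sums a truncated Taylor series of the matrix exponential and compares it with the diagonal matrix of exponentials. The number of terms (40) and the sample eigenvalues are arbitrary choices made for this illustration.

```python
import math

def diag(entries):
    n = len(entries)
    return [[entries[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def exp_series(M, terms=40):
    """Partial sum of sum_k M^k / k!, a truncated Taylor series of exp."""
    n = len(M)
    total = diag([1.0] * n)          # k = 0 term
    power = diag([1.0] * n)          # running power M^k
    for k in range(1, terms):
        power = matmul(power, M)
        total = [[total[i][j] + power[i][j] / math.factorial(k) for j in range(n)]
                 for i in range(n)]
    return total

lams = [0.5, -1.0, 2.0]                          # sample diagonal entries
lhs = exp_series(diag(lams))                     # f(D) from the series
rhs = diag([math.exp(l) for l in lams])          # diag(f(lambda_1), ...)
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
print(max_diff)                                  # expected to be at rounding level
```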


Example 5.2.7 In this example, we list some familiar matrices in physics.

(a) A prototypical symmetric matrix is that of the moment of inertia encountered in mechanics. The ijth element of this matrix is defined as $I_{ij} = \iiint \rho(x_1, x_2, x_3)\, x_i x_j\, dV$, where $x_i$ is the ith Cartesian coordinate of a point in the distribution of mass described by the volume density $\rho(x_1, x_2, x_3)$. It is clear that $I_{ij} = I_{ji}$, or $\mathsf{I} = \mathsf{I}^t$. The moment of inertia matrix can be represented as

$$\mathsf{I} = \begin{pmatrix} I_{11} & I_{12} & I_{13} \\ I_{12} & I_{22} & I_{23} \\ I_{13} & I_{23} & I_{33} \end{pmatrix}.$$

It has six independent elements.

(b) An example of an antisymmetric matrix is the electromagnetic field tensor.

(c) Examples of hermitian matrices are the $2 \times 2$ Pauli spin matrices:

$$\sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

(d) The most frequently encountered orthogonal matrices are rotations. One such matrix represents the rotation of a 3-dimensional rigid body in terms of Euler angles and is used in mechanics. Attaching a coordinate system to the body, a general rotation can be decomposed into a rotation of angle $\varphi$ about the z-axis, followed by a rotation of angle $\theta$ about the new x-axis, followed by a rotation of angle $\psi$ about the new z-axis. We simply exhibit this matrix in terms of these angles and leave it to the reader to show that it is indeed orthogonal.

$$\begin{pmatrix} \cos\psi\cos\varphi - \sin\psi\cos\theta\sin\varphi & -\cos\psi\sin\varphi - \sin\psi\cos\theta\cos\varphi & \sin\psi\sin\theta \\ \sin\psi\cos\varphi + \cos\psi\cos\theta\sin\varphi & -\sin\psi\sin\varphi + \cos\psi\cos\theta\cos\varphi & -\cos\psi\sin\theta \\ \sin\theta\sin\varphi & \sin\theta\cos\varphi & \cos\theta \end{pmatrix}$$

5.3 Orthonormal Bases 


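The matrix representation of $\mathbf{A} \in \operatorname{End}(V)$ is facilitated by choosing an orthonormal basis $B = \{|e_i\rangle\}_{i=1}^N$. The matrix elements of $\mathbf{A}$ can be found in such a basis by “multiplying” both sides of $\mathbf{A}|e_i\rangle = \sum_{k=1}^N \alpha_{ki}|e_k\rangle$ on the left by $\langle e_j|$:

$$\langle e_j|\mathbf{A}|e_i\rangle = \langle e_j|\Bigl(\sum_{k=1}^N \alpha_{ki}|e_k\rangle\Bigr) = \sum_{k=1}^N \alpha_{ki}\underbrace{\langle e_j|e_k\rangle}_{=\delta_{jk}} = \alpha_{ji},$$

or

$$(\mathsf{A})_{ij} = \alpha_{ij} = \langle e_i|\mathbf{A}|e_j\rangle. \qquad (5.13)$$

We can also show that in an orthonormal basis, the ith component $\xi_i$ of a vector is found by multiplying the vector by $\langle e_i|$. This expression for $\xi_i$ allows us to write the expansion of $|x\rangle$ as

$$|x\rangle = \sum_{j=1}^N \langle e_j|x\rangle\,|e_j\rangle = \sum_{j=1}^N |e_j\rangle\langle e_j|x\rangle \;\Rightarrow\; \mathbf{1} = \sum_{j=1}^N |e_j\rangle\langle e_j|, \qquad (5.14)$$

which is the same as in Proposition 4.4.6.

Let us now investigate the representation of the special operators discussed in Chap. 4 and find the connection between those operators and the matrices encountered in the last section. We begin by calculating the matrix representing the hermitian conjugate of an operator $\mathbf{T}$. In an orthonormal basis, the elements of this matrix are given by Eq. (5.13), $\tau_{ij} = \langle e_i|\mathbf{T}|e_j\rangle$. Taking the complex conjugate of this equation and using the definition of $\mathbf{T}^\dagger$ given in Eq. (4.11), we obtain

$$\tau^*_{ij} = \langle e_i|\mathbf{T}|e_j\rangle^* = \langle e_j|\mathbf{T}^\dagger|e_i\rangle, \quad\text{or}\quad (\mathsf{T}^\dagger)_{ji} = \tau^*_{ij}.$$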




This is precisely how the adjoint of a matrix was defined. Note how cru- 
cially this conclusion depends on the orthonormality of the basis vectors. If 
the basis were not orthonormal, we could not use Eq. (5.13) on which the 
conclusion is based. Therefore, 


Box 5.3.1 Only in an orthonormal basis is the adjoint of an operator 
represented by the adjoint of the matrix representing that operator. 


In particular, a hermitian operator is represented by a hermitian matrix 
only if an orthonormal basis is used. The following example illustrates this 
point. 


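Example 5.3.2 Consider the matrix representation of the hermitian operator $\mathbf{H}$ in a general (not orthonormal) basis $B = \{|a_i\rangle\}_{i=1}^N$. The elements of the matrix corresponding to $\mathbf{H}$ are given by

$$\mathbf{H}|a_k\rangle = \sum_{j=1}^N \eta_{jk}|a_j\rangle, \quad\text{or}\quad \mathbf{H}|a_i\rangle = \sum_{j=1}^N \eta_{ji}|a_j\rangle. \qquad (5.15)$$

Taking the product of the first equation with $\langle a_i|$ and complex-conjugating the result gives

$$\langle a_i|\mathbf{H}|a_k\rangle^* = \Bigl(\sum_{j=1}^N \eta_{jk}\langle a_i|a_j\rangle\Bigr)^* = \sum_{j=1}^N \eta^*_{jk}\langle a_j|a_i\rangle.$$

But by the definition of a hermitian operator,

$$\langle a_i|\mathbf{H}|a_k\rangle^* = \langle a_k|\mathbf{H}^\dagger|a_i\rangle = \langle a_k|\mathbf{H}|a_i\rangle.$$

So we have $\langle a_k|\mathbf{H}|a_i\rangle = \sum_{j=1}^N \eta^*_{jk}\langle a_j|a_i\rangle$.

On the other hand, multiplying the second equation in (5.15) by $\langle a_k|$ gives

$$\langle a_k|\mathbf{H}|a_i\rangle = \sum_{j=1}^N \eta_{ji}\langle a_k|a_j\rangle.$$

The only conclusion we can draw from this discussion is

$$\sum_{j=1}^N \eta^*_{jk}\langle a_j|a_i\rangle = \sum_{j=1}^N \eta_{ji}\langle a_k|a_j\rangle.$$

Because this equation does not say anything about each individual $\eta_{ij}$, we cannot conclude, in general, that $\eta^*_{ij} = \eta_{ji}$. However, if the $|a_i\rangle$'s are orthonormal, then $\langle a_j|a_i\rangle = \delta_{ji}$ and $\langle a_k|a_j\rangle = \delta_{kj}$, and we obtain $\sum_{j=1}^N \eta^*_{jk}\delta_{ji} = \sum_{j=1}^N \eta_{ji}\delta_{kj}$, or $\eta^*_{ik} = \eta_{ki}$, as expected of a hermitian matrix.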
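A small numerical case makes the point of Example 5.3.2 concrete. In the sketch below, the operator and both bases are hypothetical choices made for illustration: a real symmetric (hence hermitian) operator on $\mathbb{R}^2$ is represented once in an oblique basis and once in an orthonormal basis. The helper `represent` is an assumed name; it computes $\mathsf{P}^{-1}\mathsf{H}\mathsf{P}$, where the columns of $\mathsf{P}$ are the basis vectors.

```python
from fractions import Fraction as F

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = F(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def represent(H, basis):
    """Matrix of H in the given basis: column k holds the components of H|a_k>."""
    P = transpose(basis)                 # columns of P are the basis vectors
    return matmul(inverse_2x2(P), matmul(H, P))

H = [[2, 1], [1, 3]]                     # hermitian in the standard basis

oblique = represent(H, [[1, 0], [1, 1]])                              # not orthonormal
ortho = represent(H, [[F(3, 5), F(4, 5)], [F(-4, 5), F(3, 5)]])       # orthonormal

print(oblique == transpose(oblique))     # symmetry is lost in the oblique basis
print(ortho == transpose(ortho))         # and kept in the orthonormal basis
```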
Similarly, we expect the matrices representing unitary operators to be unitary only if the basis is orthonormal. This is an immediate consequence of Eq. (5.12), but we shall prove it in order to provide yet another example of how the completeness relation, Eq. (5.14), is used. Since $\mathbf{U}\mathbf{U}^\dagger = \mathbf{1}$, we have

$$\langle e_i|\mathbf{U}\mathbf{U}^\dagger|e_j\rangle = \langle e_i|\mathbf{1}|e_j\rangle = \langle e_i|e_j\rangle = \delta_{ij}.$$

We insert the completeness relation $\mathbf{1} = \sum_{k=1}^N |e_k\rangle\langle e_k|$ between $\mathbf{U}$ and $\mathbf{U}^\dagger$ on the LHS:

$$\langle e_i|\mathbf{U}\Bigl(\sum_{k=1}^N |e_k\rangle\langle e_k|\Bigr)\mathbf{U}^\dagger|e_j\rangle = \sum_{k=1}^N \underbrace{\langle e_i|\mathbf{U}|e_k\rangle}_{\equiv\mu_{ik}}\,\underbrace{\langle e_k|\mathbf{U}^\dagger|e_j\rangle}_{=\mu^*_{jk}} = \delta_{ij}.$$

This equation gives the first half of the requirement for a unitary matrix given in Definition 5.2.4. By redoing the calculation for $\mathbf{U}^\dagger\mathbf{U}$, we could obtain the second half of that requirement.


5.4 Change of Basis 


It is often advantageous to describe a physical problem in a particular basis 
because it takes a simpler form there, but the general form of the result may 
still be of importance. In such cases the problem is solved in one basis, and 
the result is transformed to other bases. Let us investigate this point in some 
detail. 

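Given a basis $B = \{|a_i\rangle\}_{i=1}^N$, we can write an arbitrary vector $|a\rangle$ with components $\{\alpha_i\}_{i=1}^N$ in $B$ as $|a\rangle = \sum_{i=1}^N \alpha_i|a_i\rangle$. Now suppose that we change the basis to $B' = \{|a'_j\rangle\}_{j=1}^N$. How are the components of $|a\rangle$ in $B'$ related to those in $B$? To answer this question, we write $|a_i\rangle$ in terms of $B'$ vectors,

$$|a_i\rangle = \sum_{j=1}^N \rho_{ji}|a'_j\rangle, \qquad i = 1, 2, \ldots, N,$$

which can also be abbreviated as

$$\begin{pmatrix} |a_1\rangle \\ |a_2\rangle \\ \vdots \\ |a_N\rangle \end{pmatrix} = \begin{pmatrix} \rho_{11} & \rho_{21} & \cdots & \rho_{N1} \\ \rho_{12} & \rho_{22} & \cdots & \rho_{N2} \\ \vdots & \vdots & & \vdots \\ \rho_{1N} & \rho_{2N} & \cdots & \rho_{NN} \end{pmatrix} \begin{pmatrix} |a'_1\rangle \\ |a'_2\rangle \\ \vdots \\ |a'_N\rangle \end{pmatrix}. \qquad (5.16)$$

In this notation, we also have

$$|a\rangle = \begin{pmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_N \end{pmatrix} \begin{pmatrix} |a_1\rangle \\ |a_2\rangle \\ \vdots \\ |a_N\rangle \end{pmatrix} = \mathsf{a}^t \begin{pmatrix} |a_1\rangle \\ |a_2\rangle \\ \vdots \\ |a_N\rangle \end{pmatrix},$$

where $\mathsf{a}$ is the column representation of $|a\rangle$ in $B$. Now multiply both sides of (5.16) by $\mathsf{a}^t$ to get

$$|a\rangle = \mathsf{a}^t \begin{pmatrix} |a_1\rangle \\ |a_2\rangle \\ \vdots \\ |a_N\rangle \end{pmatrix} = \mathsf{a}^t\mathsf{R}^t \begin{pmatrix} |a'_1\rangle \\ |a'_2\rangle \\ \vdots \\ |a'_N\rangle \end{pmatrix} \equiv \underbrace{\begin{pmatrix} \alpha'_1 & \alpha'_2 & \cdots & \alpha'_N \end{pmatrix}}_{\equiv\,\mathsf{a}'^t} \begin{pmatrix} |a'_1\rangle \\ |a'_2\rangle \\ \vdots \\ |a'_N\rangle \end{pmatrix},$$

where $\mathsf{R}$ is the transpose of the $N \times N$ matrix of Eq. (5.16), and the last equality expresses $|a\rangle$ in $B'$. We therefore conclude that

$$\mathsf{a}'^t = \mathsf{a}^t\mathsf{R}^t,$$

where $\mathsf{a}'$ designates a column vector with elements $\alpha'_j$, the components of $|a\rangle$ in $B'$. Taking the transpose of the last equation yields

$$\mathsf{a}' = \mathsf{R}\mathsf{a} \quad\text{or}\quad \begin{pmatrix} \alpha'_1 \\ \alpha'_2 \\ \vdots \\ \alpha'_N \end{pmatrix} = \begin{pmatrix} \rho_{11} & \rho_{12} & \cdots & \rho_{1N} \\ \rho_{21} & \rho_{22} & \cdots & \rho_{2N} \\ \vdots & \vdots & & \vdots \\ \rho_{N1} & \rho_{N2} & \cdots & \rho_{NN} \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_N \end{pmatrix}, \qquad (5.17)$$

which in component form can be written as

$$\alpha'_j = \sum_{i=1}^N \rho_{ji}\alpha_i \quad\text{for } j = 1, 2, \ldots, N. \qquad (5.18)$$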


The matrix R is called the basis transformation matrix. It is invertible 
because it is a linear transformation that maps one basis onto another (see 
Proposition 4.1.3). 

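What happens to a matrix representation of an operator when we transform the basis? Consider the equation $|b\rangle = \mathbf{A}|a\rangle$, where $|a\rangle$ and $|b\rangle$ have components $\{\alpha_i\}_{i=1}^N$ and $\{\beta_i\}_{i=1}^N$, respectively, in $B$. This equation has a corresponding matrix equation $\mathsf{b} = \mathsf{A}\mathsf{a}$. Now, if we change the basis, the columns of the components of $|a\rangle$ and $|b\rangle$ will change to those of $\mathsf{a}'$ and $\mathsf{b}'$, respectively. We seek a matrix $\mathsf{A}'$ such that $\mathsf{b}' = \mathsf{A}'\mathsf{a}'$. This matrix will be the transform of $\mathsf{A}$. Using Eq. (5.17), we write $\mathsf{R}\mathsf{b} = \mathsf{A}'\mathsf{R}\mathsf{a}$, or $\mathsf{b} = \mathsf{R}^{-1}\mathsf{A}'\mathsf{R}\mathsf{a}$. Comparing this with $\mathsf{b} = \mathsf{A}\mathsf{a}$ and applying the fact that both equations hold for arbitrary $\mathsf{a}$ and $\mathsf{b}$, we conclude that

$$\mathsf{R}^{-1}\mathsf{A}'\mathsf{R} = \mathsf{A}, \quad\text{or}\quad \mathsf{A}' = \mathsf{R}\mathsf{A}\mathsf{R}^{-1}. \qquad (5.19)$$

This is called a similarity transformation on $\mathsf{A}$, and $\mathsf{A}'$ is said to be similar to $\mathsf{A}$.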
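The relations $\mathsf{a}' = \mathsf{R}\mathsf{a}$ and $\mathsf{A}' = \mathsf{R}\mathsf{A}\mathsf{R}^{-1}$ can be verified on a small case. In the sketch below, the matrices `R` and `A` and the column `a` are hypothetical values chosen only for illustration; the check is that transforming the components and the matrix together preserves the equation $\mathsf{b} = \mathsf{A}\mathsf{a}$.

```python
from fractions import Fraction as F

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = F(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

R = [[2, 1], [1, 1]]          # hypothetical basis transformation matrix
A = [[1, 2], [3, 4]]          # hypothetical operator matrix in the old basis
a = [1, -1]                   # components of |a> in the old basis

A_prime = matmul(matmul(R, A), inverse_2x2(R))     # Eq. (5.19)
b = matvec(A, a)                                   # b = A a in the old basis
a_prime, b_prime = matvec(R, a), matvec(R, b)      # Eq. (5.17)

print(matvec(A_prime, a_prime) == b_prime)         # b' = A' a' in the new basis
```

In this example the trace and determinant of `A_prime` coincide with those of `A`, as expected of similar matrices.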

The transformation matrix $\mathsf{R}$ can easily be found for orthonormal bases $B = \{|e_i\rangle\}_{i=1}^N$ and $B' = \{|e'_i\rangle\}_{i=1}^N$. We have $|e_i\rangle = \sum_{k=1}^N \rho_{ki}|e'_k\rangle$. Multiplying this equation by $\langle e'_j|$, we obtain

$$\langle e'_j|e_i\rangle = \sum_{k=1}^N \rho_{ki}\langle e'_j|e'_k\rangle = \sum_{k=1}^N \rho_{ki}\delta_{jk} = \rho_{ji}. \qquad (5.20)$$

That is,


Box 5.4.1 To find the ijth element of the matrix that changes the 
components of a vector in the orthonormal basis B to those in the 
orthonormal basis B’, take the jth ket in B and multiply it by the ith 
bra in B’. 


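To find the ijth element of the matrix that changes $B'$ into $B$, we take the jth ket in $B'$ and multiply it by the ith bra in $B$: $\rho'_{ij} = \langle e_i|e'_j\rangle$. However, the matrix $\mathsf{R}'$ must be $\mathsf{R}^{-1}$, as can be seen from Eq. (5.17). On the other hand, $(\rho'_{ij})^* = \langle e_i|e'_j\rangle^* = \langle e'_j|e_i\rangle = \rho_{ji}$, or

$$(\mathsf{R}^{-1})^*_{ij} = \rho_{ji}, \quad\text{or}\quad (\mathsf{R}^{-1})_{ij} = \rho^*_{ji} = (\mathsf{R}^\dagger)_{ij}. \qquad (5.21)$$

This shows that $\mathsf{R}$ is a unitary matrix and yields an important result.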


Theorem 5.4.2 The matrix that transforms one orthonormal basis into an- 
other is necessarily unitary. 


From Eqs. (5.20) and (5.21) we have $(\mathsf{R}^\dagger)_{ij} = \langle e_i|e'_j\rangle$. Thus,


Box 5.4.3 To obtain the jth column of $\mathsf{R}^\dagger$, we take the jth vector in the new basis and successively “multiply” it by $\langle e_i|$ for $i = 1, 2, \ldots, N$.


In particular, if the original basis is the standard basis of $\mathbb{C}^N$ and $|e'_j\rangle$ is represented by a column vector in that basis, then the jth column of $\mathsf{R}^\dagger$ is simply the vector $|e'_j\rangle$.


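Example 5.4.4 In this example, we show that the similarity transform of a function of a matrix is the same function of the similarity transform of the matrix:

$$\mathsf{R} f(\mathsf{A}) \mathsf{R}^{-1} = f(\mathsf{R}\mathsf{A}\mathsf{R}^{-1}).$$

The proof involves inserting $\mathsf{1} = \mathsf{R}^{-1}\mathsf{R}$ between factors of $\mathsf{A}$ in the Taylor series expansion of $f(\mathsf{A})$:

$$\mathsf{R} f(\mathsf{A}) \mathsf{R}^{-1} = \mathsf{R}\Bigl(\sum_{k=0}^\infty a_k\mathsf{A}^k\Bigr)\mathsf{R}^{-1} = \sum_{k=0}^\infty a_k\mathsf{R}\mathsf{A}^k\mathsf{R}^{-1} = \sum_{k=0}^\infty a_k\mathsf{R}\underbrace{\mathsf{A}\mathsf{A}\cdots\mathsf{A}}_{k\text{ times}}\mathsf{R}^{-1}$$
$$= \sum_{k=0}^\infty a_k\underbrace{\mathsf{R}\mathsf{A}\mathsf{R}^{-1}\,\mathsf{R}\mathsf{A}\mathsf{R}^{-1}\cdots\mathsf{R}\mathsf{A}\mathsf{R}^{-1}}_{k\text{ times}} = \sum_{k=0}^\infty a_k(\mathsf{R}\mathsf{A}\mathsf{R}^{-1})^k = f(\mathsf{R}\mathsf{A}\mathsf{R}^{-1}).$$

This completes the proof.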


5.5 Determinant of a Matrix 


An important concept associated with linear operators is the determinant, which we have already discussed in Sect. 2.6.1. Determinants are also defined for matrices. If the matrix $\mathsf{A}$ represents the operator $\mathbf{A}$ in some basis, then we set $\det\mathsf{A} = \det\mathbf{A}$. That this relation is basis-independent is, of course, obvious from Definition 2.6.10 and the discussion preceding it. However, it can also be shown directly, as we shall do later in this chapter.

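Let $\mathbf{A}$ be a linear operator on $V$. Let $\{|e_k\rangle\}_{k=1}^N$ be a basis of $V$ in which $\mathbf{A}$ is represented by $\mathsf{A}$. Then the left-hand side of Eq. (2.32) becomes

$$\text{LHS} = \Delta(\mathbf{A}|e_1\rangle, \ldots, \mathbf{A}|e_N\rangle) = \Delta\Bigl(\sum_{i_1=1}^N \alpha_{i_1 1}|e_{i_1}\rangle, \ldots, \sum_{i_N=1}^N \alpha_{i_N N}|e_{i_N}\rangle\Bigr)$$
$$= \sum_{i_1 \ldots i_N = 1}^N \alpha_{i_1 1}\cdots\alpha_{i_N N}\,\Delta(|e_{i_1}\rangle, \ldots, |e_{i_N}\rangle)$$
$$= \sum_\pi \alpha_{\pi(1)1}\cdots\alpha_{\pi(N)N}\,\Delta(|e_{\pi(1)}\rangle, \ldots, |e_{\pi(N)}\rangle)$$
$$= \sum_\pi \alpha_{\pi(1)1}\cdots\alpha_{\pi(N)N}\,\epsilon_\pi\cdot\Delta(|e_1\rangle, \ldots, |e_N\rangle),$$

where $\pi$ is the permutation taking $k$ to $i_k$. The right-hand side of Eq. (2.32) is just the product of $\det\mathbf{A}$ and $\Delta(|e_1\rangle, \ldots, |e_N\rangle)$. Hence,

$$\det\mathsf{A} = \det\mathbf{A} = \sum_\pi \epsilon_\pi\,\alpha_{\pi(1)1}\cdots\alpha_{\pi(N)N} \equiv \sum_\pi \epsilon_\pi \prod_{k=1}^N (\mathsf{A})_{\pi(k)k}. \qquad (5.22)$$

Since $\pi(k) = i_k$, the product in the sum can be written as

$$\prod_{k=1}^N (\mathsf{A})_{i_k k} = \prod_{k=1}^N (\mathsf{A})_{\pi(k)k} = \prod_{k=1}^N (\mathsf{A})_{k\pi^{-1}(k)} = \prod_{k=1}^N (\mathsf{A}^t)_{\pi^{-1}(k)k},$$

where the second equality follows because we can commute the numbers until $(\mathsf{A})_{1\pi^{-1}(1)}$ becomes the first term of the product, $(\mathsf{A})_{2\pi^{-1}(2)}$ the second term, and so on. Substituting this in (5.22) and noting that $\sum_\pi = \sum_{\pi^{-1}}$ and $\epsilon_{\pi^{-1}} = \epsilon_\pi$, we have


Theorem 5.5.1 Let $\mathbf{A} \in \mathcal{L}(V)$ and $\mathsf{A}$ its representation in any basis of $V$. Then

$$\det\mathbf{A} = \det\mathsf{A} = \sum_\pi \epsilon_\pi \prod_{j=1}^N (\mathsf{A})_{\pi(j)j} = \sum_{i_1 \ldots i_N} \epsilon_{i_1 i_2 \ldots i_N}(\mathsf{A})_{i_1 1}\cdots(\mathsf{A})_{i_N N},$$
$$\det\mathbf{A} = \det\mathsf{A}^t = \sum_\pi \epsilon_\pi \prod_{j=1}^N (\mathsf{A})_{j\pi(j)} = \sum_{i_1 \ldots i_N} \epsilon_{i_1 i_2 \ldots i_N}(\mathsf{A})_{1 i_1}\cdots(\mathsf{A})_{N i_N},$$

where $\epsilon_{i_1 i_2 \ldots i_N}$ is the symbol introduced in (2.29). In particular, $\det\mathsf{A}^t = \det\mathsf{A}$.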
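Let $\mathsf{A}$ be any $N \times N$ matrix. Let $|v_j\rangle \in \mathbb{R}^N$ be the jth column of $\mathsf{A}$. Define the linear operator $\mathbf{A} \in \operatorname{End}(\mathbb{R}^N)$ by

$$\mathbf{A}|e_j\rangle = |v_j\rangle, \qquad j = 1, \ldots, N, \qquad (5.23)$$

where $\{|e_j\rangle\}_{j=1}^N$ is the standard basis of $\mathbb{R}^N$. Then $\mathsf{A}$ is the matrix representing $\mathbf{A}$ in the standard basis. Now let $\Delta$ be a determinant function in $\mathbb{R}^N$ whose value is one at the standard basis. Then

$$\Delta(|v_1\rangle, \ldots, |v_N\rangle) = \Delta(\mathbf{A}|e_1\rangle, \ldots, \mathbf{A}|e_N\rangle) = \det\mathbf{A}\cdot\Delta(|e_1\rangle, \ldots, |e_N\rangle) = \det\mathbf{A}$$

and, therefore,

$$\det\mathsf{A} = \Delta(|v_1\rangle, \ldots, |v_N\rangle). \qquad (5.24)$$

If instead of columns, we use rows $|u_j\rangle$, we obtain $\det\mathsf{A}^t = \Delta(|u_1\rangle, \ldots, |u_N\rangle)$. Since $\Delta$ is a multilinear skew-symmetric function, and $\det\mathsf{A}^t = \det\mathsf{A}$, we have the following familiar theorem.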


Theorem 5.5.2 Let A be a square matrix. Then 


1. detA is linear with respect to any row or column vector of A. 

2. If any two rows or two columns of A are interchanged, det A changes
sign.

3. Adding a multiple of one row (column) of A to another row (column) of 
A does not change detA. 

4. detA=0 iff the rows (columns) are linearly dependent. 
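The permutation sum of Theorem 5.5.1 translates directly into code, which also gives a way to spot-check the properties listed in Theorem 5.5.2. The sketch below is a plain illustration rather than an efficient algorithm (the sum has $N!$ terms). The two test matrices are borrowed from Examples 5.5.9 and 5.5.10 of this chapter.

```python
from itertools import permutations

def sign(perm):
    """epsilon_pi: +1 for an even permutation, -1 for an odd one."""
    inversions = sum(1 for i in range(len(perm)) for j in range(i + 1, len(perm))
                     if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(M):
    """Sum over permutations pi of epsilon_pi * prod_k (M)_{pi(k) k}."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for k in range(n):
            term *= M[perm[k]][k]
        total += term
    return total

def transpose(M):
    return [list(row) for row in zip(*M)]

A = [[1, 2, -1], [0, 1, -2], [2, 1, -1]]
B = [[2, -1, 3], [1, -2, 1], [-1, 5, 0]]

swapped = [A[1], A[0], A[2]]                                  # interchange two rows
sheared = [A[0], A[1], [x + 3 * y for x, y in zip(A[2], A[0])]]  # row 3 + 3 * row 1

print(det(A), det(transpose(A)), det(swapped), det(sheared), det(B))
```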


5.5.1 Matrix of the Classical Adjoint 


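Since by Corollary 2.6.13, the classical adjoint of $\mathbf{A}$ is essentially the inverse of $\mathbf{A}$, we expect its matrix representation to be essentially the inverse of the matrix of $\mathbf{A}$. To find this matrix, choose a basis $\{|e_j\rangle\}_{j=1}^N$ which evaluates the determinant function of Eq. (2.33) to 1. Then $\operatorname{ad}(\mathbf{A})|e_i\rangle = \sum_{j=1}^N c_{ji}|e_j\rangle$, with $c_{ji}$ forming the representation matrix of $\operatorname{ad}(\mathbf{A})$. Thus, substituting $|e_i\rangle$ for $|v\rangle$ on both sides of (2.33) and using the fact that $\{|e_j\rangle\}_{j=1}^N$ are linearly independent, we get

$$(-1)^{j-1}\Delta\bigl(|e_i\rangle, \mathbf{A}|e_1\rangle, \ldots, \widehat{\mathbf{A}|e_j\rangle}, \ldots, \mathbf{A}|e_N\rangle\bigr) = c_{ji},$$

where the hat marks the argument that is omitted, or

$$c_{ji} = (-1)^{j-1}\Delta\Bigl(|e_i\rangle, \sum_{k_1=1}^N (\mathsf{A})_{k_1 1}|e_{k_1}\rangle, \ldots, \widehat{\sum_{k_j=1}^N (\mathsf{A})_{k_j j}|e_{k_j}\rangle}, \ldots, \sum_{k_N=1}^N (\mathsf{A})_{k_N N}|e_{k_N}\rangle\Bigr)$$
$$= (-1)^{j-1}\sum_{k_1 \ldots k_N} (\mathsf{A})_{k_1 1}\cdots(\mathsf{A})_{k_N N}\,\Delta(|e_i\rangle, |e_{k_1}\rangle, \ldots, |e_{k_N}\rangle)$$
$$= (-1)^{j-1}\sum_{k_1 \ldots k_N} (\mathsf{A})_{k_1 1}\cdots(\mathsf{A})_{k_N N}\,\epsilon_{i k_1 \ldots k_N}.$$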
The product in the sum does not include $(\mathsf{A})_{k_j j}$. This means that the entire jth column is missing in the product. Furthermore, because of the skew-symmetry of $\epsilon_{i k_1 \ldots k_N}$, none of the $k_m$'s can be $i$, and since the $k_m$'s label the rows, the ith row is also absent in the sum. Now move $i$ from the first location to the ith location. This will introduce a factor of $(-1)^{i-1}$ due to the $i - 1$ exchanges of indices. Inserting all this information in the previous equation, we obtain

$$c_{ji} = (-1)^{i+j}\sum_{k_1 \ldots k_N} (\mathsf{A})_{k_1 1}\cdots(\mathsf{A})_{k_N N}\,\epsilon_{k_1 \ldots i \ldots k_N}. \qquad (5.25)$$

Now note that the sum is a determinant of an $(N - 1) \times (N - 1)$ matrix obtained from $\mathsf{A}$ by eliminating its ith row and jth column. This determinant is called a minor of order $N - 1$ and denoted by $M_{ij}$. The product $(-1)^{i+j}M_{ij}$ is called the cofactor of $(\mathsf{A})_{ij}$, and denoted by $(\operatorname{cof}\mathsf{A})_{ij}$.

With this and another obvious notation, (5.25) becomes

$$(\operatorname{ad}\mathsf{A})_{ji} \equiv c_{ji} = (-1)^{i+j}M_{ij} = (\operatorname{cof}\mathsf{A})_{ij}. \qquad (5.26)$$

With the matrix of the adjoint at our disposal, we can write Eq. (2.34) in the matrix form. Doing so, and taking the ikth element of all sides, we get

$$\sum_{j=1}^N \operatorname{ad}(\mathsf{A})_{ij}(\mathsf{A})_{jk} = \det\mathsf{A}\cdot\delta_{ik} = \sum_{j=1}^N (\mathsf{A})_{ij}\operatorname{ad}(\mathsf{A})_{jk}.$$

Setting $k = i$ yields

$$\det\mathsf{A} = \sum_{j=1}^N \operatorname{ad}(\mathsf{A})_{ij}(\mathsf{A})_{ji} = \sum_{j=1}^N (\mathsf{A})_{ij}\operatorname{ad}(\mathsf{A})_{ji}$$

or, using (5.26),

$$\det\mathsf{A} = \sum_{j=1}^N (\mathsf{A})_{ji}(\operatorname{cof}\mathsf{A})_{ji} = \sum_{j=1}^N (\mathsf{A})_{ij}(\operatorname{cof}\mathsf{A})_{ij}. \qquad (5.27)$$

This is the familiar expansion of a determinant by its ith column or ith row.
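Equations (5.26) and (5.27) can be checked with a short recursive implementation. The sketch below builds minors, cofactors, and the matrix of the classical adjoint (the transpose of the cofactor matrix), then verifies the row expansion of the determinant. The sample matrix is the one used later in Example 5.5.9; the function names are illustrative.

```python
def minor(M, i, j):
    """Submatrix obtained by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by cofactor expansion along the first row."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def cofactor_matrix(M):
    n = len(M)
    return [[(-1) ** (i + j) * det(minor(M, i, j)) for j in range(n)]
            for i in range(n)]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2, -1], [0, 1, -2], [2, 1, -1]]
cof = cofactor_matrix(A)
adj = [list(row) for row in zip(*cof)]        # Eq. (5.26): ad A = (cof A)^t

# Expansion by the ith row, Eq. (5.27), for every i.
row_expansions = [sum(A[i][j] * cof[i][j] for j in range(3)) for i in range(3)]
print(row_expansions, matmul(adj, A))
```

Every row expansion returns the same number, $\det\mathsf{A}$, and `matmul(adj, A)` is that number times the unit matrix.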


Historical Notes 

Vandermonde, Alexandre-Théophile, also known as Alexis, Abnit, and Charles-
Auguste Vandermonde (1735-1796), was the sickly son of a physician, who directed
him toward a musical career. An acquaintanceship with Fontaine, however, so stimulated
Vandermonde that in 1771 he was elected to the Académie des Sciences, to which he pre- 
sented four mathematical papers (his total mathematical production) in 1771-1772. Later, 
Vandermonde wrote several papers on harmony, and it was said at that time that musicians 
considered Vandermonde to be a mathematician and that mathematicians viewed him as 
a musician. 

Vandermonde’s membership in the Academy led to a paper on experiments with cold, 
made with Bezout and Lavoisier in 1776, and a paper on the manufacture of steel with 
Berthollet and Monge in 1786. Vandermonde became an ardent and active revolutionary, 
being such a close friend of Monge that he was termed “femme de Monge”. He was a 
member of the Commune of Paris and the club of the Jacobins. In 1782 he was director of
the Conservatoire des Arts et Métiers and in 1792, chief of the Bureau de l'Habillement
des Armées. He joined in the design of a course in political economy for the École Nor-
male and in 1795 was named a member of the Institut National.

Vandermonde is best known for the theory of determinants. Lebesgue believed that the 
attribution of determinant to Vandermonde was due to a misreading of his notation. Nev- 
ertheless, Vandermonde’s fourth paper was the first to give a connected exposition of 
determinants, because he (1) defined a contemporary symbolism that was more com- 
plete, simple, and appropriate than that of Leibniz; (2) defined determinants as functions 
apart from the solution of linear equations presented by Cramer but also treated by Van- 
dermonde; and (3) gave a number of properties of these functions, such as the number 
and signs of the terms and the effect of interchanging two consecutive indices (rows or 
columns), which he used to show that a determinant is zero if two rows or columns are 
identical. 

Vandermonde’s real and unrecognized claim to fame was lodged in his first paper, in 
which he approached the general problem of the solvability of algebraic equations through 
a study of functions invariant under permutations of the roots of the equations. Cauchy 
assigned priority in this to Lagrange and Vandermonde. Vandermonde read his paper in 
November 1770, but he did not become a member of the Academy until 1771, and the pa- 
per was not published until 1774. Although Vandermonde’s methods were close to those 
later developed by Abel and Galois for testing the solvability of equations, and although 
his treatment of the binomial equation x” — 1 = 0 could easily have led to the anticipation 
of Gauss’s results on constructible polygons, Vandermonde himself did not rigorously or 
completely establish his results, nor did he see the implications for geometry. Neverthe- 
less, Kronecker dates the modern movement in algebra to Vandermonde’s 1770 paper. 
Unfortunately, Vandermonde’s spurt of enthusiasm and creativity, which in two years 
produced four insightful mathematical papers at least two of which were of substantial 
importance, was quickly diverted by the exciting politics of the time and perhaps by poor 
health. 


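Example 5.5.3 Let $\mathsf{O}$ and $\mathsf{U}$ denote, respectively, an orthogonal and a unitary $n \times n$ matrix; that is, $\mathsf{O}\mathsf{O}^t = \mathsf{O}^t\mathsf{O} = \mathsf{1}$, and $\mathsf{U}\mathsf{U}^\dagger = \mathsf{U}^\dagger\mathsf{U} = \mathsf{1}$. Taking the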
determinant of the first equation and using Theorems 2.6.11 (with A = 1) 
and 5.5.1, we obtain 


$$(\det\mathsf{O})(\det\mathsf{O}^t) = (\det\mathsf{O})^2 = \det\mathsf{1} = 1.$$


Therefore, for an orthogonal matrix, we get $\det\mathsf{O} = \pm 1$.

Orthogonal transformations preserve a real inner product. Among such 
transformations are the so-called inversions, which, in their simplest form, 
multiply a vector by —1. In three dimensions this corresponds to a reflection 
through the origin. The matrix associated with this operation is —1: 


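$$\begin{pmatrix} x \\ y \\ z \end{pmatrix} \to \begin{pmatrix} -x \\ -y \\ -z \end{pmatrix} = \begin{pmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix},$$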


which has a determinant of —1. This is a prototype of other, more compli- 
cated, orthogonal transformations whose determinants are —1. The set of 
orthogonal matrices in n dimensions is denoted by O(n). 

The other orthogonal transformations, whose determinants are +1, are 
of special interest because they correspond to rotations in three dimensions. 
The set of orthogonal matrices in n dimensions having determinant +1 is 
denoted by SO(n). These matrices are special because they have the mathematical
structure of a (continuous) group, which finds application in many
areas of advanced physics. We shall come back to the topic of group theory
later in the book.

We can obtain a similar result for unitary transformations. We take the determinant of both sides of $\mathsf{U}^\dagger\mathsf{U} = \mathsf{1}$:

$$\det(\mathsf{U}^*)^t\det\mathsf{U} = \det\mathsf{U}^*\det\mathsf{U} = (\det\mathsf{U})^*(\det\mathsf{U}) = |\det\mathsf{U}|^2 = 1.$$

Thus, we can generally write $\det\mathsf{U} = e^{i\alpha}$, with $\alpha \in \mathbb{R}$. The set of unitary matrices in n dimensions is denoted by U(n). The set of those matrices with
group has found applications in the description of fundamental forces and 
the dynamics of fundamental particles. 
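The determinant statements of Example 5.5.3 can be illustrated with $2 \times 2$ matrices. The four matrices below are sample choices made for this sketch (a reflection, a rotation by 90 degrees, the Pauli matrix $\sigma_2$, and one further unitary matrix); the helper names are hypothetical.

```python
def dagger(M):
    """Hermitian conjugate: transpose and complex-conjugate."""
    return [[M[j][i].conjugate() for j in range(len(M))] for i in range(len(M[0]))]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_identity(M, tol=1e-12):
    return all(abs(M[i][j] - (1 if i == j else 0)) < tol
               for i in range(len(M)) for j in range(len(M)))

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

reflection = [[1, 0], [0, -1]]        # orthogonal, determinant -1
rotation = [[0, -1], [1, 0]]          # orthogonal, determinant +1: in SO(2)
sigma2 = [[0, -1j], [1j, 0]]          # unitary, determinant -1 = e^{i*pi}
U = [[0, 1j], [1j, 0]]                # unitary, determinant +1: in SU(2)

for M in (reflection, rotation, sigma2, U):
    print(is_identity(matmul(M, dagger(M))), det2(M), abs(det2(M)))
```

For real matrices `dagger` reduces to the transpose, so the same test covers the orthogonal case.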


5.5.2 Inverse of a Matrix


Equation (5.26) shows that the matrix of the classical adjoint is the transpose of the cofactor matrix. Using this, and writing (2.34) in matrix form yields

$$(\operatorname{cof}\mathsf{A})^t\mathsf{A} = \det\mathsf{A}\cdot\mathsf{1} = \mathsf{A}(\operatorname{cof}\mathsf{A})^t.$$

Therefore, we have


Theorem 5.5.4 The matrix $\mathsf{A}$ has an inverse if and only if $\det\mathsf{A} \neq 0$. Furthermore,

$$\mathsf{A}^{-1} = \frac{(\operatorname{cof}\mathsf{A})^t}{\det\mathsf{A}}. \qquad (5.28)$$

This is the matrix form of the operator equation in Corollary 2.6.13.


Example 5.5.5 The inverse of a $2 \times 2$ matrix is easily found:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \qquad (5.29)$$

if $ad - bc \neq 0$.


We defined the determinant of an operator intrinsically, i.e., independent of a basis. We have also connected this intrinsic property to the determinant of the matrix representing that operator in some basis. We can now show directly that the matrices representing an operator in two arbitrary bases have the same determinant. We leave this as an exercise for the reader in Problem 5.23.


Algorithm for Calculating the Inverse of a Matrix 

There is a more practical way of calculating the inverse of matrices. In the 
following discussion of this method, we shall confine ourselves simply to 
stating a couple of definitions and the main theorem, with no attempt at 
providing any proofs. The practical utility of the method will be illustrated 
by a detailed analysis of examples. 
Definition 5.5.6 An elementary row operation on a matrix is one of the
following: 


(a) interchange of two rows of the matrix, 
(b) multiplication of a row by a nonzero number, and 
(c) addition of a multiple of one row to another. 


Elementary column operations are defined analogously. 


Definition 5.5.7 A matrix is in triangular, or row-echelon, form if it satis- 
fies the following three conditions: 


1. Any row consisting of only zeros is below any row that contains at least 
one nonzero element. 

2. Going from left to right, the first nonzero entry of any row is to the left 
of the first nonzero entry of any lower row. 

3. The first nonzero entry of each row is 1. 


Theorem 5.5.8 For any invertible $n \times n$ matrix $\mathsf{A}$, the $n \times 2n$ matrix $(\mathsf{A}|\mathsf{1})$ can be transformed into the $n \times 2n$ matrix $(\mathsf{1}|\mathsf{A}^{-1})$ by means of a finite number of elementary row operations.¹


A systematic way of transforming $(\mathsf{A}|\mathsf{1})$ into $(\mathsf{1}|\mathsf{A}^{-1})$ is first to bring $\mathsf{A}$ into triangular form and then eliminate all nonzero elements of each column by elementary row operations.


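Example 5.5.9 Let us evaluate the inverse of

$$\mathsf{A} = \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & -2 \\ 2 & 1 & -1 \end{pmatrix}.$$

We start with

$$\left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 2 & 1 & -1 & 0 & 0 & 1 \end{array}\right) \equiv \mathsf{M}$$

and apply elementary row operations to $\mathsf{M}$ to bring the left half of it into triangular form. If we denote the kth row by $(k)$ and the three operations of Definition 5.5.6, respectively, by $(k) \leftrightarrow (j)$, $\alpha(k)$, and $\alpha(k) + (j)$, we get

$$\mathsf{M} \xrightarrow{-2(1)+(3)} \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 0 & -3 & 1 & -2 & 0 & 1 \end{array}\right) \xrightarrow{3(2)+(3)} \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 0 & 0 & -5 & -2 & 3 & 1 \end{array}\right)$$
$$\xrightarrow{-\frac{1}{5}(3)} \left(\begin{array}{ccc|ccc} 1 & 2 & -1 & 1 & 0 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 2/5 & -3/5 & -1/5 \end{array}\right) \equiv \mathsf{M}'.$$

The left half of $\mathsf{M}'$ is in triangular form. However, we want all entries above any 1 in a column to be zero as well, i.e., we want the left-hand matrix to be $\mathsf{1}$. We can do this by appropriate use of type 3 elementary row operations:

$$\mathsf{M}' \xrightarrow{-2(2)+(1)} \left(\begin{array}{ccc|ccc} 1 & 0 & 3 & 1 & -2 & 0 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 2/5 & -3/5 & -1/5 \end{array}\right) \xrightarrow{-3(3)+(1)} \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -1/5 & -1/5 & 3/5 \\ 0 & 1 & -2 & 0 & 1 & 0 \\ 0 & 0 & 1 & 2/5 & -3/5 & -1/5 \end{array}\right)$$
$$\xrightarrow{2(3)+(2)} \left(\begin{array}{ccc|ccc} 1 & 0 & 0 & -1/5 & -1/5 & 3/5 \\ 0 & 1 & 0 & 4/5 & -1/5 & -2/5 \\ 0 & 0 & 1 & 2/5 & -3/5 & -1/5 \end{array}\right).$$

The right half of the resulting matrix is $\mathsf{A}^{-1}$.

¹The matrix $(\mathsf{A}|\mathsf{1})$ denotes the $n \times 2n$ matrix obtained by juxtaposing the $n \times n$ unit matrix to the right of $\mathsf{A}$. It can easily be shown that if $\mathsf{A}$, $\mathsf{B}$, and $\mathsf{C}$ are $n \times n$ matrices, then $\mathsf{A}(\mathsf{B}|\mathsf{C}) = (\mathsf{A}\mathsf{B}|\mathsf{A}\mathsf{C})$.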
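The procedure of Theorem 5.5.8 is easy to automate. The sketch below is one possible implementation, assuming exact rational arithmetic; it processes the columns one at a time rather than first producing the triangular form, so its intermediate matrices differ from those of Example 5.5.9, but it uses only the three elementary row operations of Definition 5.5.6. The function name `inverse` and the convention of returning `None` for a non-invertible matrix are choices made here.

```python
from fractions import Fraction as F

def inverse(A):
    """Reduce (A|1) to (1|A^-1) by elementary row operations; None if impossible."""
    n = len(A)
    M = [[F(x) for x in row] + [F(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return None                      # left half cannot be turned into 1
        M[col], M[pivot] = M[pivot], M[col]              # (a) interchange rows
        M[col] = [x / M[col][col] for x in M[col]]       # (b) scale a row
        for r in range(n):
            if r != col:                                 # (c) add a multiple of a row
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[1, 2, -1], [0, 1, -2], [2, 1, -1]]     # matrix of Example 5.5.9
A_inv = inverse(A)
print(A_inv)
```

Applied to a matrix with vanishing determinant, the same routine runs out of pivots and returns `None`.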


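Example 5.5.10 It is instructive to start with a matrix that is not invertible and show that it is impossible to turn it into $\mathsf{1}$ by elementary row operations. Consider the matrix

$$\mathsf{B} = \begin{pmatrix} 2 & -1 & 3 \\ 1 & -2 & 1 \\ -1 & 5 & 0 \end{pmatrix}.$$

Let us systematically bring it into triangular form:

$$\left(\begin{array}{ccc|ccc} 2 & -1 & 3 & 1 & 0 & 0 \\ 1 & -2 & 1 & 0 & 1 & 0 \\ -1 & 5 & 0 & 0 & 0 & 1 \end{array}\right) \xrightarrow{(1)\leftrightarrow(2)} \left(\begin{array}{ccc|ccc} 1 & -2 & 1 & 0 & 1 & 0 \\ 2 & -1 & 3 & 1 & 0 & 0 \\ -1 & 5 & 0 & 0 & 0 & 1 \end{array}\right)$$
$$\xrightarrow{-2(1)+(2)} \left(\begin{array}{ccc|ccc} 1 & -2 & 1 & 0 & 1 & 0 \\ 0 & 3 & 1 & 1 & -2 & 0 \\ -1 & 5 & 0 & 0 & 0 & 1 \end{array}\right) \xrightarrow{(1)+(3)} \left(\begin{array}{ccc|ccc} 1 & -2 & 1 & 0 & 1 & 0 \\ 0 & 3 & 1 & 1 & -2 & 0 \\ 0 & 3 & 1 & 0 & 1 & 1 \end{array}\right)$$
$$\xrightarrow{-(2)+(3)} \left(\begin{array}{ccc|ccc} 1 & -2 & 1 & 0 & 1 & 0 \\ 0 & 3 & 1 & 1 & -2 & 0 \\ 0 & 0 & 0 & -1 & 3 & 1 \end{array}\right) \xrightarrow{\frac{1}{3}(2)} \left(\begin{array}{ccc|ccc} 1 & -2 & 1 & 0 & 1 & 0 \\ 0 & 1 & 1/3 & 1/3 & -2/3 & 0 \\ 0 & 0 & 0 & -1 & 3 & 1 \end{array}\right).$$

The matrix $\mathsf{B}$ is now in triangular form, but its third row contains all zeros. There is no way we can bring this into the form of a unit matrix. We therefore conclude that $\mathsf{B}$ is not invertible. This is, of course, obvious, since it can easily be verified that $\mathsf{B}$ has a vanishing determinant.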


Rank of a Matrix 

Given any $M \times N$ matrix $\mathsf{A}$, an operator $\mathbf{T}_A \in \mathcal{L}(V_N, W_M)$ can be associated with $\mathsf{A}$, and one can construct the kernel and the range of $\mathbf{T}_A$. The rank of $\mathbf{T}_A$ is called the rank of $\mathsf{A}$. Since the rank of an operator is basis independent, this definition makes sense.

Now suppose that we choose a basis for the kernel of $\mathbf{T}_A$ and extend it to a basis of $V$. Let $V_1$ denote the span of the remaining basis vectors. Similarly, we choose a basis for $\mathbf{T}_A(V)$ and extend it to a basis for $W$. In these two bases, the $M \times N$ matrix representing $\mathbf{T}_A$ will have all zeros except for an $r \times r$ submatrix, where $r$ is the rank of $\mathbf{T}_A$. The reader may verify that this submatrix has a nonzero determinant. In fact, the submatrix represents the isomorphism between $V_1$ and $\mathbf{T}_A(V)$, and, by its very construction, is the largest such matrix. Since the determinant of an operator is basis-independent, we have the following proposition.


Proposition 5.5.11 The rank of a matrix is the dimension of the largest 
(square) submatrix whose determinant is not zero. 
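
Proposition 5.5.11 can be checked on small matrices by brute force. The sketch below is only an illustration under stated assumptions: the helper names are hypothetical, "submatrix" is taken to mean a matrix obtained by keeping a subset of rows and columns, and the comparison value is the number of pivots left by Gaussian elimination, a standard way of computing rank that is not discussed in the text.

```python
from fractions import Fraction
from itertools import combinations

def det(m):
    """Determinant by expansion along the first row (adequate for small matrices)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def rank_by_minors(a):
    """Size of the largest square submatrix whose determinant is not zero."""
    n_rows, n_cols = len(a), len(a[0])
    for k in range(min(n_rows, n_cols), 0, -1):
        for rows in combinations(range(n_rows), k):
            for cols in combinations(range(n_cols), k):
                if det([[a[i][j] for j in cols] for i in rows]) != 0:
                    return k
    return 0

def rank_by_elimination(a):
    """Number of pivots found by Gaussian elimination with exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in a]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            f = m[r][col] / m[rank][col]
            m[r] = [x - f * y for x, y in zip(m[r], m[rank])]
        rank += 1
    return rank

B = [[2, -1, 3], [1, -2, 1], [-1, 5, 0]]   # the singular matrix of Example 5.5.10
print(rank_by_minors(B), rank_by_elimination(B))   # 2 2
```

The 3 × 3 determinant of B vanishes, but the 2 × 2 submatrix in its upper left corner has determinant −3, so the rank is 2.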


5.5.3 Dual Determinant Function 


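Let V and V* be N-dimensional dual vector spaces, and let 
$\Omega : V^N \times V^{*N} \to \mathbb{C}$ be a function defined by 

$$
\Omega(|v_1\rangle, \dots, |v_N\rangle, \varphi_1, \dots, \varphi_N) = \det\bigl(\varphi_i(|v_j\rangle)\bigr), \qquad \varphi_i \in V^*,\ |v_j\rangle \in V. \tag{5.30}
$$

By Theorem 5.5.2, $\Omega$ is a skew-symmetric linear function in $|v_1\rangle, \dots, |v_N\rangle$ 
as well as in $\varphi_1, \dots, \varphi_N$. Considering the first set of arguments and taking 
a nonzero determinant function $\Delta$ in V, we can write 

$$
\Omega(|v_1\rangle, \dots, |v_N\rangle, \varphi_1, \dots, \varphi_N) = \Delta(|v_1\rangle, \dots, |v_N\rangle) \cdot \underbrace{\Omega_1(\varphi_1, \dots, \varphi_N)}_{\in\,\mathbb{C}}
$$

by Corollary 2.6.8. We note that $\Omega_1$ is a determinant function in V*. Thus, 
again by Corollary 2.6.8, 

$$
\Omega_1(\varphi_1, \dots, \varphi_N) = \beta \cdot \Delta^*(\varphi_1, \dots, \varphi_N),
$$

for some nonzero determinant function $\Delta^*$ in V* and some $\beta \in \mathbb{C}$. Combining 
the last two equations, we obtain 

$$
\Omega(|v_1\rangle, \dots, |v_N\rangle, \varphi_1, \dots, \varphi_N) = \beta\, \Delta(|v_1\rangle, \dots, |v_N\rangle)\, \Delta^*(\varphi_1, \dots, \varphi_N). \tag{5.31}
$$

Now let $\{\epsilon_i\}_{i=1}^N$ and $\{|e_j\rangle\}_{j=1}^N$ be dual bases. Then Eq. (5.30) gives 

$$
\Omega(|e_1\rangle, \dots, |e_N\rangle, \epsilon_1, \dots, \epsilon_N) = \det(\delta_{ij}) = 1,
$$

and Eq. (5.31) yields 

$$
1 = \beta\, \Delta(|e_1\rangle, \dots, |e_N\rangle)\, \Delta^*(\epsilon_1, \dots, \epsilon_N).
$$

This implies that $\beta \neq 0$. Multiplying both sides of (5.30) by $\alpha \equiv \beta^{-1}$ and 
using (5.31), we obtain 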


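Proposition 5.5.12 For any pair of nonzero determinant functions $\Delta$ and 
$\Delta^*$ in V and V*, respectively, there is a nonzero constant $\alpha \in \mathbb{C}$ such that 

$$
\Delta(|v_1\rangle, \dots, |v_N\rangle)\, \Delta^*(\varphi_1, \dots, \varphi_N) = \alpha \det\bigl(\varphi_i(|v_j\rangle)\bigr)
$$

for $|v_j\rangle \in V$ and $\varphi_i \in V^*$. 


Definition 5.5.13 Two nonzero determinant functions $\Delta$ and $\Delta^*$ in V and 
V*, respectively, are called dual if 

$$
\Delta(|v_1\rangle, \dots, |v_N\rangle)\, \Delta^*(\varphi_1, \dots, \varphi_N) = \det\bigl(\varphi_i(|v_j\rangle)\bigr).
$$


It is clear that if $\Delta$ and $\Delta^*$ are any two nonzero determinant functions, then $\Delta$ and 
$\alpha^{-1}\Delta^*$ are dual, where $\alpha$ is the constant of Proposition 5.5.12. Furthermore, 
if $\Delta_1^*$ and $\Delta_2^*$ are dual to $\Delta$, then $\Delta_1^* = \Delta_2^*$, 
because they both satisfy the equation of Definition 5.5.13 and $\Delta$ is nonzero. 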
We thus have 


Proposition 5.5.14 Every nonzero determinant function in V has a unique 
dual determinant function. 


Here is another way of proving the equality of the determinants of a ma- 
trix and its transpose: 


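Proposition 5.5.15 Let $\mathbf{T}^* \in \operatorname{End}(V^*)$ be the dual of $\mathbf{T} \in \operatorname{End}(V)$. Then 
$\det \mathbf{T}^* = \det \mathbf{T}$. In particular, $\det \mathsf{T}^t = \det \mathsf{T}$. 


Proof Use Definition 5.5.13 to get 

$$
\Delta(|v_1\rangle, \dots, |v_N\rangle)\, \Delta^*(\mathbf{T}^*\varphi_1, \dots, \mathbf{T}^*\varphi_N) = \det\bigl(\mathbf{T}^*\varphi_i(|v_j\rangle)\bigr)
$$

or 

$$
\det \mathbf{T}^* \cdot \Delta(|v_1\rangle, \dots, |v_N\rangle)\, \Delta^*(\varphi_1, \dots, \varphi_N) = \det\bigl(\mathbf{T}^*\varphi_i(|v_j\rangle)\bigr).
$$

Furthermore, 

$$
\Delta(\mathbf{T}|v_1\rangle, \dots, \mathbf{T}|v_N\rangle)\, \Delta^*(\varphi_1, \dots, \varphi_N) = \det\bigl(\varphi_i(\mathbf{T}|v_j\rangle)\bigr)
$$

or 

$$
\det \mathbf{T} \cdot \Delta(|v_1\rangle, \dots, |v_N\rangle)\, \Delta^*(\varphi_1, \dots, \varphi_N) = \det\bigl(\varphi_i(\mathbf{T}|v_j\rangle)\bigr).
$$

Now noting that $\mathbf{T}^*\varphi_i(|v_j\rangle) \equiv \varphi_i(\mathbf{T}|v_j\rangle)$, we obtain the equality of the 
determinants of $\mathbf{T}$ and $\mathbf{T}^*$, and by Proposition 5.2.2, the equality of the determinants 
of $\mathsf{T}$ and $\mathsf{T}^t$. 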


5.6 The Trace 


Another intrinsic quantity associated with an operator that is usually defined 
in terms of matrices is given in the following definition. 


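Definition 5.6.1 Let A be an N × N matrix. The mapping $\operatorname{tr} : \mathcal{M}^{N\times N} \to \mathbb{C}$ 
(or $\mathbb{R}$) given by $\operatorname{tr} A = \sum_{i=1}^{N} \alpha_{ii}$ is called the trace of A. 


Theorem 5.6.2 The trace is a linear mapping. Furthermore, 
$\operatorname{tr}(AB) = \operatorname{tr}(BA)$ and $\operatorname{tr} A^t = \operatorname{tr} A$. 


Proof To prove the first identity, we use the definitions of the trace and the 
matrix product: 

$$
\operatorname{tr}(AB) = \sum_{i=1}^{N} (AB)_{ii} = \sum_{i=1}^{N} \sum_{j=1}^{N} (A)_{ij} (B)_{ji} = \sum_{i=1}^{N} \sum_{j=1}^{N} (B)_{ji} (A)_{ij}
= \sum_{j=1}^{N} \Bigl( \sum_{i=1}^{N} (B)_{ji} (A)_{ij} \Bigr) = \sum_{j=1}^{N} (BA)_{jj} = \operatorname{tr}(BA).
$$
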


The linearity of the trace and the second identity follow directly from the 
definition. 
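
As a quick numerical illustration of Theorem 5.6.2, the sketch below multiplies two matrices that do not commute and compares the traces of the two products. The helper names `matmul` and `trace` and the sample matrices are illustrative choices, not taken from the text.

```python
def matmul(a, b):
    """Matrix product: (AB)_ij = sum_k (A)_ik (B)_kj."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def trace(a):
    """Sum of the diagonal elements."""
    return sum(a[i][i] for i in range(len(a)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

print(matmul(A, B) == matmul(B, A))               # False: AB and BA differ
print(trace(matmul(A, B)), trace(matmul(B, A)))   # 5 5
```

The products AB and BA are different matrices, yet their diagonal sums agree, as the double-sum manipulation in the proof requires.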


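Example 5.6.3 In this example, we show a very useful connection between 
the trace and the determinant that holds when a matrix is only infinitesimally 
different from the unit matrix. Let us calculate the determinant of $1 + \epsilon A$ to 
first order in $\epsilon$. Using the definition of determinant, we write 

$$
\det(1 + \epsilon A) = \sum_{i_1, \dots, i_n = 1}^{n} \varepsilon_{i_1 \dots i_n} (\delta_{1 i_1} + \epsilon \alpha_{1 i_1}) \cdots (\delta_{n i_n} + \epsilon \alpha_{n i_n})
$$

$$
= \sum_{i_1, \dots, i_n = 1}^{n} \varepsilon_{i_1 \dots i_n} \delta_{1 i_1} \cdots \delta_{n i_n}
+ \epsilon \sum_{k=1}^{n} \sum_{i_1, \dots, i_n = 1}^{n} \varepsilon_{i_1 \dots i_n} \delta_{1 i_1} \cdots \hat{\delta}_{k i_k} \cdots \delta_{n i_n} \alpha_{k i_k}.
$$

The first sum is just the product of all the Kronecker deltas. In the second 
sum, $\hat{\delta}_{k i_k}$ means that in the product of the deltas, $\delta_{k i_k}$ is absent. This term is 
obtained by multiplying the second term of the kth parentheses by the first 
term of all the rest. Since we are interested only in the first power of $\epsilon$, we 
stop at this term. Now, the first sum is reduced to $\varepsilon_{12 \dots n} = 1$ after all the 
Kronecker deltas are summed over. For the second sum, we get 

$$
\epsilon \sum_{k=1}^{n} \sum_{i_1, \dots, i_n = 1}^{n} \varepsilon_{i_1 \dots i_n} \delta_{1 i_1} \cdots \hat{\delta}_{k i_k} \cdots \delta_{n i_n} \alpha_{k i_k}
= \epsilon \sum_{k=1}^{n} \sum_{i_k = 1}^{n} \varepsilon_{12 \dots i_k \dots n}\, \alpha_{k i_k}
$$

$$
= \epsilon \sum_{k=1}^{n} \varepsilon_{12 \dots k \dots n}\, \alpha_{kk} = \epsilon \sum_{k=1}^{n} \alpha_{kk} = \epsilon \operatorname{tr} A, \tag{5.32}
$$

where the last line follows from the fact that the only nonzero value for 
$\varepsilon_{12 \dots i_k \dots n}$ is obtained when $i_k$ is equal to the missing index, i.e., k, in which 
case it will be 1. Thus, to first order in $\epsilon$, $\det(1 + \epsilon A) = 1 + \epsilon \operatorname{tr} A$. 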
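
The first-order statement det(1 + εA) = 1 + ε tr A can be probed numerically by checking that the discrepancy shrinks like ε². The sketch below uses exact rational arithmetic; the helper names and the sample matrix are illustrative assumptions, not part of the text.

```python
from fractions import Fraction

def det(m):
    """Determinant by expansion along the first row (adequate for small matrices)."""
    if not m:
        return Fraction(1)
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def one_plus(eps, a):
    """The matrix 1 + eps * A."""
    n = len(a)
    return [[(1 if i == j else 0) + eps * a[i][j] for j in range(n)] for i in range(n)]

A = [[2, -1, 0], [1, 3, 4], [0, 5, -2]]
for eps in (Fraction(1, 10), Fraction(1, 100), Fraction(1, 1000)):
    exact = det(one_plus(eps, A))
    first_order = 1 + eps * trace(A)
    # If the first-order formula is right, this ratio stays bounded as eps -> 0.
    print(eps, float((exact - first_order) / eps ** 2))
```

The printed ratio settles toward a constant as ε decreases, which is what "correct to first order in ε" means: the error is of order ε².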


Similar matrices have the same trace: If $A' = RAR^{-1}$, then 

$$
\operatorname{tr} A' = \operatorname{tr}(RAR^{-1}) = \operatorname{tr}[R(AR^{-1})] = \operatorname{tr}[(AR^{-1})R]
= \operatorname{tr}[A(R^{-1}R)] = \operatorname{tr}(A1) = \operatorname{tr} A.
$$

The preceding discussion is summarized in the following proposition. 


Proposition 5.6.4 To every operator $\mathbf{A} \in \mathcal{L}(V)$ are associated two intrinsic 
numbers, $\det \mathbf{A}$ and $\operatorname{tr} \mathbf{A}$, which are the determinant and trace of the matrix 
representation of the operator in any basis of V. 


It follows from this proposition that the result of Example 5.6.3 can be 
written in terms of operators: to first order in $\epsilon$, 

$$
\det(\mathbf{1} + \epsilon \mathbf{A}) = 1 + \epsilon \operatorname{tr} \mathbf{A}. \tag{5.33}
$$

A particularly useful formula that can be derived from this equation is the 
derivative at t = 0 of the determinant of an operator $\mathbf{A}(t)$ depending on a single 
variable with the property that $\mathbf{A}(0) = \mathbf{1}$. To first order in t, we can write 
$\mathbf{A}(t) = \mathbf{1} + t\dot{\mathbf{A}}(0)$, where a dot represents differentiation with respect to t. 
Substituting this in Eq. (5.33) and differentiating with respect to t, we obtain the 
important result 

$$
\frac{d}{dt} \det\bigl(\mathbf{A}(t)\bigr)\Big|_{t=0} = \operatorname{tr} \dot{\mathbf{A}}(0). \tag{5.34}
$$


Example 5.6.5 We have seen that the determinant of a product of matrices 
is the product of the determinants. On the other hand, the trace of a sum 
of matrices is the sum of traces. When dealing with numbers, products and 
sums are related via the logarithm and exponential: $\alpha\beta = \exp\{\ln\alpha + \ln\beta\}$. 
A generalization of this relation exists for diagonalizable matrices, i.e., 
matrices which can be transformed into diagonal form by a suitable similarity 
transformation. Let A be such a matrix, i.e., let $D = RAR^{-1}$ for some 
similarity transformation R and some diagonal matrix $D = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$. 
The determinant of a diagonal matrix is simply the product of its elements: 

$$
\det D = \lambda_1 \lambda_2 \cdots \lambda_n.
$$

Taking the natural log of both sides and using the result of Example 5.2.6, 
we have 

$$
\ln(\det D) = \ln\lambda_1 + \ln\lambda_2 + \cdots + \ln\lambda_n = \operatorname{tr}(\ln D),
$$

which can also be written as $\det D = \exp\{\operatorname{tr}(\ln D)\}$. 

In terms of A, this reads $\det(RAR^{-1}) = \exp\{\operatorname{tr}(\ln(RAR^{-1}))\}$. Now invoke 
the invariance of determinant and trace under similarity transformation and 
the result of Example 5.4.4 to obtain 

$$
\det A = \exp\{\operatorname{tr}(R(\ln A)R^{-1})\} = \exp\{\operatorname{tr}(\ln A)\}. \tag{5.35}
$$


This is an important equation, which is sometimes used to define the deter- 
minant of operators in infinite-dimensional vector spaces. 
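
Equation (5.35) can be tested numerically on a small example. The sketch below rests on an assumption that is not part of the text: ln A is evaluated with the power series ln(1 + X) = X − X²/2 + X³/3 − ⋯, which converges only when A is close enough to the unit matrix. The function names and the sample matrix (which has two distinct real eigenvalues and is therefore diagonalizable) are illustrative.

```python
import math

def matmul(a, b):
    """Matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def log_series(a, terms=80):
    """ln(A) from the series in X = A - 1; assumes the series converges for this A."""
    n = len(a)
    x = [[a[i][j] - (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    power = [row[:] for row in x]                 # current power X^k, starting at k = 1
    result = [[0.0] * n for _ in range(n)]
    for k in range(1, terms + 1):
        sign = 1.0 if k % 2 else -1.0             # alternating signs +, -, +, ...
        for i in range(n):
            for j in range(n):
                result[i][j] += sign * power[i][j] / k
        power = matmul(power, x)
    return result

A = [[1.2, 0.1], [0.3, 0.9]]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]
ln_A = log_series(A)
tr_ln_A = ln_A[0][0] + ln_A[1][1]
print(det_A, math.exp(tr_ln_A))   # the two numbers agree to rounding error
```

Both printed numbers are approximately 1.05 for this matrix.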


Both the determinant and the trace are mappings from $\mathcal{M}^{N\times N}$ to $\mathbb{C}$. The 
determinant is not a linear mapping, but the trace is; and this opens up the 
possibility of defining an inner product in the vector space of N × N 
matrices in terms of the trace: 


Proposition 5.6.6 For any two matrices $A, B \in \mathcal{M}^{N\times N}$, the mapping 
$g : \mathcal{M}^{N\times N} \times \mathcal{M}^{N\times N} \to \mathbb{C}$ 
defined by $g(A, B) = \operatorname{tr}(A^\dagger B)$ is a sesquilinear inner product. 


Proof The proof follows directly from the linearity of trace and the 
definition of hermitian conjugate. 


Just as the determinant of an operator was defined in terms of the operator 
itself (see Definition 2.6.10), the trace of an operator can be defined similarly 
as follows. Let $\Delta$ be a nonzero determinant function in V, and $\mathbf{T} \in \mathcal{L}(V)$. 
Define $\operatorname{tr} \mathbf{T}$ by 

$$
\sum_{i=1}^{N} \Delta(|a_1\rangle, \dots, \mathbf{T}|a_i\rangle, \dots, |a_N\rangle) \equiv (\operatorname{tr} \mathbf{T}) \cdot \Delta(|a_1\rangle, \dots, |a_N\rangle). \tag{5.36}
$$

Then one can show that $\operatorname{tr} \mathbf{T} = \operatorname{tr} \mathsf{T}$ for any matrix $\mathsf{T}$ representing the 
operator $\mathbf{T}$ in some basis of V. The details are left as an exercise for the reader. 


5.7 Problems 


5.1 Show that if |c) = |a) + |b), then in any basis the components of |c) 
are equal to the sums of the corresponding components of |a) and |b). Also 
show that the elements of the matrix representing the sum of two operators 
are the sums of the elements of the matrices representing those two opera- 
tors. 


5.2 Show that the unit operator 1 is represented by the unit matrix in any 
basis. 


5.3 The linear operator $\mathbf{A} : \mathbb{R}^3 \to \mathbb{R}^2$ is given by 


x 
Al y 7. 
2 x+y-Z 


Construct the matrix representing A in the standard bases of $\mathbb{R}^3$ and $\mathbb{R}^2$. 


5.4 Find the matrix representation of the complex structure J on a real 
vector space V introduced in Sect. 2.4 in the basis 

$$
\{|e_1\rangle, |e_2\rangle, \dots, |e_m\rangle, \mathbf{J}|e_1\rangle, \mathbf{J}|e_2\rangle, \dots, \mathbf{J}|e_m\rangle\}.
$$


5.5 The linear transformation $\mathbf{T} : \mathbb{R}^3 \to \mathbb{R}^3$ is defined as 

$$
\mathbf{T}(x_1, x_2, x_3) = (x_1 + x_2 - x_3,\ 2x_1 - x_3,\ x_1 + 2x_2).
$$

Find the matrix representation of T in 

(a) the standard basis of $\mathbb{R}^3$, 
(b) the basis consisting of $|a_1\rangle = (1, 1, 0)$, $|a_2\rangle = (1, 0, -1)$, and $|a_3\rangle = (0, 2, 3)$. 


5.6 Prove that for Eq. (5.6) to hold, we must have 


M 
(Mi (Bo A)),; = (My (B)),; (mi; (A));; 


i=l 
5.7 Show that the diagonal elements of an antisymmetric matrix are all zero. 


5.8 Show that the number of independent real parameters for an N × N 

(a) (real) symmetric matrix is N(N + 1)/2, 
(b) (real) antisymmetric matrix is N(N − 1)/2, 
(c) (real) orthogonal matrix is N(N − 1)/2, 
(d) (complex) unitary matrix is $N^2$, 
(e) (complex) hermitian matrix is $N^2$. 


5.9 Show that an arbitrary orthogonal 2 × 2 matrix can be written in one of 
the following two forms: 

$$
\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\quad\text{or}\quad
\begin{pmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{pmatrix}.
$$

The first is a pure rotation (its determinant is +1), and the second has 
determinant −1. The form of the choices is dictated by the assumption that the 
first entry of the matrix reduces to 1 when $\theta = 0$. 


5.10 Derive the formulas 

$$
\cos(\theta_1 + \theta_2) = \cos\theta_1 \cos\theta_2 - \sin\theta_1 \sin\theta_2,
$$
$$
\sin(\theta_1 + \theta_2) = \sin\theta_1 \cos\theta_2 + \cos\theta_1 \sin\theta_2
$$

by noting that the rotation of the angle $\theta_1 + \theta_2$ in the xy-plane is the product 
of two rotations. (See Problem 5.9.) 


5.11 Prove that if a matrix M satisfies $MM^\dagger = 0$, then M = 0. Note that in 
general, $M^2 = 0$ does not imply that M is zero. Find a nonzero 2 × 2 matrix 
whose square is zero. 


5.12 Construct the matrix representations of 

$$
\mathbf{D} : \mathcal{P}_4^c[t] \to \mathcal{P}_4^c[t] \quad\text{and}\quad \mathbf{T} : \mathcal{P}_3^c[t] \to \mathcal{P}_4^c[t],
$$

the derivative and multiplication-by-t operators. Choose $\{1, t, t^2, t^3\}$ as your 
basis of $\mathcal{P}_3^c[t]$ and $\{1, t, t^2, t^3, t^4\}$ as your basis of $\mathcal{P}_4^c[t]$. Use the matrix of 
D so obtained to find the first, second, third, fourth, and fifth derivatives of 
a general polynomial of degree 4. 


5.13 Find the transformation matrix R that relates the (orthonormal) 
standard basis of $\mathbb{C}^3$ to the orthonormal basis obtained from the following 
vectors via the Gram-Schmidt process: 

$$
|a_1\rangle = \begin{pmatrix} 1 \\ i \\ 0 \end{pmatrix}, \qquad
|a_2\rangle = \begin{pmatrix} 0 \\ 1 \\ -i \end{pmatrix}, \qquad
|a_3\rangle = \begin{pmatrix} i \\ 0 \\ -1 \end{pmatrix}.
$$

Verify that R is unitary, as expected from Theorem 5.4.2. 


5.14 If the matrix representation of an endomorphism T of C? with respect 


to the standard basis is ( oe what is its matrix representation with respect 


to the basis 1c) (1)}? 


5.15 If the matrix representation of an endomorphism T of C? with respect 
to the standard basis is 




what is the representation of T with respect to the basis 


5.16 Using Theorem 5.5.1, calculate the determinant of a general 3 x 3 
matrix and obtain the familiar expansion of such a determinant in terms of 
the first row of the matrix. 


5.17 Using Theorem 5.5.1, show that if two rows (two columns) of a matrix 
are equal, then its determinant is zero. 


5.18 Show that $\det(\alpha A) = \alpha^N \det A$ for an N × N matrix A and a complex 
number $\alpha$. 


5.19 Show that det 1 = 1 for any unit matrix. 


5.20 Find a specific pair of matrices A and B such that $\det(A + B) \neq \det A + 
\det B$. Therefore, the determinant is not a linear mapping. Hint: Any pair of 
matrices will most likely work. In fact, the challenge is to find a pair such 
that $\det(A + B) = \det A + \det B$. 


5.21 Let A be any N × N matrix. Replace its ith row (column) with any 
one of its other rows (columns), leaving the latter unchanged. Now expand 
the determinant of the new matrix by its ith row (column) to show that 

$$
\sum_{j=1}^{N} (A)_{ji} (\operatorname{cof} A)_{jk} = \sum_{j=1}^{N} (A)_{ij} (\operatorname{cof} A)_{kj} = 0, \qquad k \neq i.
$$



5.22 Demonstrate the result of Problem 5.21 using an arbitrary 4 x 4 matrix 
and evaluating the sum explicitly. 


5.23 Suppose that A is represented by A in one basis and by A’ in an- 
other, related to the first by a similarity transformation R. Show directly 
that det A’ = det A. 


5.24 Show explicitly that det(AB) = det Adet B for 2 x 2 matrices. 


5.25 Given three N x N matrices A, B, and C such that AB = C with C 
invertible, show that both A and B must be invertible. Thus, any two oper- 
ators A and B on a finite-dimensional vector space satisfying AB = 1 are 
invertible and each is the inverse of the other. Note: This is not true for 
infinite-dimensional vector spaces. 


5.26 Show directly that the similarity transformation induced by R does not 
change the determinant or the trace of A where 

$$
R = \begin{pmatrix} 1 & 2 & -1 \\ 0 & 1 & -2 \\ 2 & 1 & -1 \end{pmatrix}
\quad\text{and}\quad
A = \begin{pmatrix} 3 & -1 & 2 \\ 0 & 1 & -2 \\ 1 & -3 & -1 \end{pmatrix}.
$$


1 xi 
Va Ya y 

1 i —2 

lay=] Fe ]- la)=] ve ]> las)=] Ye 
Iti -1+i iti 

V6 V6 v6 


Show that this matrix is unitary. 


5.28 Consider the three operators $\mathbf{L}_1$, $\mathbf{L}_2$, and $\mathbf{L}_3$ satisfying 

$$
[\mathbf{L}_1, \mathbf{L}_2] = i\mathbf{L}_3, \qquad [\mathbf{L}_3, \mathbf{L}_1] = i\mathbf{L}_2, \qquad [\mathbf{L}_2, \mathbf{L}_3] = i\mathbf{L}_1.
$$

Show that the trace of each of these operators is necessarily zero. 


5.29 Show that in the expansion of the determinant given in Theorem 5.5.1, 
no two elements of the same row or the same column can appear in each 
term of the sum. 


5.30 Find the inverse of the following matrices if they exist: 


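$$
A = \begin{pmatrix} 3 & -1 & 2 \\ 1 & 0 & -3 \\ -2 & 1 & -1 \end{pmatrix}, \qquad
B = \begin{pmatrix} 0 & 1 & -1 \\ 1 & 2 & 0 \\ -1 & -2 & 1 \end{pmatrix},
$$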

10 1 
c=|0 1 
1 0 -1 


5.31 Find inverses for the following matrices using both methods discussed 
in this chapter. 


2 1 -l 12 -l 
A=| 2 1 2], B=|{0 1 -2], 
=A 2 = 2. 2d = 1 
1 -1l 
C=j;-1 1 1], 
1 -1l -2 


1/J2 0 GS=Hfe7e) Way?) 
0 42 dA=-)/OV2) -04+0/eV2) 
ie © <=?) =145/072) 
@ inf -d-p/ev> 040707) 




5.32 Let A be an operator on V. Show that if det A = 0, then there exists a 
nonzero vector |x) € V such that A|x) = 0. 


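5.33 For which values of a are the following matrices invertible? Find the 
inverses whenever they exist. 

$$
A = \begin{pmatrix} 1 & a & 0 \\ a & 1 & a \\ 0 & a & 1 \end{pmatrix}, \qquad
B = \begin{pmatrix} a & 1 & 0 \\ 1 & a & 1 \\ 0 & 1 & a \end{pmatrix},
$$
$$
C = \begin{pmatrix} 0 & 1 & a \\ 1 & a & 0 \\ a & 0 & 1 \end{pmatrix}, \qquad
D = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & a \\ 1 & a & 1 \end{pmatrix}.
$$


5.34 Let $\{a_i\}_{i=1}^{N}$ be the set consisting of the N rows of an N × N matrix A 
and assume that the $a_i$ are orthogonal to each other. Show that 

$$
|\det A| = \|a_1\|\, \|a_2\| \cdots \|a_N\|.
$$

Hint: Consider $AA^\dagger$. What would the result be if A were a unitary matrix? 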


5.35 Prove that a set of n homogeneous linear equations in n unknowns has 
a nontrivial solution if and only if the determinant of the matrix of coeffi- 
cients is zero. 


5.36 Use determinants to show that an antisymmetric matrix whose dimen- 
sion is odd cannot have an inverse. 


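5.37 Let V be a real inner product space. Let $\Omega : V^N \times V^N \to \mathbb{R}$ be a 
function defined by 

$$
\Omega(|v_1\rangle, \dots, |v_N\rangle, |u_1\rangle, \dots, |u_N\rangle) = \det\bigl(\langle u_i | v_j \rangle\bigr).
$$

Follow the same procedure as in Sect. 5.5.3 to show that for any determinant 
function $\Delta$ in V there is a nonzero constant $\alpha \in \mathbb{R}$ such that 

$$
\Delta(|v_1\rangle, \dots, |v_N\rangle)\, \Delta(|u_1\rangle, \dots, |u_N\rangle) = \alpha \det\bigl(\langle u_i | v_j \rangle\bigr)
$$

for $|u_i\rangle, |v_j\rangle \in V$. 


5.38 Show that $\operatorname{tr}(|a\rangle\langle b|) = \langle b | a \rangle$. Hint: Evaluate the trace in an 
orthonormal basis. 


5.39 Show that if two invertible N × N matrices A and B anticommute (that 
is, AB + BA = 0), then (a) N must be even, and (b) tr A = tr B = 0. 


5.40 Show that for a spatial rotation $R_{\hat{n}}(\theta)$ of an angle $\theta$ about an arbitrary 
axis $\hat{n}$, $\operatorname{tr} R_{\hat{n}}(\theta) = 1 + 2\cos\theta$. 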


5.41 Express the sum of the squares of elements of a matrix as a trace. Show 
that this sum is invariant under an orthogonal transformation of the matrix. 


5.42 Let S and A be a symmetric and an antisymmetric matrix, respectively, 
and let M be a general matrix. Show that 

(a) $\operatorname{tr} M = \operatorname{tr} M^t$, 
(b) tr(SA) = 0; in particular, tr A = 0, 
(c) SA is antisymmetric if and only if [S, A] = 0, 
(d) $MSM^t$ is symmetric and $MAM^t$ is antisymmetric, 
(e) $MHM^\dagger$ is hermitian if H is. 


5.43 Find the trace of each of the following linear operators: 

(a) $\mathbf{T} : \mathbb{R}^3 \to \mathbb{R}^3$ given by 
    $\mathbf{T}(x, y, z) = (x + y - z,\ 2x + 3y - 2z,\ x - y)$. 
(b) $\mathbf{T} : \mathbb{R}^3 \to \mathbb{R}^3$ given by 
    $\mathbf{T}(x, y, z) = (y - z,\ x + 2y + z,\ z - y)$. 
(c) $\mathbf{T} : \mathbb{C}^4 \to \mathbb{C}^4$ given by 
    $\mathbf{T}(x, y, z, w) = (x + iy - z + iw,\ 2ix + 3y - 2iz - w,\ x - iy,\ z + iw)$. 

5.44 Use Eq. (5.35) to derive Eq. (5.33). 


5.45 Suppose that there are two operators A and B such that [A, B] = c1, 
where c is a constant. Show that the vector space in which such operators 
are defined cannot be finite-dimensional. Conclude that the position and mo- 
mentum operators of quantum mechanics can be defined only in infinite di- 
mensions. 


5.46 Use Eq. (5.36) to show that $\operatorname{tr} \mathbf{T} = \operatorname{tr} \mathsf{T}$ for any matrix $\mathsf{T}$ representing 
the operator $\mathbf{T}$ in some basis of V. 


Spectral Decomposition 


The last chapter discussed matrix representation of operators. It was pointed 
out there that such a representation is basis-dependent. In some bases, the 
operator may “look” quite complicated, while in others it may take a simple 
form. In a “special” basis, the operator may look the simplest: It may be a 
diagonal matrix. This chapter investigates conditions under which a basis 
exists in which the operator is represented by a diagonal matrix. 


6.1 Invariant Subspaces 


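We start by recalling the notion of the direct sum of more than two subspaces 
and assume that 

$$
V = U_1 \oplus U_2 \oplus \cdots \oplus U_r \equiv \bigoplus_{j=1}^{r} U_j. \tag{6.1}
$$

Then by Proposition 4.4.1, there exist idempotents $\{\mathbf{P}_j\}_{j=1}^{r}$ such that 

$$
\mathbf{P}_i \mathbf{P}_j = \delta_{ij} \mathbf{P}_i \ \text{(no sum)} \quad\text{and}\quad \sum_{j=1}^{r} \mathbf{P}_j = \mathbf{1}. \tag{6.2}
$$
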


Definition 6.1.1 Let V be an inner product space. Let $\mathcal{M}$ be any subspace 
of V. Denote by $\mathcal{M}^\perp$ the set of all vectors in V orthogonal to all the vectors 
in $\mathcal{M}$. $\mathcal{M}^\perp$ (pronounced “em perp”) is called the orthogonal complement 
of $\mathcal{M}$. 


Proposition 6.1.2 $\mathcal{M}^\perp$ is a subspace of V. 


Proof The straightforward proof is left as an exercise for the reader. 


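If V of Eq. (6.1) is an inner product space, and the subspaces are mutually 
orthogonal, then for arbitrary $|u\rangle, |v\rangle \in V$, 

$$
\langle u | \mathbf{P}_j | v \rangle = \langle u | v_j \rangle = \langle u_j | v_j \rangle = \langle v_j | u_j \rangle^* = \langle v | u_j \rangle^* = \langle v | \mathbf{P}_j | u \rangle^*,
$$

which shows that $\mathbf{P}_j$ is hermitian. 

Consider an orthonormal basis $B_M = \{|e_i\rangle\}_{i=1}^{m}$ for $\mathcal{M}$, and extend it to 
an orthonormal basis $B = \{|e_i\rangle\}_{i=1}^{N}$ for V. Now construct a (hermitian) projection 
operator $\mathbf{P} = \sum_{i=1}^{m} |e_i\rangle\langle e_i|$. This is the operator that projects an arbitrary vector 
in V onto the subspace $\mathcal{M}$. It is straightforward to show that $\mathbf{1} - \mathbf{P}$ is the 
projection operator that projects onto $\mathcal{M}^\perp$ (see Problem 6.1). 

An arbitrary vector $|a\rangle \in V$ can be written as 

$$
|a\rangle = (\mathbf{P} + \mathbf{1} - \mathbf{P})|a\rangle = \underbrace{\mathbf{P}|a\rangle}_{\text{in } \mathcal{M}} + \underbrace{(\mathbf{1} - \mathbf{P})|a\rangle}_{\text{in } \mathcal{M}^\perp}.
$$

Furthermore, the only vector that can be in both $\mathcal{M}$ and $\mathcal{M}^\perp$ is the zero 
vector, because it is the only vector orthogonal to itself. We thus have 


Proposition 6.1.3 If V is an inner product space, then $V = \mathcal{M} \oplus \mathcal{M}^\perp$ for 
any subspace $\mathcal{M}$. Furthermore, the projection operators corresponding to 
$\mathcal{M}$ and $\mathcal{M}^\perp$ are hermitian. 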


This section explores the possibility of obtaining subspaces by means 
of the action of a linear operator on vectors of an N-dimensional vector 
space V. Let $|a\rangle$ be any vector in V, and A a linear operator on V. The 
vectors 

$$
|a\rangle,\ \mathbf{A}|a\rangle,\ \mathbf{A}^2|a\rangle,\ \dots,\ \mathbf{A}^N|a\rangle
$$

are linearly dependent (there are N + 1 of them). Let $\mathcal{M} \equiv \operatorname{Span}\{\mathbf{A}^k|a\rangle\}_{k=0}^{N}$. 
It follows that $m \equiv \dim \mathcal{M} \le \dim V$, and $\mathcal{M}$ has the property that for any 
vector $|x\rangle \in \mathcal{M}$ the vector $\mathbf{A}|x\rangle$ also belongs to $\mathcal{M}$ (show this!). In other 
words, no vector in $\mathcal{M}$ “leaves” the subspace when acted on by A. 


Definition 6.1.4 A subspace $\mathcal{M}$ is an invariant subspace of the operator 
A if A transforms vectors of $\mathcal{M}$ into vectors of $\mathcal{M}$. This is written succinctly 
as $\mathbf{A}(\mathcal{M}) \subset \mathcal{M}$. We say that $\mathcal{M}$ reduces A if both $\mathcal{M}$ and $\mathcal{M}^\perp$ are invariant 
subspaces of A. 


Starting with a basis of $\mathcal{M}$, we can extend it to a basis $B = \{|a_i\rangle\}_{i=1}^{N}$ of V 
whose first m vectors span $\mathcal{M}$. The matrix representation of A in such a 
basis is given by the relation $\mathbf{A}|a_i\rangle = \sum_{j=1}^{N} \alpha_{ji}|a_j\rangle$, i = 1, 2, ..., N. If $i \le m$, 
then $\alpha_{ji} = 0$ for j > m, because $\mathbf{A}|a_i\rangle$ belongs to $\mathcal{M}$ when $i \le m$ and 
therefore can be written as a linear combination of only $\{|a_1\rangle, |a_2\rangle, \dots, |a_m\rangle\}$. 
Thus, the matrix representation of A in B will have the form 

$$
\mathsf{A} = \begin{pmatrix} \mathsf{A}_{11} & \mathsf{A}_{12} \\ \mathsf{0}_{21} & \mathsf{A}_{22} \end{pmatrix},
$$

where $\mathsf{A}_{11}$ is an m × m matrix, $\mathsf{A}_{12}$ an m × (N − m) matrix, $\mathsf{0}_{21}$ the 
(N − m) × m zero matrix, and $\mathsf{A}_{22}$ an (N − m) × (N − m) matrix. We say that 
$\mathsf{A}_{11}$ represents the operator A in the m-dimensional subspace $\mathcal{M}$. 


It may also happen that the subspace spanned by the remaining basis 
vectors in B, namely $|a_{m+1}\rangle, |a_{m+2}\rangle, \dots, |a_N\rangle$, is also an invariant subspace 
of A. Then $\mathsf{A}_{12}$ will be zero, and $\mathsf{A}$ will take a block diagonal form:¹ 

$$
\mathsf{A} = \begin{pmatrix} \mathsf{A}_{11} & \mathsf{0} \\ \mathsf{0} & \mathsf{A}_{22} \end{pmatrix}.
$$

If a matrix representing an operator can be brought into this form by a 
suitable choice of basis, it is called reducible; otherwise, it is called 
irreducible. A reducible matrix $\mathsf{A}$ is denoted in two different ways:² 

$$
\mathsf{A} = \begin{pmatrix} \mathsf{A}_1 & \mathsf{0} \\ \mathsf{0} & \mathsf{A}_2 \end{pmatrix}
\quad\Leftrightarrow\quad
\mathsf{A} = \mathsf{A}_1 \oplus \mathsf{A}_2. \tag{6.3}
$$

For example, when $\mathcal{M}$ reduces A and one chooses a basis the first m vectors 
of which are in $\mathcal{M}$ and the remaining ones in $\mathcal{M}^\perp$, then $\mathsf{A}$ is reducible. 

We have seen on a number of occasions the significance of the hermitian 
conjugate of an operator (e.g., in relation to hermitian and unitary operators). 
The importance of this operator will be borne out further when we study the 
spectral theorem later in this chapter. Let us now investigate some properties 
of the adjoint of an operator in the context of invariant subspaces. 


Lemma 6.1.5 A subspace $\mathcal{M}$ of an inner product space V is invariant under 
the linear operator A if and only if $\mathcal{M}^\perp$ is invariant under $\mathbf{A}^\dagger$. 


Proof The proof is left as a problem. 


An immediate consequence of the above lemma and the two identities 
$(\mathbf{A}^\dagger)^\dagger = \mathbf{A}$ and $(\mathcal{M}^\perp)^\perp = \mathcal{M}$ is contained in the following theorem. 


Theorem 6.1.6 A subspace of V reduces A if and only if it is invariant 
under both A and $\mathbf{A}^\dagger$. 


Lemma 6.1.7 Let M be a subspace of V and P the hermitian projection 
operator onto M. Then M is invariant under the linear operator A if and 
only if AP = PAP. 


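Proof Suppose $\mathcal{M}$ is invariant. Then for any $|x\rangle$ in V, we have 

$$
\mathbf{P}|x\rangle \in \mathcal{M} \ \Rightarrow\ \mathbf{AP}|x\rangle \in \mathcal{M} \ \Rightarrow\ \mathbf{PAP}|x\rangle = \mathbf{AP}|x\rangle.
$$

Since the last equality holds for arbitrary $|x\rangle$, we have AP = PAP. 
Conversely, suppose AP = PAP. For any $|y\rangle \in \mathcal{M}$, we have 

$$
\mathbf{P}|y\rangle = |y\rangle \ \Rightarrow\ \underbrace{\mathbf{AP}}_{=\mathbf{PAP}}|y\rangle = \mathbf{A}|y\rangle = \mathbf{P}(\mathbf{AP}|y\rangle) \in \mathcal{M}.
$$

Therefore, $\mathcal{M}$ is invariant under A. 


¹From now on, we shall denote all zero matrices by the same symbol regardless of their 
dimensionality. 


²It is common to use a single subscript for submatrices of a block diagonal matrix, just as 
it is common to use a single subscript for entries of a diagonal matrix. 


Theorem 6.1.8 Let $\mathcal{M}$ be a subspace of V, P the hermitian projection 
operator of V onto $\mathcal{M}$, and A a linear operator on V. Then $\mathcal{M}$ reduces A if and 
only if A and P commute. 


Proof Suppose $\mathcal{M}$ reduces A. Then by Theorem 6.1.6, $\mathcal{M}$ is invariant under 
both A and $\mathbf{A}^\dagger$. Lemma 6.1.7 then implies 

$$
\mathbf{AP} = \mathbf{PAP} \quad\text{and}\quad \mathbf{A}^\dagger\mathbf{P} = \mathbf{P}\mathbf{A}^\dagger\mathbf{P}. \tag{6.4}
$$

Taking the adjoint of the second equation yields $(\mathbf{A}^\dagger\mathbf{P})^\dagger = (\mathbf{P}\mathbf{A}^\dagger\mathbf{P})^\dagger$, or PA = 
PAP. This equation together with the first equation of (6.4) yields PA = AP. 

Conversely, suppose that PA = AP. Then $\mathbf{P}^2\mathbf{A} = \mathbf{PAP}$, whence PA = 
PAP. Taking adjoints gives $\mathbf{A}^\dagger\mathbf{P} = \mathbf{P}\mathbf{A}^\dagger\mathbf{P}$, because P is hermitian. By 
Lemma 6.1.7, $\mathcal{M}$ is invariant under $\mathbf{A}^\dagger$. Similarly, from PA = AP, we get 
$\mathbf{PAP} = \mathbf{A}\mathbf{P}^2$, whence PAP = AP. Once again by Lemma 6.1.7, $\mathcal{M}$ is 
invariant under A. By Theorem 6.1.6, $\mathcal{M}$ reduces A. 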
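
The two criteria AP = PAP (invariance, Lemma 6.1.7) and AP = PA (reduction, Theorem 6.1.8) are easy to compare on a small example. The sketch below is an illustration only: the matrices and the function names `is_invariant` and `reduces` are hypothetical, the subspace is taken to be the span of the first standard basis vector of a three-dimensional space, and column i of each matrix holds the components of the image of the ith basis vector.

```python
def matmul(a, b):
    """Matrix product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_invariant(A, P):
    """Lemma 6.1.7: the range of P is invariant under A iff AP = PAP."""
    AP = matmul(A, P)
    return AP == matmul(P, AP)

def reduces(A, P):
    """Theorem 6.1.8: the range of P reduces A iff A and P commute."""
    return matmul(A, P) == matmul(P, A)

# Hermitian projection onto M = Span{e1}.
P = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
# A1 maps e1 to 2*e1, so M is invariant, but the first row couples M to M-perp.
A1 = [[2, 5, 7], [0, 1, 3], [0, 4, 6]]
# A2 is block diagonal, so both M and M-perp are invariant.
A2 = [[2, 0, 0], [0, 1, 3], [0, 4, 6]]

print(is_invariant(A1, P), reduces(A1, P))   # True False
print(is_invariant(A2, P), reduces(A2, P))   # True True
```

The first matrix has the upper block-triangular form discussed earlier in this section, while the second is block diagonal, which is the reducible case.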


6.2 Eigenvalues and Eigenvectors 


The main goal of the remaining part of this chapter is to prove that certain 
kinds of operators, for example hermitian operators, are diagonalizable, that 
is, that we can always find an (orthonormal) basis in which they are represented 
by a diagonal matrix. 

Let us begin by considering eigenvalues and eigenvectors, which are 
generalizations of familiar concepts in two and three dimensions. Consider the 
operation of rotation about the z-axis by an angle $\theta$ denoted by $\mathbf{R}_z(\theta)$. 
Such a rotation takes any vector (x, y) in the xy-plane to a new vector 
$(x\cos\theta - y\sin\theta,\ x\sin\theta + y\cos\theta)$. Thus, unless (x, y) = (0, 0) or $\theta$ is an 
integer multiple of $2\pi$, the vector will change. Is there a nonzero vector that 
is so special (eigen, in German) that it does not change when acted on by 
$\mathbf{R}_z(\theta)$? As long as we confine ourselves to two dimensions, the answer is no. 
But if we lift ourselves up from the two-dimensional xy-plane, we encounter 
many such vectors, all of which lie along the z-axis. 

The foregoing example can be generalized to any rotation (normally 
specified by Euler angles). In fact, the methods developed in this section 
can be used to show that a general rotation, given by Euler angles, always 
has an unchanged vector lying along the axis around which the rotation takes 
place. This concept is further generalized in the following definition. 


Definition 6.2.1 Let $\mathbf{A} \in \operatorname{End}(V)$ be a linear transformation, and $|a\rangle$ a 
nonzero vector such that 

$$
\mathbf{A}|a\rangle = \lambda|a\rangle, \tag{6.5}
$$

with $\lambda \in \mathbb{C}$. We then say that $|a\rangle$ is an eigenvector of A with eigenvalue $\lambda$. 




Proposition 6.2.2 Add the zero vector to the set of all eigenvectors of A 
belonging to the same eigenvalue $\lambda$, and denote the span of the resulting set 
by $\mathcal{M}_\lambda$. Then $\mathcal{M}_\lambda$ is a subspace of V, and every (nonzero) vector in $\mathcal{M}_\lambda$ is 
an eigenvector of A with eigenvalue $\lambda$. 


Proof The proof follows immediately from the above definition and the 
definition of a subspace. 


Definition 6.2.3 The subspace $\mathcal{M}_\lambda$ is referred to as the eigenspace of A 
corresponding to the eigenvalue $\lambda$. Its dimension is called the geometric 
multiplicity of $\lambda$. An eigenvalue is called simple if its geometric multiplicity 
is 1. The set of eigenvalues of A is called the spectrum of A. 


By their very construction, eigenspaces corresponding to different 
eigenvalues have no vectors in common except the zero vector. This can be 
demonstrated by noting that if $|v\rangle \in \mathcal{M}_\lambda \cap \mathcal{M}_\mu$ for $\lambda \neq \mu$, then 

$$
0 = (\mathbf{A} - \lambda\mathbf{1})|v\rangle = \mathbf{A}|v\rangle - \lambda|v\rangle = \mu|v\rangle - \lambda|v\rangle = \underbrace{(\mu - \lambda)}_{\neq 0}|v\rangle \ \Rightarrow\ |v\rangle = 0.
$$

An immediate consequence of this fact is 

$$
\mathcal{M}_\lambda + \mathcal{M}_\mu = \mathcal{M}_\lambda \oplus \mathcal{M}_\mu \quad\text{if } \lambda \neq \mu.
$$

More generally, 


Proposition 6.2.4 If $\{\lambda_i\}_{i=1}^{r}$ are distinct eigenvalues of an operator A and 
$\mathcal{M}_i$ is the eigenspace corresponding to $\lambda_i$, then 

$$
\mathcal{M}_1 + \cdots + \mathcal{M}_r = \mathcal{M}_1 \oplus \cdots \oplus \mathcal{M}_r \equiv \bigoplus_{i=1}^{r} \mathcal{M}_i. \tag{6.6}
$$

In particular, by Proposition 2.1.15, the eigenvectors of A corresponding to 
distinct eigenvalues are linearly independent. 


Let us rewrite Eq. (6.5) as $(\mathbf{A} - \lambda\mathbf{1})|a\rangle = 0$. This equation says that $|a\rangle$ 
is an eigenvector of A if and only if $|a\rangle$ is a nonzero vector in the kernel of $\mathbf{A} - \lambda\mathbf{1}$. 
If the latter is invertible, then its kernel will consist of only the zero vector, 
which is not acceptable as a solution of Eq. (6.5). Thus, if we are to obtain 
nontrivial solutions, $\mathbf{A} - \lambda\mathbf{1}$ must have no inverse. This is true if and only if 

$$
\det(\mathbf{A} - \lambda\mathbf{1}) = 0. \tag{6.7}
$$

The determinant in Eq. (6.7) is a polynomial in $\lambda$, called the characteristic 
polynomial of A. The roots of this polynomial are called characteristic 
roots and are simply the eigenvalues of A. Now, any polynomial of degree 
greater than or equal to 1 has at least one (complex) root. This yields the 
following theorem. 


Theorem 6.2.5 Every operator on a finite-dimensional vector space over 
$\mathbb{C}$ has at least one eigenvalue and therefore at least one eigenvector. 


Let $\lambda_1, \lambda_2, \dots, \lambda_p$ be the distinct roots of the characteristic polynomial of 
A, and let $\lambda_j$ occur $m_j$ times. Then $m_j$ is called the algebraic multiplicity 
of $\lambda_j$, and 

$$
\det(\mathbf{A} - \lambda\mathbf{1}) = (\lambda_1 - \lambda)^{m_1} \cdots (\lambda_p - \lambda)^{m_p} = \prod_{j=1}^{p} (\lambda_j - \lambda)^{m_j}. \tag{6.8}
$$

For $\lambda = 0$, this gives 

$$
\det \mathbf{A} = \lambda_1^{m_1} \lambda_2^{m_2} \cdots \lambda_p^{m_p} = \prod_{j=1}^{p} \lambda_j^{m_j}. \tag{6.9}
$$

Equation (6.9) states that the determinant of an operator is the product of 
all its eigenvalues, each counted with its algebraic multiplicity. In particular, 


Proposition 6.2.6 An operator is invertible iff none of its eigenvalues is 
zero. 


Example 6.2.7 Let us find the eigenvalues of a projection operator P. If 
$|a\rangle$ is an eigenvector, then $\mathbf{P}|a\rangle = \lambda|a\rangle$. Applying P on both sides again, we 
obtain 

$$
\mathbf{P}^2|a\rangle = \lambda\mathbf{P}|a\rangle = \lambda(\lambda|a\rangle) = \lambda^2|a\rangle.
$$

But $\mathbf{P}^2 = \mathbf{P}$; thus, $\mathbf{P}|a\rangle = \lambda^2|a\rangle$. It follows that $\lambda^2|a\rangle = \lambda|a\rangle$, or 
$(\lambda^2 - \lambda)|a\rangle = 0$. Since $|a\rangle \neq 0$, we must have $\lambda(\lambda - 1) = 0$, or $\lambda = 0, 1$. Thus, 
the only eigenvalues of a projection operator are 0 and 1. The presence of 
zero as an eigenvalue of P is an indication that P is not invertible. 


Example 6.2.8 To be able to see the difference between algebraic and 
geometric multiplicities, consider the matrix $\mathsf{A} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$, whose characteristic 
polynomial is $(1 - \lambda)^2$. Thus, the matrix has only one eigenvalue, $\lambda = 1$, 
with algebraic multiplicity $m_1 = 2$. However, the most general vector $|a\rangle$ 
satisfying $(\mathsf{A} - \mathsf{1})|a\rangle = 0$ is easily shown to be of the form $\begin{pmatrix} \alpha \\ 0 \end{pmatrix}$. This shows 
that $\mathcal{M}_{\lambda=1}$ is one-dimensional, i.e., the geometric multiplicity of $\lambda$ is 1. 
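
The geometric multiplicity can be computed mechanically as the dimension of the kernel of A − λ1, which by the dimension theorem equals N minus the rank of A − λ1. The sketch below is an illustration with hypothetical helper names. It uses the matrix with rows (1, 1) and (0, 1), whose characteristic polynomial is (1 − λ)², and compares it with the 2 × 2 unit matrix, which has the same characteristic polynomial.

```python
from fractions import Fraction

def rank(a):
    """Number of pivots found by Gaussian elimination with exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in a]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][col] / m[r][col]
            m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

def geometric_multiplicity(a, lam):
    """Dimension of the kernel of A - lam * 1, i.e. N - rank(A - lam * 1)."""
    n = len(a)
    shifted = [[a[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    return n - rank(shifted)

print(geometric_multiplicity([[1, 1], [0, 1]], 1))   # 1
print(geometric_multiplicity([[1, 0], [0, 1]], 1))   # 2
```

Both matrices have λ = 1 as their only eigenvalue with algebraic multiplicity 2, but only the unit matrix has a two-dimensional eigenspace.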


As mentioned at the beginning of this chapter, it is useful to represent an 
operator by a diagonal matrix. This motivates the following definition: 


Definition 6.2.9 A linear operator A on a vector space V is said to be di- 
agonalizable if there is a basis for V all of whose vectors are eigenvectors 
of A. 




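Theorem 6.2.10 Let A be a diagonalizable operator on a vector space V 
with distinct eigenvalues $\{\lambda_j\}_{j=1}^{r}$. Then there are idempotents $\mathbf{P}_j$ on V such 
that 

$$
(1)\ \mathbf{1} = \sum_{j=1}^{r} \mathbf{P}_j, \qquad (2)\ \mathbf{P}_i\mathbf{P}_j = 0 \ \text{for } i \neq j, \qquad (3)\ \mathbf{A} = \sum_{j=1}^{r} \lambda_j \mathbf{P}_j.
$$

Proof Let $\mathcal{M}_j$ denote the eigenspace corresponding to the eigenvalue $\lambda_j$. 
Since the eigenvectors span V, by Proposition 6.2.4 we have 

$$
V = \mathcal{M}_1 \oplus \mathcal{M}_2 \oplus \cdots \oplus \mathcal{M}_r.
$$

This immediately gives (1) and (2) if we use Eqs. (6.1) and (6.2) where $\mathbf{P}_j$ 
is the projection operator onto $\mathcal{M}_j$. 

To prove (3), let $|v\rangle$ be an arbitrary vector in V. Then $|v\rangle$ can be written 
uniquely as a sum of vectors each coming from one eigenspace. Therefore, 

$$
\mathbf{A}|v\rangle = \sum_{j=1}^{r} \mathbf{A}|v_j\rangle = \sum_{j=1}^{r} \lambda_j|v_j\rangle = \Bigl( \sum_{j=1}^{r} \lambda_j \mathbf{P}_j \Bigr)|v\rangle.
$$

Since this equality holds for all vectors $|v\rangle$, (3) follows. 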
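
The three properties of Theorem 6.2.10 can be verified on a small matrix. The sketch below makes an assumption that goes beyond this section: the idempotents are built from the product formula P_j = ∏_{k≠j} (A − λ_k 1)/(λ_j − λ_k), a standard construction for diagonalizable matrices with known distinct eigenvalues. The function names and the sample matrix (eigenvalues 1 and 3) are illustrative.

```python
from fractions import Fraction

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def projectors(a, eigenvalues):
    """P_j = prod over k != j of (A - lam_k 1) / (lam_j - lam_k); assumes A is diagonalizable."""
    n = len(a)
    result = []
    for lam_j in eigenvalues:
        p = identity(n)
        for lam_k in eigenvalues:
            if lam_k != lam_j:
                factor = [[Fraction(a[r][c] - (lam_k if r == c else 0), lam_j - lam_k)
                           for c in range(n)] for r in range(n)]
                p = matmul(p, factor)
        result.append(p)
    return result

def weighted_sum(coeffs, mats):
    n = len(mats[0])
    return [[sum(c * m[i][j] for c, m in zip(coeffs, mats)) for j in range(n)]
            for i in range(n)]

A = [[2, 1], [1, 2]]
lams = [1, 3]
P1, P3 = projectors(A, lams)

print(weighted_sum([1, 1], [P1, P3]) == identity(2))          # (1) sum of P_j is 1
print(matmul(P1, P3) == [[0, 0], [0, 0]])                     # (2) P_i P_j = 0
print(weighted_sum(lams, [P1, P3]) == A)                      # (3) A = sum lam_j P_j
print(matmul(P1, P1) == P1 and matmul(P3, P3) == P3)          # idempotency
```

All four checks print True for this matrix.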


6.3 Upper-Triangular Representations 


Let $\mathbf{T} \in \operatorname{End}(V)$ and $\{|a_i\rangle\}_{i=1}^{N}$ a basis of V. Suppose that $\operatorname{Span}\{|a_i\rangle\}_{i=1}^{m}$ is 
invariant under T for m = 1, ..., N, i.e., 

$$
\mathbf{T}\bigl(\operatorname{Span}\{|a_i\rangle\}_{i=1}^{m}\bigr) \subseteq \operatorname{Span}\{|a_i\rangle\}_{i=1}^{m} \quad\text{for each } m = 1, 2, \dots, N. \tag{6.10}
$$

Consider the N × N matrix representing T in this basis. Since $\mathbf{T}|a_1\rangle \in 
\operatorname{Span}\{|a_1\rangle\}$, all the elements of the first column of this matrix except 
possibly the first are zero. Since $\mathbf{T}|a_2\rangle \in \operatorname{Span}\{|a_1\rangle, |a_2\rangle\}$, all the elements of 
the second column except possibly the first two are zero. And in general all 
the elements of the ith column except possibly the first i elements are zero. 
Thus the matrix representing T is upper-triangular. 

Expanding the determinant of the upper-triangular matrix above by its 
first column and continuing the same process for the cofactors, we see that 
det T is simply the product of the elements on the main diagonal. Furthermore, 
$\mathsf{T} - \lambda\mathsf{1}$ is also an upper-triangular matrix whose diagonal elements 
are of the form $\lambda_i - \lambda$, where $\lambda_i$ are the diagonal elements of $\mathsf{T}$. Hence, 

$$
\det(\mathbf{T} - \lambda\mathbf{1}) = (\lambda_1 - \lambda) \cdots (\lambda_N - \lambda),
$$

and we have the following: 


Proposition 6.3.1 The operator T is invertible iff its upper-triangular 
matrix representation has no zero on its main diagonal. The entries on the main 
diagonal are simply the eigenvalues of T. 

As the foregoing discussion shows, upper-triangular representations of an 
operator seem to be convenient. But do they exist? In other words, can we 
find a basis of V in which an operator T is represented by an upper-triangular 
matrix? For the case of a complex vector space the answer is ‘yes,’ as the 
following theorem demonstrates. 


Theorem 6.3.2 Let V be a complex vector space of dimension N and $\mathbf{T} \in \operatorname{End}(V)$.
Then there exists a basis of V in which $\mathbf{T}$ is represented by an upper-triangular matrix.


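Proof We prove the theorem by induction on the dimension of subspaces of
V. For a one-dimensional subspace U, Theorem 6.2.5 guarantees the existence
of a vector $|u\rangle$ (the eigenvector of $\mathbf{T}$) for which Eq. (6.10) holds. Let
$U = \operatorname{Span}\{|u\rangle\}$ and write

\[ V = U \oplus W, \]

which is possible by Proposition 2.1.16. Let $\mathbf{T}_U$ and $\mathbf{T}_W$ be as in Eq. (4.13).
Since $\mathbf{T}_W \in \operatorname{End}(W)$ and $\dim W = N - 1$, we can use the induction hypothesis
on $\mathbf{T}_W$ and assume that there exists a basis $B_W = \{|a_i\rangle\}_{i=1}^{N-1}$ of W such
that

\[ \mathbf{T}_W|a_i\rangle \in \operatorname{Span}\{|a_1\rangle, |a_2\rangle, \dots, |a_i\rangle\} \quad \text{for each } i = 1, 2, \dots, N-1. \]

Now consider the basis $B_V = \{|u\rangle, |a_1\rangle, \dots, |a_{N-1}\rangle\}$. Then

\[ \mathbf{T}|u\rangle = \mathbf{T}_U|u\rangle + \mathbf{T}_W|u\rangle = \mathbf{P}_U\mathbf{T}|u\rangle + \mathbf{P}_W\mathbf{T}|u\rangle = \underbrace{\mathbf{P}_U(\lambda|u\rangle)}_{=\lambda|u\rangle} + \underbrace{\mathbf{P}_W(\lambda|u\rangle)}_{=|0\rangle} = \lambda|u\rangle \in \operatorname{Span}\{|u\rangle\} \]

and

\[ \mathbf{T}|a_i\rangle = \mathbf{T}_U|a_i\rangle + \mathbf{T}_W|a_i\rangle = \underbrace{\mathbf{P}_U(\mathbf{T}|a_i\rangle)}_{\in U} + \mathbf{T}_W|a_i\rangle = \alpha|u\rangle + \sum_{k=1}^{i}\alpha_k|a_k\rangle \in \operatorname{Span}\{|u\rangle, |a_1\rangle, \dots, |a_i\rangle\}, \]

where we used the fact that $\mathbf{T}_W|a_i\rangle \in \operatorname{Span}\{|a_k\rangle\}_{k=1}^{i}$. We thus have found a
basis $B_V$ for which Eq. (6.10) holds. This completes the proof.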


The ideal goal of the representation of an operator is to have it in diag- 
onal form with its eigenvalues along the diagonal. Theorem 6.3.2 partially 
accomplished this for complex vector spaces: it made the lower half of the 
representing matrix all zeros. In doing so, it used the algebraic closure of C, 
i.e., the fact that any polynomial with coefficients in C has all its roots in C. 
To make the upper half also zero, additional properties will be required for 
the operator, as we’ll see in Sect. 6.4. Thus, for a general operator on a 
complex vector space, upper-triangular representation is the best we can ac- 
complish. The case of the real vector spaces is even more restrictive as we 
shall see in Sect. 6.6. 


6.4 Complex Spectral Decomposition 


This section derives one of the most powerful theorems in the theory of 
linear operators, the spectral decomposition theorem. We shall derive the 
theorem for operators that generalize hermitian and unitary operators. 


Definition 6.4.1 A normal operator is an operator on an inner product 
space that commutes with its adjoint. 


An important consequence of this definition is 


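Proposition 6.4.2 The operator $\mathbf{A} \in \operatorname{End}(V)$ satisfies

\[ \|\mathbf{A}x\| = \|\mathbf{A}^\dagger x\| \quad \text{for all } |x\rangle \in V \tag{6.11} \]

if and only if $\mathbf{A}$ is normal.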


Theorem 6.4.3 Let $\mathbf{A}$ be a normal operator on V and U a subspace of V invariant
under $\mathbf{A}$. Then U is invariant under $\mathbf{A}^\dagger$. Therefore by Theorem 6.1.6,
any invariant subspace of a normal operator reduces it.


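Proof Let $\{|e_i\rangle\}_{i=1}^{m}$ be an orthonormal basis of U, and extend it to get
$\{|e_i\rangle\}_{i=1}^{N}$, an orthonormal basis of V. Since U is invariant under $\mathbf{A}$, we can
write, for $i = 1, \dots, m$,

\[ \mathbf{A}|e_i\rangle = \sum_{j=1}^{m} \alpha_{ji}|e_j\rangle, \qquad \alpha_{ji} = \langle e_j|\mathbf{A}|e_i\rangle \]

and

\[ \mathbf{A}^\dagger|e_i\rangle = \sum_{j=1}^{m} \eta_{ji}|e_j\rangle + \sum_{j=m+1}^{N} \xi_{ji}|e_j\rangle, \]

where for $j = 1, 2, \dots, m$, we have

\[ \eta_{ji} = \langle e_j|\mathbf{A}^\dagger|e_i\rangle = \langle e_i|\mathbf{A}|e_j\rangle^* = \alpha_{ij}^*. \]

Now note that

\[ \langle e_i|\mathbf{A}^\dagger\mathbf{A}|e_i\rangle = \sum_{j=1}^{m} |\alpha_{ji}|^2 \]

while

\[ \langle e_i|\mathbf{A}\mathbf{A}^\dagger|e_i\rangle = \sum_{j=1}^{m} |\eta_{ji}|^2 + \sum_{j=m+1}^{N} |\xi_{ji}|^2 = \sum_{j=1}^{m} |\alpha_{ij}|^2 + \sum_{j=m+1}^{N} |\xi_{ji}|^2. \]

Since $\mathbf{A}^\dagger\mathbf{A} = \mathbf{A}\mathbf{A}^\dagger$, we must have

\[ \sum_{j=1}^{m} |\alpha_{ji}|^2 = \sum_{j=1}^{m} |\alpha_{ij}|^2 + \sum_{j=m+1}^{N} |\xi_{ji}|^2. \]

Summing both sides over $i = 1, \dots, m$, the two double sums over the $\alpha$'s are equal and cancel, leaving

\[ \sum_{i=1}^{m}\sum_{j=m+1}^{N} |\xi_{ji}|^2 = 0 \quad\Rightarrow\quad \xi_{ji} = 0 \quad \text{for all } i = 1, \dots, m, \; j = m+1, \dots, N. \]

This implies that $\mathbf{A}^\dagger$ sends every basis vector of U back to U, and therefore
it does the same for every vector of U.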


Proposition 6.4.4 Let $\mathbf{A}$ be a normal operator on V. Then $|x\rangle$ is an eigenvector
of $\mathbf{A}$ with eigenvalue $\lambda$ if and only if $|x\rangle$ is an eigenvector of $\mathbf{A}^\dagger$ with
eigenvalue $\lambda^*$.


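Proof By Proposition 6.4.2, the fact that $(\mathbf{A} - \lambda\mathbf{1})^\dagger = \mathbf{A}^\dagger - \lambda^*\mathbf{1}$, and the
fact that $\mathbf{A} - \lambda\mathbf{1}$ is normal (reader, verify), we have $\|(\mathbf{A} - \lambda\mathbf{1})x\| = 0$ if and
only if $\|(\mathbf{A}^\dagger - \lambda^*\mathbf{1})x\| = 0$. Since it is only the zero vector that has the zero
norm, we get

\[ (\mathbf{A} - \lambda\mathbf{1})|x\rangle = 0 \quad \text{if and only if} \quad (\mathbf{A}^\dagger - \lambda^*\mathbf{1})|x\rangle = 0. \]

This proves the proposition.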


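We obtain a useful consequence of this proposition by applying it to a
hermitian operator $\mathbf{H}$ and a unitary operator$^3$ $\mathbf{U}$. In the first case, we get

\[ \mathbf{H}|x\rangle = \lambda|x\rangle = \mathbf{H}^\dagger|x\rangle = \lambda^*|x\rangle \;\Rightarrow\; (\lambda - \lambda^*)|x\rangle = 0 \;\Rightarrow\; \lambda = \lambda^*. \]

Therefore, $\lambda$ is real. In the second case, we write

\[ |x\rangle = \mathbf{1}|x\rangle = \mathbf{U}\mathbf{U}^\dagger|x\rangle = \mathbf{U}(\lambda^*|x\rangle) = \lambda^*\mathbf{U}|x\rangle = \lambda^*\lambda|x\rangle \;\Rightarrow\; \lambda^*\lambda = 1. \]

Therefore, $\lambda$ is unimodular (has absolute value equal to 1). We summarize
the foregoing discussion: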


Corollary 6.4.5 The eigenvalues of a hermitian operator are real.
A unitary operator has eigenvalues whose absolute values are 1.


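Example 6.4.6 Let us find the eigenvalues and eigenvectors of the hermitian
matrix $\mathbf{H} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$. We have

\[ \det(\mathbf{H} - \lambda\mathbf{1}) = \det\begin{pmatrix} -\lambda & -i \\ i & -\lambda \end{pmatrix} = \lambda^2 - 1. \]

Thus, the eigenvalues, $\lambda_1 = 1$ and $\lambda_2 = -1$, are real, as expected.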


3 Obviously, both are normal operators. 
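To find the eigenvectors, we write

\[ 0 = (\mathbf{H} - \lambda_1\mathbf{1})|a_1\rangle = (\mathbf{H} - \mathbf{1})|a_1\rangle = \begin{pmatrix} -1 & -i \\ i & -1 \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} -\alpha_1 - i\alpha_2 \\ i\alpha_1 - \alpha_2 \end{pmatrix}, \]

or $\alpha_2 = i\alpha_1$, which gives $|a_1\rangle = \begin{pmatrix} \alpha_1 \\ i\alpha_1 \end{pmatrix} = \alpha_1\begin{pmatrix} 1 \\ i \end{pmatrix}$, where $\alpha_1$ is an arbitrary
complex number. Also,

\[ 0 = (\mathbf{H} - \lambda_2\mathbf{1})|a_2\rangle = (\mathbf{H} + \mathbf{1})|a_2\rangle = \begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} \beta_1 - i\beta_2 \\ i\beta_1 + \beta_2 \end{pmatrix}, \]

or $\beta_2 = -i\beta_1$, which gives $|a_2\rangle = \begin{pmatrix} \beta_1 \\ -i\beta_1 \end{pmatrix} = \beta_1\begin{pmatrix} 1 \\ -i \end{pmatrix}$, where $\beta_1$ is an arbitrary
complex number.

[Margin note: Always normalize the eigenvectors!]

It is desirable, in most situations, to orthonormalize the eigenvectors. In
the present case, they are already orthogonal. This is a property shared by
all eigenvectors of a hermitian (in fact, normal) operator stated in the next
theorem. We therefore need to merely normalize the eigenvectors:

\[ 1 = \langle a_1|a_1\rangle = \alpha_1^*\begin{pmatrix} 1 & -i \end{pmatrix}\alpha_1\begin{pmatrix} 1 \\ i \end{pmatrix} = 2|\alpha_1|^2, \]

or $|\alpha_1| = 1/\sqrt{2}$ and $\alpha_1 = e^{i\varphi}/\sqrt{2}$ for some $\varphi \in \mathbb{R}$. A similar result is obtained
for $\beta_1$. The choice $\varphi = 0$ yields

\[ |e_1\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix} \quad \text{and} \quad |e_2\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}. \]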


The following theorem proves for all normal operators the orthogonality 
property of their eigenvectors illustrated in the example above for a simple 
hermitian operator. 


Theorem 6.4.7 An eigenspace of a normal operator reduces that operator.
Moreover, eigenspaces of a normal operator are mutually orthogonal.


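Proof The first part of the theorem is a trivial consequence of Theorem 6.4.3.
To prove the second part, let $|u\rangle \in \mathcal{M}_\lambda$ and $|v\rangle \in \mathcal{M}_\mu$ with $\lambda \neq \mu$.
Then, using Theorem 6.1.6 once more, we obtain

\[ \lambda\langle v|u\rangle = \langle v|\lambda u\rangle = \langle v|\mathbf{A}u\rangle = \langle \mathbf{A}^\dagger v|u\rangle = \langle \mu^* v|u\rangle = \mu\langle v|u\rangle. \]

It follows that $(\lambda - \mu)\langle v|u\rangle = 0$ and since $\lambda \neq \mu$, $\langle v|u\rangle = 0$.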


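Theorem 6.4.8 (Complex Spectral Decomposition) Let $\mathbf{A}$ be a normal
operator on a finite-dimensional complex inner product space V.
Let $\lambda_1, \lambda_2, \dots, \lambda_r$ be its distinct eigenvalues. Then

\[ V = \mathcal{M}_1 \oplus \mathcal{M}_2 \oplus \cdots \oplus \mathcal{M}_r, \]

and the nonzero (hermitian) projection operators $\mathbf{P}_1, \mathbf{P}_2, \dots, \mathbf{P}_r$,
where $\mathbf{P}_j$ projects onto $\mathcal{M}_j$, satisfy

(1) $\mathbf{1} = \sum_{j=1}^{r} \mathbf{P}_j$, (2) $\mathbf{P}_i\mathbf{P}_j = \mathbf{0}$ for $i \neq j$, (3) $\mathbf{A} = \sum_{j=1}^{r} \lambda_j\mathbf{P}_j$.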

Proof Let $\mathbf{P}_j$ be the operator that projects onto the eigenspace $\mathcal{M}_j$ corresponding
to eigenvalue $\lambda_j$. The fact that at least one such eigenspace exists
is guaranteed by Theorem 6.2.5. By Proposition 6.1.3, these projection operators
are hermitian. Because of Theorem 6.4.7 [see also Eq. (6.6)], the
only vector common to any two distinct eigenspaces is the zero vector. So,
it makes sense to talk about the direct sum of these eigenspaces. Let

\[ \mathcal{M} = \mathcal{M}_1 \oplus \mathcal{M}_2 \oplus \cdots \oplus \mathcal{M}_r \]

and $\mathbf{P} = \sum_{j=1}^{r} \mathbf{P}_j$, where $\mathbf{P}$ is the orthogonal projection operator onto $\mathcal{M}$.
Since $\mathbf{A}$ commutes with every $\mathbf{P}_j$ (Theorem 6.1.8), it commutes with $\mathbf{P}$.
Hence, by Theorem 6.1.8, $\mathcal{M}$ reduces $\mathbf{A}$, i.e., $\mathcal{M}^\perp$ is also invariant under $\mathbf{A}$.
Now regard the restriction of $\mathbf{A}$ to $\mathcal{M}^\perp$ as an operator in its own right on the
finite-dimensional vector space $\mathcal{M}^\perp$. Theorem 6.2.5 now forces $\mathbf{A}$ to have at
least one eigenvector in $\mathcal{M}^\perp$. But this is impossible because all eigenvectors
of $\mathbf{A}$ have been accounted for in its eigenspaces. The only resolution is for
$\mathcal{M}^\perp$ to be zero. This gives

\[ V = \mathcal{M}_1 \oplus \mathcal{M}_2 \oplus \cdots \oplus \mathcal{M}_r \quad \text{and} \quad \mathbf{1} = \sum_{j=1}^{r} \mathbf{P}_j. \]

The second equation follows from the first and Eqs. (6.1) and (6.2). The
remaining part of the theorem follows from arguments similar to those used
in the proof of Theorem 6.2.10.


We can now establish the connection between the diagonalizability of a
normal operator and the spectral theorem. In each subspace $\mathcal{M}_j$, we choose
an orthonormal basis. The union of all these bases is clearly a basis for the
whole space V. Let us label these basis vectors $|e_j^s\rangle$, where the subscript
indicates the subspace and the superscript indicates the particular vector in
that subspace. Clearly, $\langle e_j^s|e_{j'}^{s'}\rangle = \delta_{ss'}\delta_{jj'}$ and $\mathbf{P}_j = \sum_{s=1}^{m_j} |e_j^s\rangle\langle e_j^s|$. Noting
that $\mathbf{P}_k|e_{j'}^{s'}\rangle = \delta_{kj'}|e_{j'}^{s'}\rangle$, we can obtain the matrix elements of $\mathbf{A}$ in such a
basis:

\[ \langle e_j^s|\mathbf{A}|e_{j'}^{s'}\rangle = \sum_{i=1}^{r} \lambda_i\langle e_j^s|\mathbf{P}_i|e_{j'}^{s'}\rangle = \sum_{i=1}^{r} \lambda_i\delta_{ij'}\langle e_j^s|e_{j'}^{s'}\rangle = \lambda_{j'}\langle e_j^s|e_{j'}^{s'}\rangle. \]

Only the diagonal elements are nonzero. We note that for each subscript $j$
we have $m_j$ orthonormal vectors $|e_j^s\rangle$, where $m_j$ is the dimension of $\mathcal{M}_j$.
Thus, $\lambda_j$ occurs $m_j$ times as a diagonal element. Therefore, in such an orthonormal
basis, $\mathbf{A}$ will be represented by

\[ \operatorname{diag}(\underbrace{\lambda_1, \dots, \lambda_1}_{m_1 \text{ times}}, \underbrace{\lambda_2, \dots, \lambda_2}_{m_2 \text{ times}}, \dots, \underbrace{\lambda_r, \dots, \lambda_r}_{m_r \text{ times}}). \]

Let us summarize the preceding discussion: 


Corollary 6.4.9 If $\mathbf{A} \in \operatorname{End}(V)$ is normal, then V has an orthonormal basis
consisting of eigenvectors of $\mathbf{A}$. Therefore, a normal operator on a complex
inner product space is diagonalizable.


Using this corollary, the reader may show the following: 


Corollary 6.4.10 A hermitian operator is positive if and only if all its eigen- 
values are positive. 


In light of Corollary 6.4.9, Theorems 6.2.10 and 6.4.8 are converses of 
one another. In fact, it is straightforward to show that diagonalizability im- 
plies normality. Hence, we have 


Proposition 6.4.11 An operator on a complex inner product space is 
normal iff it is diagonalizable. 


Example 6.4.12 (Computation of largest and smallest eigenvalues)

[Margin note: Computation of the largest and the smallest eigenvalues of a normal operator]

There is an elegant technique that yields the largest and the smallest (in absolute
value) eigenvalues of a normal operator $\mathbf{A}$ in a straightforward way if the
eigenspaces of these eigenvalues are one dimensional. For convenience, assume
that the eigenvalues are labeled in order of decreasing absolute values:

\[ |\lambda_1| > |\lambda_2| > \cdots > |\lambda_r| \neq 0. \]

Let $\{|a_k\rangle\}_{k=1}^{N}$ be a basis of V consisting of eigenvectors of $\mathbf{A}$, and $|x\rangle = \sum_{k=1}^{N} \xi_k|a_k\rangle$
an arbitrary vector in V. Then

\[ \mathbf{A}^m|x\rangle = \sum_{k=1}^{N} \xi_k\mathbf{A}^m|a_k\rangle = \sum_{k=1}^{N} \xi_k\lambda_k^m|a_k\rangle = \lambda_1^m\left[\xi_1|a_1\rangle + \sum_{k=2}^{N} \xi_k\left(\frac{\lambda_k}{\lambda_1}\right)^m|a_k\rangle\right]. \]

In the limit $m \to \infty$, the summation in the brackets vanishes. Therefore,

\[ \mathbf{A}^m|x\rangle \approx \lambda_1^m\xi_1|a_1\rangle \quad \text{and} \quad \langle y|\mathbf{A}^m|x\rangle \approx \lambda_1^m\xi_1\langle y|a_1\rangle \]

for any $|y\rangle \in V$. Taking the ratio of this equation and the corresponding one
for $m + 1$, we obtain

\[ \lim_{m\to\infty} \frac{\langle y|\mathbf{A}^{m+1}|x\rangle}{\langle y|\mathbf{A}^m|x\rangle} = \lambda_1. \]

Note how crucially this relation depends on the fact that $\lambda_1$ is nondegenerate,
i.e., that $\mathcal{M}_1$ is one-dimensional. By taking larger and larger values for $m$,
we can obtain a better and better approximation to the largest eigenvalue.
Assuming that zero is not the smallest eigenvalue $\lambda_r$ (and therefore not
an eigenvalue) of $\mathbf{A}$, we can find the smallest eigenvalue by replacing $\mathbf{A}$
with $\mathbf{A}^{-1}$ and $\lambda_1$ with $1/\lambda_r$. The details are left as an exercise for the reader.
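
The ratio technique of Example 6.4.12 can be tried numerically. The sketch below is only an illustration under stated assumptions: the hermitian 2 × 2 matrix `A` (with eigenvalues 3 and 1), the vectors `x` and `y`, the fixed power 40, and all function names are hypothetical choices, not taken from the text.

```python
def matvec(a, x):
    """Apply the matrix a to the column vector x."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


def inner(y, x):
    """Inner product <y|x>, antilinear in the first slot."""
    return sum(complex(yi).conjugate() * xi for yi, xi in zip(y, x))


def ratio_estimate(a, x, y, m):
    """Return <y|A^(m+1)|x> / <y|A^m|x> for a fixed power m."""
    v = list(x)
    for _ in range(m):
        v = matvec(a, v)          # after the loop, v = A^m x
    return inner(y, matvec(a, v)) / inner(y, v)


def inverse_2x2(a):
    """Inverse of a 2 x 2 matrix with nonzero determinant."""
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d],
            [-a[1][0] / d, a[0][0] / d]]


# Hypothetical hermitian matrix with nondegenerate eigenvalues 3 and 1.
A = [[2, 1j], [-1j, 2]]
x = [1, 0]
y = [1, 0]

# Largest eigenvalue from powers of A.
largest = ratio_estimate(A, x, y, 40)

# Smallest eigenvalue: the same ratio for A^{-1} tends to 1/lambda_r.
smallest = 1 / ratio_estimate(inverse_2x2(A), x, y, 40)
```

The vectors must be chosen so that $\langle y|a_1\rangle$ and $\xi_1$ are nonzero; otherwise the ratio converges to a different eigenvalue or is undefined.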


Any given hermitian matrix $\mathbf{H}$ can be thought of as the representation of a
hermitian operator in the standard orthonormal basis. We can find a unitary
matrix $\mathbf{U}$ that can transform the standard basis to the orthonormal basis consisting
of $|e_j^s\rangle$, the eigenvectors of the hermitian operator. The representation
of the hermitian operator in the new basis is $\mathbf{U}\mathbf{H}\mathbf{U}^\dagger$, as discussed in Sect. 5.3.
However, the above argument showed that the new matrix is diagonal. We
therefore have the following result.


Corollary 6.4.13 A hermitian matrix can always be brought to diag- 
onal form by means of a unitary transformation matrix. 


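Example 6.4.14 Let us consider the diagonalization of the hermitian matrix

\[ \mathbf{H} = \begin{pmatrix} 0 & 0 & -1+i & -1-i \\ 0 & 0 & -1+i & 1+i \\ -1-i & -1-i & 0 & 0 \\ -1+i & 1-i & 0 & 0 \end{pmatrix}. \]

The characteristic polynomial is $\det(\mathbf{H} - \lambda\mathbf{1}) = (\lambda + 2)^2(\lambda - 2)^2$. Thus, $\lambda_1 = -2$
with multiplicity $m_1 = 2$, and $\lambda_2 = 2$ with multiplicity $m_2 = 2$. To find
the eigenvectors, we first look at the matrix equation $(\mathbf{H} + 2\mathbf{1})|a\rangle = 0$, or

\[ \begin{pmatrix} 2 & 0 & -1+i & -1-i \\ 0 & 2 & -1+i & 1+i \\ -1-i & -1-i & 2 & 0 \\ -1+i & 1-i & 0 & 2 \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{pmatrix} = 0. \]

This is a system of linear equations whose “solution” is

\[ \alpha_3 = \tfrac{1}{2}(1+i)(\alpha_1 + \alpha_2), \qquad \alpha_4 = \tfrac{1}{2}(1-i)(\alpha_1 - \alpha_2). \]

We have two arbitrary parameters, so we expect two linearly independent
solutions. For the two choices $\alpha_1 = 2$, $\alpha_2 = 0$ and $\alpha_1 = 0$, $\alpha_2 = 2$, we obtain,
respectively,

\[ |a_1\rangle = \begin{pmatrix} 2 \\ 0 \\ 1+i \\ 1-i \end{pmatrix} \quad \text{and} \quad |a_2\rangle = \begin{pmatrix} 0 \\ 2 \\ 1+i \\ -1+i \end{pmatrix}, \]

which happen to be orthogonal. We simply normalize them to obtain

\[ |e_1\rangle = \frac{1}{2\sqrt{2}}\begin{pmatrix} 2 \\ 0 \\ 1+i \\ 1-i \end{pmatrix} \quad \text{and} \quad |e_2\rangle = \frac{1}{2\sqrt{2}}\begin{pmatrix} 0 \\ 2 \\ 1+i \\ -1+i \end{pmatrix}. \]

Similarly, the second eigenvalue equation, $(\mathbf{H} - 2\mathbf{1})|a\rangle = 0$, gives rise
to the conditions $\alpha_3 = -\tfrac{1}{2}(1+i)(\alpha_1 + \alpha_2)$ and $\alpha_4 = -\tfrac{1}{2}(1-i)(\alpha_1 - \alpha_2)$,
which produce the orthonormal vectors

\[ |e_3\rangle = \frac{1}{2\sqrt{2}}\begin{pmatrix} 2 \\ 0 \\ -1-i \\ -1+i \end{pmatrix} \quad \text{and} \quad |e_4\rangle = \frac{1}{2\sqrt{2}}\begin{pmatrix} 0 \\ 2 \\ -1-i \\ 1-i \end{pmatrix}. \]

The unitary matrix that diagonalizes $\mathbf{H}$ can be constructed from these
column vectors using the remarks before Example 5.4.4, which imply that
if we simply put the vectors $|e_i\rangle$ together as columns, the resulting matrix
is $\mathbf{U}^\dagger$:

\[ \mathbf{U}^\dagger = \frac{1}{2\sqrt{2}}\begin{pmatrix} 2 & 0 & 2 & 0 \\ 0 & 2 & 0 & 2 \\ 1+i & 1+i & -1-i & -1-i \\ 1-i & -1+i & -1+i & 1-i \end{pmatrix}, \]

and the unitary matrix will be

\[ \mathbf{U} = (\mathbf{U}^\dagger)^\dagger = \frac{1}{2\sqrt{2}}\begin{pmatrix} 2 & 0 & 1-i & 1+i \\ 0 & 2 & 1-i & -1-i \\ 2 & 0 & -1+i & -1-i \\ 0 & 2 & -1+i & 1+i \end{pmatrix}. \]

We can easily check that $\mathbf{U}$ diagonalizes $\mathbf{H}$, i.e., that $\mathbf{U}\mathbf{H}\mathbf{U}^\dagger$ is diagonal.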
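
The final check of Example 6.4.14 can be carried out mechanically. The following sketch uses only the Python standard library; the variable and function names (`eigenvectors`, `matmul`, `D`, and so on) are illustrative assumptions, while the matrix entries and eigenvectors are those of the example.

```python
import math

H = [[0, 0, -1 + 1j, -1 - 1j],
     [0, 0, -1 + 1j, 1 + 1j],
     [-1 - 1j, -1 - 1j, 0, 0],
     [-1 + 1j, 1 - 1j, 0, 0]]

norm = 1 / (2 * math.sqrt(2))
eigenvectors = [
    [2, 0, 1 + 1j, 1 - 1j],     # |e1>, eigenvalue -2
    [0, 2, 1 + 1j, -1 + 1j],    # |e2>, eigenvalue -2
    [2, 0, -1 - 1j, -1 + 1j],   # |e3>, eigenvalue +2
    [0, 2, -1 - 1j, 1 - 1j],    # |e4>, eigenvalue +2
]
eigenvectors = [[norm * c for c in vec] for vec in eigenvectors]

n = 4
# U-dagger has the kets |e_i> as columns; the rows of U are the bras <e_i|.
U_dagger = [[eigenvectors[j][k] for j in range(n)] for k in range(n)]
U = [[complex(eigenvectors[i][k]).conjugate() for k in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


D = matmul(matmul(U, H), U_dagger)   # expected: diag(-2, -2, 2, 2)
UUd = matmul(U, U_dagger)            # expected: the identity matrix
```

Up to rounding, `D` has the eigenvalues on its diagonal in the order in which the eigenvectors were listed.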


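Example 6.4.15 In some physical applications the ability to diagonalize
matrices can be very useful. As a simple but illustrative example, let us consider
the motion of a charged particle in a constant magnetic field pointing
in the z direction.

[Margin note: application of diagonalization in electromagnetism]

The equation of motion for such a particle is

\[ m\frac{d\mathbf{v}}{dt} = q\,\mathbf{v}\times\mathbf{B} = q\det\begin{pmatrix} \hat{\mathbf{e}}_x & \hat{\mathbf{e}}_y & \hat{\mathbf{e}}_z \\ v_x & v_y & v_z \\ 0 & 0 & B \end{pmatrix}, \]

which in component form becomes

\[ \frac{dv_x}{dt} = \frac{qB}{m}v_y, \qquad \frac{dv_y}{dt} = -\frac{qB}{m}v_x, \qquad \frac{dv_z}{dt} = 0. \]

Ignoring the uniform motion in the z direction, we need to solve the first
two coupled equations, which in matrix form become

\[ \frac{d}{dt}\begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{qB}{m}\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}\begin{pmatrix} v_x \\ v_y \end{pmatrix} = -i\omega\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}\begin{pmatrix} v_x \\ v_y \end{pmatrix}, \tag{6.12} \]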


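where we have introduced a factor of $i$ to render the matrix hermitian, and
defined $\omega = qB/m$. If the 2 × 2 matrix were diagonal, we would get two
uncoupled equations, which we could solve easily. Diagonalizing the matrix
involves finding a matrix $\mathbf{R}$ such that

\[ \mathbf{R}\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}\mathbf{R}^{-1} = \begin{pmatrix} \mu_1 & 0 \\ 0 & \mu_2 \end{pmatrix}. \]

If we could do such a diagonalization, we would multiply (6.12) by $\mathbf{R}$ to
get$^4$

\[ \frac{d}{dt}\mathbf{R}\begin{pmatrix} v_x \\ v_y \end{pmatrix} = -i\omega\,\mathbf{R}\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}\mathbf{R}^{-1}\mathbf{R}\begin{pmatrix} v_x \\ v_y \end{pmatrix}, \quad \text{i.e.,} \quad \frac{d}{dt}\begin{pmatrix} v'_x \\ v'_y \end{pmatrix} = -i\omega\begin{pmatrix} \mu_1 & 0 \\ 0 & \mu_2 \end{pmatrix}\begin{pmatrix} v'_x \\ v'_y \end{pmatrix}, \]

where the primed components are defined by $\begin{pmatrix} v'_x \\ v'_y \end{pmatrix} \equiv \mathbf{R}\begin{pmatrix} v_x \\ v_y \end{pmatrix}$.
We then would have a pair of uncoupled equations

\[ \frac{dv'_x}{dt} = -i\omega\mu_1 v'_x, \qquad \frac{dv'_y}{dt} = -i\omega\mu_2 v'_y \]

that have $v'_x = v'_{0x}e^{-i\omega\mu_1 t}$ and $v'_y = v'_{0y}e^{-i\omega\mu_2 t}$ as a solution set, in which
$v'_{0x}$ and $v'_{0y}$ are integration constants.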


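To find $\mathbf{R}$, we need the normalized eigenvectors of $\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}$. But these are
obtained in precisely the same fashion as in Example 6.4.6. There is, however,
an arbitrariness in the solutions due to the choice in numbering the
eigenvalues. If we choose the normalized eigenvectors

\[ |e_1\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} i \\ 1 \end{pmatrix}, \qquad |e_2\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} -i \\ 1 \end{pmatrix}, \]

then from comments at the end of Sect. 5.3, we get

\[ \mathbf{R}^{-1} = \mathbf{R}^\dagger = \frac{1}{\sqrt{2}}\begin{pmatrix} i & -i \\ 1 & 1 \end{pmatrix} \quad\Rightarrow\quad \mathbf{R} = (\mathbf{R}^\dagger)^\dagger = \frac{1}{\sqrt{2}}\begin{pmatrix} -i & 1 \\ i & 1 \end{pmatrix}. \]

With this choice of $\mathbf{R}$, we have

\[ \mathbf{R}\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}\mathbf{R}^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \]

so that $\mu_1 = 1 = -\mu_2$. Having found $\mathbf{R}^\dagger$, we can write

\[ \begin{pmatrix} v_x \\ v_y \end{pmatrix} = \mathbf{R}^\dagger\begin{pmatrix} v'_x \\ v'_y \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} i & -i \\ 1 & 1 \end{pmatrix}\begin{pmatrix} v'_{0x}e^{-i\omega t} \\ v'_{0y}e^{i\omega t} \end{pmatrix}. \tag{6.13} \]

If the x and y components of velocity at $t = 0$ are $v_{0x}$ and $v_{0y}$, respectively,
then

\[ \begin{pmatrix} v'_{0x} \\ v'_{0y} \end{pmatrix} = \mathbf{R}\begin{pmatrix} v_{0x} \\ v_{0y} \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} -iv_{0x} + v_{0y} \\ iv_{0x} + v_{0y} \end{pmatrix}. \]

Substituting in (6.13), we obtain

\[ \begin{pmatrix} v_x \\ v_y \end{pmatrix} = \frac{1}{2}\begin{pmatrix} i & -i \\ 1 & 1 \end{pmatrix}\begin{pmatrix} (-iv_{0x} + v_{0y})e^{-i\omega t} \\ (iv_{0x} + v_{0y})e^{i\omega t} \end{pmatrix} = \begin{pmatrix} v_{0x}\cos\omega t + v_{0y}\sin\omega t \\ -v_{0x}\sin\omega t + v_{0y}\cos\omega t \end{pmatrix}. \]

This gives the velocity as a function of time. Antidifferentiating once with
respect to time yields the position vector.

$^4$ The fact that $\mathbf{R}$ is independent of $t$ is crucial in this step. This fact, in turn, is a consequence
of the independence from $t$ of the original 2 × 2 matrix.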
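
As a cross-check of Example 6.4.15, the diagonalization route and the closed-form result can be compared numerically. This is a sketch under assumptions: the function names and the sample values for the initial velocity, $\omega$, and $t$ are hypothetical, chosen only to exercise the formulas of the example.

```python
import cmath
import math


def velocity_by_diagonalization(v0x, v0y, omega, t):
    """Follow the steps of the example: transform, evolve, transform back."""
    s = 1 / math.sqrt(2)
    R = [[-1j * s, s], [1j * s, s]]
    R_dagger = [[1j * s, -1j * s], [s, s]]
    # Primed initial conditions: v'_0 = R v_0.
    vp0x = R[0][0] * v0x + R[0][1] * v0y
    vp0y = R[1][0] * v0x + R[1][1] * v0y
    # Uncoupled solutions with mu_1 = 1 and mu_2 = -1.
    vpx = vp0x * cmath.exp(-1j * omega * t)
    vpy = vp0y * cmath.exp(1j * omega * t)
    # Back to the original components: v = R-dagger v'.
    vx = R_dagger[0][0] * vpx + R_dagger[0][1] * vpy
    vy = R_dagger[1][0] * vpx + R_dagger[1][1] * vpy
    return vx, vy


def velocity_closed_form(v0x, v0y, omega, t):
    """The final trigonometric expression of the example."""
    c, s = math.cos(omega * t), math.sin(omega * t)
    return v0x * c + v0y * s, -v0x * s + v0y * c


# Hypothetical sample values.
vx, vy = velocity_by_diagonalization(1.0, 2.0, 3.0, 0.4)
cx, cy = velocity_closed_form(1.0, 2.0, 3.0, 0.4)
```

The intermediate quantities are complex, but the imaginary parts of `vx` and `vy` cancel up to rounding, as they must for a physical velocity.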


6.4.1 Simultaneous Diagonalization 


In many situations of physical interest, it is desirable to know whether two 
operators are simultaneously diagonalizable. For instance, if there exists a 
basis of a Hilbert space of a quantum-mechanical system consisting of si- 
multaneous eigenvectors of two operators, then one can measure those two 
operators at the same time. In particular, they are not restricted by an uncer- 
tainty relation. 


Definition 6.4.16 Two operators are said to be simultaneously diagonalizable
if they can be written in terms of the same set of projection operators,
as in Theorem 6.4.8.

[Margin note: simultaneous diagonalization defined]

This definition is consistent with the matrix representation of the two operators,
because if we take the orthonormal basis $B = \{|e_j^s\rangle\}$ discussed right
after Theorem 6.4.8, we obtain diagonal matrices for both operators. What
are the conditions under which two operators can be simultaneously diagonalized?
Clearly, a necessary condition is that the two operators commute.
This is an immediate consequence of the orthogonality of the projection operators,
which trivially implies $\mathbf{P}_i\mathbf{P}_j = \mathbf{P}_j\mathbf{P}_i$ for all $i$ and $j$. It is also apparent
in the matrix representation of the operators: Any two diagonal matrices
commute. What about sufficiency? Is the commutativity of the two operators
sufficient for them to be simultaneously diagonalizable? To answer this
question, we need the following lemma:


Lemma 6.4.17 An operator T commutes with a normal operator A if and 
only if T commutes with all the projection operators of A. 


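Proof The “if” part is trivial. To prove the “only if” part, suppose $\mathbf{A}\mathbf{T} = \mathbf{T}\mathbf{A}$,
and let $|x\rangle$ be any vector in one of the eigenspaces of $\mathbf{A}$, say $\mathcal{M}_j$. Then
we have $\mathbf{A}(\mathbf{T}|x\rangle) = \mathbf{T}(\mathbf{A}|x\rangle) = \mathbf{T}(\lambda_j|x\rangle) = \lambda_j(\mathbf{T}|x\rangle)$; i.e., $\mathbf{T}|x\rangle$ is in $\mathcal{M}_j$, or
$\mathcal{M}_j$ is invariant under $\mathbf{T}$. Since $\mathcal{M}_j$ is arbitrary, $\mathbf{T}$ leaves all eigenspaces
invariant. In particular, it leaves $\mathcal{M}_j^\perp$, the orthogonal complement of $\mathcal{M}_j$ (the
direct sum of all the remaining eigenspaces), invariant. By Theorems 6.1.6
and 6.1.8, $\mathbf{T}\mathbf{P}_j = \mathbf{P}_j\mathbf{T}$; and this holds for all $j$.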


Theorem 6.4.18 Two normal operators A and B are simultaneously 
diagonalizable iff [A, B] = 0. 


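Proof As claimed above, the “necessity” is trivial. To prove the “sufficiency”, let

\[ \mathbf{A} = \sum_{j=1}^{r} \lambda_j\mathbf{P}_j \quad \text{and} \quad \mathbf{B} = \sum_{\alpha=1}^{s} \mu_\alpha\mathbf{Q}_\alpha, \]

where $\{\lambda_j\}$ and $\{\mathbf{P}_j\}$ are eigenvalues and projections of $\mathbf{A}$, and $\{\mu_\alpha\}$ and
$\{\mathbf{Q}_\alpha\}$ are those of $\mathbf{B}$. Assume $[\mathbf{A}, \mathbf{B}] = 0$. Then by Lemma 6.4.17, $\mathbf{A}\mathbf{Q}_\alpha = \mathbf{Q}_\alpha\mathbf{A}$.
Since $\mathbf{Q}_\alpha$ commutes with $\mathbf{A}$, it must commute with the latter’s projection
operators: $\mathbf{P}_j\mathbf{Q}_\alpha = \mathbf{Q}_\alpha\mathbf{P}_j$. Now define $\mathbf{R}_{j\alpha} \equiv \mathbf{P}_j\mathbf{Q}_\alpha$, and note that

\[ \mathbf{R}_{j\alpha}^\dagger = (\mathbf{P}_j\mathbf{Q}_\alpha)^\dagger = \mathbf{Q}_\alpha^\dagger\mathbf{P}_j^\dagger = \mathbf{Q}_\alpha\mathbf{P}_j = \mathbf{P}_j\mathbf{Q}_\alpha = \mathbf{R}_{j\alpha}, \]
\[ (\mathbf{R}_{j\alpha})^2 = (\mathbf{P}_j\mathbf{Q}_\alpha)^2 = \mathbf{P}_j\mathbf{Q}_\alpha\mathbf{P}_j\mathbf{Q}_\alpha = \mathbf{P}_j\mathbf{P}_j\mathbf{Q}_\alpha\mathbf{Q}_\alpha = \mathbf{P}_j\mathbf{Q}_\alpha = \mathbf{R}_{j\alpha}. \]

Therefore, $\mathbf{R}_{j\alpha}$ are hermitian projection operators. In fact, they are the projection
operators that project onto the intersection of the eigenspaces of $\mathbf{A}$
and $\mathbf{B}$. Furthermore,

\[ \sum_{j=1}^{r} \mathbf{R}_{j\alpha} = \sum_{j=1}^{r} \mathbf{P}_j\mathbf{Q}_\alpha = \mathbf{Q}_\alpha, \]

and similarly, $\sum_{\alpha=1}^{s} \mathbf{R}_{j\alpha} = \mathbf{P}_j$. Since

\[ \sum_{j,\alpha} \mathbf{R}_{j\alpha} = \sum_{\alpha} \mathbf{Q}_\alpha = \mathbf{1}, \]

not all $\mathbf{R}_{j\alpha}$ can be zero. In fact, because of this identity, we must have

\[ V = \bigoplus_{j,\alpha} \mathcal{M}_j \cap \mathcal{N}_\alpha, \]

where $\mathcal{M}_j$ and $\mathcal{N}_\alpha$ are the eigenspaces of $\mathbf{A}$ and $\mathbf{B}$, respectively. We can now
write $\mathbf{A}$ and $\mathbf{B}$ as

\[ \mathbf{A} = \sum_{j} \lambda_j\mathbf{P}_j = \sum_{j,\alpha} \lambda_j\mathbf{R}_{j\alpha}, \qquad \mathbf{B} = \sum_{\alpha} \mu_\alpha\mathbf{Q}_\alpha = \sum_{j,\alpha} \mu_\alpha\mathbf{R}_{j\alpha}. \]

By definition, they are simultaneously diagonalizable.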


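Example 6.4.19 Let us find the spectral decomposition of the Pauli spin
matrix

[Margin note: spectral decomposition of a Pauli spin matrix]

\[ \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}. \]

The eigenvalues and eigenvectors have been found in Example 6.4.6. These
are

\[ \lambda_1 = 1, \quad |e_1\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix} \quad \text{and} \quad \lambda_2 = -1, \quad |e_2\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}. \]

The subspaces $\mathcal{M}_{\lambda_j}$ are one-dimensional; therefore,

\[ \mathbf{P}_1 = |e_1\rangle\langle e_1| = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}\frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -i \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix}, \]

\[ \mathbf{P}_2 = |e_2\rangle\langle e_2| = \frac{1}{2}\begin{pmatrix} 1 \\ -i \end{pmatrix}\begin{pmatrix} 1 & i \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}. \]

We can check that $\mathbf{P}_1 + \mathbf{P}_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and

\[ \lambda_1\mathbf{P}_1 + \lambda_2\mathbf{P}_2 = \frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix} - \frac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix} = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix} = \sigma_2. \]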
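
The three properties of Theorem 6.4.8 can be confirmed for this Pauli matrix with a few lines of code. The sketch below is illustrative only; the helper names `outer`, `matmul`, and `combine` are assumptions introduced here, and the eigenvectors are those quoted in the example.

```python
import math


def outer(ket):
    """Return the projector |ket><ket| as a nested list."""
    return [[a * complex(b).conjugate() for b in ket] for a in ket]


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def combine(c1, m1, c2, m2):
    """Return the linear combination c1*m1 + c2*m2."""
    n = len(m1)
    return [[c1 * m1[i][j] + c2 * m2[i][j] for j in range(n)] for i in range(n)]


s = 1 / math.sqrt(2)
e1 = [s, 1j * s]     # eigenvalue +1
e2 = [s, -1j * s]    # eigenvalue -1

P1, P2 = outer(e1), outer(e2)
resolution = combine(1, P1, 1, P2)    # property (1): should be the identity
product = matmul(P1, P2)              # property (2): should be the zero matrix
sigma_2 = combine(1, P1, -1, P2)      # property (3): should be [[0, -i], [i, 0]]
```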


Example 6.4.20 In this example, we provide another proof that if T is diag- 
onalizable, then it must be normal. We saw in Chap. 4 that T can be written 
in terms of its so-called Cartesian components as T = X + iY where both X 
and Y are hermitian and can therefore be decomposed according to Theo- 
rem 6.4.8. Can we conclude that T is also decomposable? No. Because the 
projection operators used in the decomposition of X may not be the same 
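as those used for $\mathbf{Y}$. However, if $\mathbf{X}$ and $\mathbf{Y}$ are simultaneously diagonalizable
such that$^5$

\[ \mathbf{X} = \sum_{k=1}^{r} \lambda_k\mathbf{P}_k \quad \text{and} \quad \mathbf{Y} = \sum_{k=1}^{r} \lambda'_k\mathbf{P}_k, \tag{6.14} \]

then $\mathbf{T} = \sum_{k=1}^{r} (\lambda_k + i\lambda'_k)\mathbf{P}_k$. It follows that $\mathbf{T}$ has a spectral decomposition,
and therefore is diagonalizable. Theorem 6.4.18 now implies that $\mathbf{X}$ and $\mathbf{Y}$
must commute. Since $\mathbf{X} = \frac{1}{2}(\mathbf{T} + \mathbf{T}^\dagger)$ and $\mathbf{Y} = \frac{1}{2i}(\mathbf{T} - \mathbf{T}^\dagger)$, we have $[\mathbf{X}, \mathbf{Y}] = 0$
if and only if $[\mathbf{T}, \mathbf{T}^\dagger] = 0$; i.e., $\mathbf{T}$ is normal.

$^5$ Note that $\mathbf{X}$ and $\mathbf{Y}$ may not have equal number of projection operators. Therefore one of
the sums may contain zeros as part of their summands.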


6.5 Functions of Operators

Functions of transformations were discussed in Chap. 4. With the power of 
spectral decomposition at our disposal, we can draw many important con- 
clusions about them. 


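First, we note that if $\mathbf{T} = \sum_{i=1}^{r} \lambda_i\mathbf{P}_i$, then, because of orthogonality of
the $\mathbf{P}_i$’s,

\[ \mathbf{T}^2 = \sum_{i=1}^{r} \lambda_i^2\mathbf{P}_i, \quad \mathbf{T}^3 = \sum_{i=1}^{r} \lambda_i^3\mathbf{P}_i, \quad \dots, \quad \mathbf{T}^n = \sum_{i=1}^{r} \lambda_i^n\mathbf{P}_i. \]

Thus, any polynomial $p$ in $\mathbf{T}$ has a spectral decomposition given by $p(\mathbf{T}) = \sum_{i=1}^{r} p(\lambda_i)\mathbf{P}_i$.
Generalizing this to functions expandable in power series gives

\[ f(\mathbf{T}) = \sum_{i=1}^{r} f(\lambda_i)\mathbf{P}_i. \tag{6.15} \]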


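Example 6.5.1 Let us investigate the spectral decomposition of the following
unitary (actually orthogonal) matrix:

\[ \mathbf{U} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]

We find the eigenvalues from

\[ \lambda^2 - 2\cos\theta\,\lambda + 1 = 0, \]

yielding $\lambda_1 = e^{-i\theta}$ and $\lambda_2 = e^{i\theta}$. For $\lambda_1$ we have (reader, provide the missing
steps)

\[ \begin{pmatrix} \cos\theta - e^{-i\theta} & -\sin\theta \\ \sin\theta & \cos\theta - e^{-i\theta} \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = 0 \;\Rightarrow\; \alpha_2 = i\alpha_1 \;\Rightarrow\; |e_1\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ i \end{pmatrix}, \]

and for $\lambda_2$,

\[ \begin{pmatrix} \cos\theta - e^{i\theta} & -\sin\theta \\ \sin\theta & \cos\theta - e^{i\theta} \end{pmatrix}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = 0 \;\Rightarrow\; \alpha_2 = -i\alpha_1 \;\Rightarrow\; |e_2\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}. \]

We note that the $\mathcal{M}_{\lambda_j}$ are one-dimensional and spanned by $|e_j\rangle$. Thus,

\[ \mathbf{P}_1 = |e_1\rangle\langle e_1| = \frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix}, \qquad \mathbf{P}_2 = |e_2\rangle\langle e_2| = \frac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}. \]

Clearly, $\mathbf{P}_1 + \mathbf{P}_2 = \mathbf{1}$, and

\[ e^{-i\theta}\mathbf{P}_1 + e^{i\theta}\mathbf{P}_2 = \frac{1}{2}\begin{pmatrix} e^{-i\theta} & -ie^{-i\theta} \\ ie^{-i\theta} & e^{-i\theta} \end{pmatrix} + \frac{1}{2}\begin{pmatrix} e^{i\theta} & ie^{i\theta} \\ -ie^{i\theta} & e^{i\theta} \end{pmatrix} = \mathbf{U}. \]

If we take the natural log of this equation and use Eq. (6.15), we obtain

\[ \ln\mathbf{U} = \ln(e^{-i\theta})\mathbf{P}_1 + \ln(e^{i\theta})\mathbf{P}_2 = -i\theta\mathbf{P}_1 + i\theta\mathbf{P}_2 = i(-\theta\mathbf{P}_1 + \theta\mathbf{P}_2) \equiv i\mathbf{H}, \tag{6.16} \]

where $\mathbf{H} \equiv -\theta\mathbf{P}_1 + \theta\mathbf{P}_2$ is a hermitian operator because $\theta$ is real and $\mathbf{P}_1$ and
$\mathbf{P}_2$ are hermitian. Inverting Eq. (6.16) gives $\mathbf{U} = e^{i\mathbf{H}}$, where

\[ \mathbf{H} = \theta(-\mathbf{P}_1 + \mathbf{P}_2) = \theta\begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}. \]

Using this matrix in the power series expansion of the exponential, the
reader is urged to verify directly that $\mathbf{U} = e^{i\mathbf{H}}$.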
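
The verification suggested at the end of Example 6.5.1 can also be done numerically by summing the exponential series. The following is a minimal sketch; the angle `theta = 0.7`, the truncation at 40 terms, and the function names are assumptions made for illustration.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def exp_series(m, terms=40):
    """Truncated power series sum_{k < terms} m^k / k! for a square matrix m."""
    n = len(m)
    result = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]            # current term m^k / k!
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in matmul(term, m)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result


theta = 0.7                                      # hypothetical rotation angle
H = [[0, 1j * theta], [-1j * theta, 0]]          # H = theta * [[0, i], [-i, 0]]
iH = [[1j * entry for entry in row] for row in H]

U_series = exp_series(iH)
U_exact = [[math.cos(theta), -math.sin(theta)],
           [math.sin(theta), math.cos(theta)]]
```

For a small angle the series converges quickly; larger angles or matrices with large norm would need more terms or a different method.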


The example above shows that the unitary 2 x 2 matrix U can be written 
as an exponential of an anti-hermitian operator. This is a general result. In 
fact, we have the following theorem, whose proof is left as an exercise for 
the reader (see Problem 6.23). 


Theorem 6.5.2 A unitary operator $\mathbf{U}$ on a finite-dimensional complex
inner product space can be written as $\mathbf{U} = e^{i\mathbf{H}}$ where $\mathbf{H}$ is hermitian.
Furthermore, a unitary matrix can be brought to diagonal form
by a unitary transformation matrix.

The last statement follows from Corollary 6.4.13 and the fact that

\[ f(\mathbf{R}\mathbf{H}\mathbf{R}^{-1}) = \mathbf{R}f(\mathbf{H})\mathbf{R}^{-1} \]

for any function $f$ that can be expanded in a Taylor series.

A useful function of an operator is its square root. A natural way to define
the square root of a normal operator $\mathbf{A}$ is $\sqrt{\mathbf{A}} = \sum_{i=1}^{r} (\pm\sqrt{\lambda_i})\mathbf{P}_i$. This
clearly gives many candidates ($2^r$, to be exact) for the root.

[Margin note: The square root of a normal operator is plagued by multivaluedness. In the real numbers, we have only two-valuedness!]

Definition 6.5.3 The positive square root of a positive (thus hermitian,
thus normal) operator $\mathbf{A} = \sum_{i=1}^{r} \lambda_i\mathbf{P}_i$ is $\sqrt{\mathbf{A}} = \sum_{i=1}^{r} \sqrt{\lambda_i}\,\mathbf{P}_i$.

The uniqueness of the spectral decomposition implies that the positive
square root of a positive operator is unique.


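Example 6.5.4 Let us evaluate $\sqrt{\mathbf{A}}$ where

\[ \mathbf{A} = \begin{pmatrix} 5 & 3i \\ -3i & 5 \end{pmatrix}. \]

First, we have to spectrally decompose $\mathbf{A}$. Its characteristic equation is

\[ \lambda^2 - 10\lambda + 16 = 0, \]

with roots $\lambda_1 = 8$ and $\lambda_2 = 2$. Since both eigenvalues are positive and $\mathbf{A}$ is
hermitian, we conclude that $\mathbf{A}$ is indeed positive (Corollary 6.4.10). We can
also easily find its normalized eigenvectors:

\[ |e_1\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} i \\ 1 \end{pmatrix} \quad \text{and} \quad |e_2\rangle = \frac{1}{\sqrt{2}}\begin{pmatrix} -i \\ 1 \end{pmatrix}. \]

Thus,

\[ \mathbf{P}_1 = |e_1\rangle\langle e_1| = \frac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}, \qquad \mathbf{P}_2 = |e_2\rangle\langle e_2| = \frac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix}, \]

and

\[ \sqrt{\mathbf{A}} = \sqrt{\lambda_1}\,\mathbf{P}_1 + \sqrt{\lambda_2}\,\mathbf{P}_2 = \sqrt{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix} + \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix} = \frac{1}{\sqrt{2}}\begin{pmatrix} 3 & i \\ -i & 3 \end{pmatrix}. \]

We can easily check that $(\sqrt{\mathbf{A}})^2 = \mathbf{A}$.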


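Intuitively, higher and higher powers of $\mathbf{T}$, when acting on a few vectors
of the space, eventually exhaust all vectors, and further increase in power
will be a repetition of lower powers. This intuitive idea can be made more
precise by looking at the projection operators. We have already seen that

\[ \mathbf{T}^n = \sum_{j=1}^{r} \lambda_j^n\mathbf{P}_j, \qquad n = 1, 2, \dots \]

For various $n$’s one can “solve” for $\mathbf{P}_j$ in terms of powers of $\mathbf{T}$. Since there
are only a finite number of $\mathbf{P}_j$’s, only a finite number of powers of $\mathbf{T}$ will
suffice. In fact, we can explicitly construct the polynomial in $\mathbf{T}$ for $\mathbf{P}_j$. If
there is such a polynomial, by Eq. (6.15) it must satisfy

\[ \mathbf{P}_j = p_j(\mathbf{T}) = \sum_{k=1}^{r} p_j(\lambda_k)\mathbf{P}_k, \]

where $p_j$ is some polynomial to be determined. By orthogonality of the
projection operators, $p_j(\lambda_k)$ must be zero unless $k = j$, in which case it
must be 1. In other words, $p_j(\lambda_k) = \delta_{kj}$. Such a polynomial can be explicitly
constructed (the factor with $k = j$ is omitted):

\[ p_j(x) = \left(\frac{x - \lambda_1}{\lambda_j - \lambda_1}\right)\left(\frac{x - \lambda_2}{\lambda_j - \lambda_2}\right)\cdots\left(\frac{x - \lambda_r}{\lambda_j - \lambda_r}\right) \equiv \prod_{k \neq j} \frac{x - \lambda_k}{\lambda_j - \lambda_k}. \]

Therefore,

\[ \mathbf{P}_j = p_j(\mathbf{T}) \equiv \prod_{k \neq j} \frac{\mathbf{T} - \lambda_k\mathbf{1}}{\lambda_j - \lambda_k}, \tag{6.17} \]

and we have the following result.

Proposition 6.5.5 Let V be a finite-dimensional vector space and $\mathbf{T} \in \operatorname{End}(V)$
a normal operator. Then

\[ f(\mathbf{T}) = \sum_{j=1}^{r} f(\lambda_j)\mathbf{P}_j = \sum_{j=1}^{r} f(\lambda_j)\prod_{k \neq j} \frac{\mathbf{T} - \lambda_k\mathbf{1}}{\lambda_j - \lambda_k}, \tag{6.18} \]

i.e., every function of $\mathbf{T}$ is a polynomial.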


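Example 6.5.6 Let us write $\sqrt{\mathbf{A}}$ of the last example as a polynomial in $\mathbf{A}$.
We have

\[ p_1(\mathbf{A}) = \prod_{k \neq 1} \frac{\mathbf{A} - \lambda_k\mathbf{1}}{\lambda_1 - \lambda_k} = \frac{\mathbf{A} - \lambda_2\mathbf{1}}{\lambda_1 - \lambda_2} = \frac{1}{6}(\mathbf{A} - 2\cdot\mathbf{1}), \]

\[ p_2(\mathbf{A}) = \prod_{k \neq 2} \frac{\mathbf{A} - \lambda_k\mathbf{1}}{\lambda_2 - \lambda_k} = \frac{\mathbf{A} - \lambda_1\mathbf{1}}{\lambda_2 - \lambda_1} = -\frac{1}{6}(\mathbf{A} - 8\cdot\mathbf{1}). \]

Substituting in Eq. (6.18), we obtain

\[ \sqrt{\mathbf{A}} = \sqrt{\lambda_1}\,p_1(\mathbf{A}) + \sqrt{\lambda_2}\,p_2(\mathbf{A}) = \frac{\sqrt{8}}{6}(\mathbf{A} - 2\cdot\mathbf{1}) - \frac{\sqrt{2}}{6}(\mathbf{A} - 8\cdot\mathbf{1}) = \frac{\sqrt{2}}{6}(\mathbf{A} + 4\cdot\mathbf{1}). \]

The RHS is clearly a (first-degree) polynomial in $\mathbf{A}$, and it is easy to verify
that it is the matrix of $\sqrt{\mathbf{A}}$ obtained in the previous example.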
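
Equations (6.17) and (6.18) translate directly into a short routine. The sketch below is one possible rendering, not a definitive implementation: the function names `projector` and `function_of_operator` are hypothetical, and the eigenvalues are assumed to be known in advance and distinct, as in the example.

```python
import math


def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def projector(T, eigenvalues, j):
    """P_j as the product over k != j of (T - lam_k 1)/(lam_j - lam_k), Eq. (6.17)."""
    n = len(T)
    P = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    for k, lam_k in enumerate(eigenvalues):
        if k == j:
            continue  # the factor with k = j is omitted
        denom = eigenvalues[j] - lam_k
        factor = [[(T[r][c] - (lam_k if r == c else 0)) / denom
                   for c in range(n)] for r in range(n)]
        P = matmul(P, factor)
    return P


def function_of_operator(f, T, eigenvalues):
    """f(T) as the sum over j of f(lam_j) P_j, Eq. (6.18)."""
    n = len(T)
    result = [[0 for _ in range(n)] for _ in range(n)]
    for j, lam_j in enumerate(eigenvalues):
        P = projector(T, eigenvalues, j)
        result = [[result[r][c] + f(lam_j) * P[r][c] for c in range(n)]
                  for r in range(n)]
    return result


A = [[5, 3j], [-3j, 5]]
eigenvalues = [8, 2]                  # distinct eigenvalues of A from Example 6.5.4

P1 = projector(A, eigenvalues, 0)
P2 = projector(A, eigenvalues, 1)
sqrt_A = function_of_operator(math.sqrt, A, eigenvalues)
squared = matmul(sqrt_A, sqrt_A)      # should reproduce A
```

Here `sqrt_A` should agree, up to rounding, with $\frac{1}{\sqrt{2}}\begin{pmatrix} 3 & i \\ -i & 3 \end{pmatrix}$ from Example 6.5.4.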


6.6 Real Spectral Decomposition 


The treatment so far in this chapter has focused on complex inner product 
spaces. The complex number system is “more complete” than the real num- 
bers. For example, in preparation for the proof of the spectral decomposition 
theorem, we used the existence of roots of a polynomial over the complex 
field (this is the fundamental theorem of algebra). A polynomial over the 
reals, on the other hand, does not necessarily have all its roots in the real 
number system. Since the existence of roots was necessary for the proof 
of Theorem 6.3.2, real operators cannot, in general, even be represented by 
upper-triangular matrices. It may therefore seem that vector spaces over the 
reals will not satisfy the useful theorems and results developed for complex
spaces. However, as we shall see in this section, some of the useful results 
carry over to the real case. 


Theorem 6.6.1 An operator on a real vector space has invariant subspaces
of dimension 1 or 2.


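Proof Let V be a real vector space of dimension N and $\mathbf{T} \in \mathcal{L}(V)$. Take a
nonzero vector $|v\rangle \in V$ and consider the $N + 1$ vectors $\{\mathbf{T}^k|v\rangle\}_{k=0}^{N}$. These
vectors are linearly dependent. Hence, there exist a set of real numbers
$\{\eta_k\}_{k=0}^{N}$, not all equal to zero, such that

\[ \eta_0|v\rangle + \eta_1\mathbf{T}|v\rangle + \cdots + \eta_N\mathbf{T}^N|v\rangle = |0\rangle \quad \text{or} \quad p(\mathbf{T})|v\rangle = |0\rangle, \tag{6.19} \]

where $p(\mathbf{T}) = \sum_{k=0}^{N} \eta_k\mathbf{T}^k$ is a polynomial in $\mathbf{T}$. By Theorem 3.6.5, we have

\[ p(\mathbf{T}) = \gamma\prod_{i=1}^{r} (\mathbf{T} - \lambda_i\mathbf{1})^{k_i}\prod_{j=1}^{R} (\mathbf{T}^2 + \alpha_j\mathbf{T} + \beta_j\mathbf{1})^{K_j}, \tag{6.20} \]

for some nonzero constant $\gamma$.$^6$ If all the factors in the two products are injective,
then they are all invertible (why?). It follows that $p(\mathbf{T})$ is invertible,
and Eq. (6.19) yields $|v\rangle = |0\rangle$, which contradicts our assumption. Hence, at
least one of the factors in the product is not injective, i.e., its kernel contains
a nonzero vector. If this factor is one of the terms in the first product, say
$\mathbf{T} - \lambda_m\mathbf{1}$, and $|u\rangle \neq |0\rangle$ is in its kernel, then

\[ (\mathbf{T} - \lambda_m\mathbf{1})|u\rangle = |0\rangle \quad \text{or} \quad \mathbf{T}|u\rangle = \lambda_m|u\rangle, \]

and $\operatorname{Span}\{|u\rangle\}$ is a one-dimensional invariant subspace.
Now suppose that the non-injective factor is in the second product of
Eq. (6.20), say $\mathbf{T}^2 + \alpha_n\mathbf{T} + \beta_n\mathbf{1}$, and $|v\rangle \neq |0\rangle$ is in its kernel. Then

\[ (\mathbf{T}^2 + \alpha_n\mathbf{T} + \beta_n\mathbf{1})|v\rangle = |0\rangle. \]

It is straightforward to show that $\operatorname{Span}\{|v\rangle, \mathbf{T}|v\rangle\}$ is an invariant subspace,
whose dimension is 1 if $|v\rangle$ happens to be an eigenvector of $\mathbf{T}$, and 2 if
not.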


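Example 6.6.2 Consider the operator $\mathbf{T} : \mathbb{R}^2 \to \mathbb{R}^2$ given by

\[ \mathbf{T}\begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} \alpha_2 \\ -\alpha_1 \end{pmatrix}. \]

Suppose $|x\rangle \in \mathbb{R}^2$ is an eigenvector of $\mathbf{T}$. Then

\[ \mathbf{T}|x\rangle = \lambda|x\rangle \;\Rightarrow\; \mathbf{T}^2|x\rangle = \lambda\mathbf{T}|x\rangle = \lambda^2|x\rangle. \]

But $\mathbf{T}^2 = -\mathbf{1}$, as can be easily verified. Therefore, $\lambda^2 = -1$, and $\mathbf{T}$ has no
real eigenvalue. It follows that $\mathbf{T}$ has no eigenvectors in $\mathbb{R}^2$.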


$^6$ We are not assuming that $\eta_N \neq 0$.

The preceding example showed that there exist operators on $\mathbb{R}^2$ which
have no eigenvectors. The fact that the dimension of the vector space was 
even played an important role in the absence of the eigenvectors. This is 
not generally true for odd-dimensional vector spaces. In fact, we have the 
following: 


Theorem 6.6.3 Every operator on an odd-dimensional real vector space 
has a real eigenvalue and an associated eigenvector. 


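Proof Let V be a real vector space of odd dimension N and $\mathbf{T} \in \mathcal{L}(V)$.
We prove the theorem by induction on N. Obviously, the theorem holds
for $N = 1$. If $\mathbf{T}$ has no eigenvalue, then by Theorem 6.6.1, there is a two-dimensional
invariant subspace U. Write

\[ V = U \oplus W, \]

where W has odd dimension $N - 2$. With $\mathbf{T}_U$ and $\mathbf{T}_W$ as in Eq. (4.13), and
the fact that $\mathbf{T}_W \in \mathcal{L}(W)$, we can assume that the induction hypothesis holds
for $\mathbf{T}_W$, i.e., that it has a real eigenvalue $\lambda$ and an eigenvector $|w\rangle$ in W.

Now consider the 3-dimensional subspace $V_3$ of V and an operator $\mathbf{T}_\lambda$,
defined by

\[ V_3 = U \oplus \operatorname{Span}\{|w\rangle\} \quad \text{and} \quad \mathbf{T}_\lambda = \mathbf{T} - \lambda\mathbf{1}, \]

respectively, and note that $\mathbf{T}_\lambda U \subseteq U$ because U is invariant under $\mathbf{T}$. Furthermore,

\[ \mathbf{T}_\lambda|w\rangle = \mathbf{T}|w\rangle - \lambda|w\rangle = \mathbf{T}_U|w\rangle + \underbrace{\mathbf{T}_W|w\rangle - \lambda|w\rangle}_{=|0\rangle} = \mathbf{T}_U|w\rangle = \mathbf{P}_U(\mathbf{T}|w\rangle) \in U. \]

Thus, $\mathbf{T}_\lambda : V_3 \to U$. Invoking the dimension theorem, we see that $\ker\mathbf{T}_\lambda$ has
dimension at least one. Thus, there is a nonzero $|v_3\rangle \in V_3$ such that

\[ \mathbf{T}_\lambda|v_3\rangle = (\mathbf{T} - \lambda\mathbf{1})|v_3\rangle = |0\rangle, \]

i.e., that $\mathbf{T}$ has a real eigenvalue and a corresponding eigenvector.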


6.6.1 The Case of Symmetric Operators 


The existence of at least one eigenvalue was crucial in proving the complex 
spectral theorem. A normal operator on a real vector space does not have a 
real eigenvalue in general. However, if the operator is self-adjoint (hermi- 
tian, symmetric), then it will have a real eigenvalue. To establish this, we 
start with the following 


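Lemma 6.6.4 Let $\mathbf{T}$ be a self-adjoint (hermitian) operator on a vector
space V. Then

\[ \mathbf{H} = \mathbf{T}^2 + \alpha\mathbf{T} + \beta\mathbf{1}, \qquad \alpha, \beta \in \mathbb{R}, \quad \alpha^2 < 4\beta, \]

is invertible.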

Proof By Theorem 4.3.10, it is sufficient to prove that $\mathbf{H}$ is strictly positive.
Factor out the polynomial into its linear factors and note that, since $\alpha$ and $\beta$
are real, the two roots are complex conjugate of one another. Furthermore,
since $\alpha^2 < 4\beta$, the imaginary parts of the roots are not zero. Let $\lambda$ be one of
the roots and let $\mathbf{S} = \mathbf{T} - \lambda\mathbf{1}$. Since $\mathbf{T}$ is self-adjoint, $\mathbf{H} = \mathbf{S}^\dagger\mathbf{S}$. Therefore,

\[ \langle a|\mathbf{H}|a\rangle = \langle a|\mathbf{S}^\dagger\mathbf{S}|a\rangle = \langle \mathbf{S}a|\mathbf{S}a\rangle \geq 0. \]

The case of 0 is excluded because it corresponds to

\[ \mathbf{S}|a\rangle = |0\rangle \quad \text{or} \quad (\mathbf{T} - \lambda\mathbf{1})|a\rangle = |0\rangle, \]

implying that $|a\rangle$ is an eigenvector of $\mathbf{T}$ with a non-real eigenvalue. This
contradicts Theorem 4.3.7. Therefore, $\langle a|\mathbf{H}|a\rangle > 0$.


Note that the lemma holds for complex as well as real vector spaces. Prob- 
lem 6.24 shows how to prove the lemma without resort to complex roots. 


Proposition 6.6.5 A self-adjoint (symmetric) real operator has a real 
eigenvalue. 


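Proof As in the proof of Theorem 6.6.1, we have a nonzero vector $|v\rangle$ and
a polynomial $p(\mathbf{T})$ such that $p(\mathbf{T})|v\rangle = |0\rangle$, i.e.,

\[ \prod_{i=1}^{r} (\mathbf{T} - \lambda_i\mathbf{1})^{k_i}\prod_{j=1}^{R} (\mathbf{T}^2 + \alpha_j\mathbf{T} + \beta_j\mathbf{1})^{K_j}|v\rangle = |0\rangle, \]

with $\lambda_i, \alpha_j, \beta_j \in \mathbb{R}$ and $\alpha_j^2 < 4\beta_j$. By Lemma 6.6.4, all the quadratic factors
are invertible. Multiplying by their inverses, we get

\[ \prod_{i=1}^{r} (\mathbf{T} - \lambda_i\mathbf{1})^{k_i}|v\rangle = |0\rangle. \]

At least one of these factors, say $i = m$, must be non-injective (why?).
Hence,

\[ (\mathbf{T} - \lambda_m\mathbf{1})^{k_m}|v\rangle = |0\rangle. \]

If $|a\rangle \equiv (\mathbf{T} - \lambda_m\mathbf{1})^{k_m - 1}|v\rangle \neq |0\rangle$, then $|a\rangle$ is an eigenvector of $\mathbf{T}$ with real
eigenvalue $\lambda_m$. Otherwise, we have

\[ (\mathbf{T} - \lambda_m\mathbf{1})^{k_m - 1}|v\rangle = |0\rangle. \]

If $|b\rangle \equiv (\mathbf{T} - \lambda_m\mathbf{1})^{k_m - 2}|v\rangle \neq |0\rangle$, then $|b\rangle$ is an eigenvector of $\mathbf{T}$ with real
eigenvalue $\lambda_m$. It is clear that this process has to stop at some point. It follows
that there exists a nonzero vector $|c\rangle$ such that $(\mathbf{T} - \lambda_m\mathbf{1})|c\rangle = |0\rangle$.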


Now that we have established the existence of at least one real eigenvalue 
for a self-adjoint real operator, we can follow the same steps taken in the 
proof of Theorem 6.4.8 and prove the following: 

Theorem 6.6.6 Let V be a real inner product space and T a self-adjoint
operator on V. Then there exists an orthonormal basis in V
with respect to which T is represented by a diagonal matrix.


This theorem is especially useful in applications of classical physics, 
which deal mostly with real vector spaces. A typical situation involves a 
vector that is related to another vector by a symmetric matrix. It is then 
convenient to find a coordinate system in which the two vectors are related 
in a simple manner. This involves diagonalizing the symmetric matrix by a 
rotation (a real orthogonal matrix). Theorem 6.6.6 reassures us that such a 
diagonalization is possible. 


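Example 6.6.7 For a system of N point particles constituting a rigid body,
the total angular momentum $\mathbf{L} = \sum_{i=1}^{N} m_i(\mathbf{r}_i \times \mathbf{v}_i)$ is related to the angular
frequency via

$$\mathbf{L} = \sum_{i=1}^{N} m_i[\mathbf{r}_i \times (\boldsymbol{\omega} \times \mathbf{r}_i)] = \sum_{i=1}^{N} m_i[\boldsymbol{\omega}\,(\mathbf{r}_i \cdot \mathbf{r}_i) - \mathbf{r}_i(\mathbf{r}_i \cdot \boldsymbol{\omega})],$$

or

$$\begin{pmatrix} L_x \\ L_y \\ L_z \end{pmatrix} = \begin{pmatrix} I_{xx} & I_{xy} & I_{xz} \\ I_{yx} & I_{yy} & I_{yz} \\ I_{zx} & I_{zy} & I_{zz} \end{pmatrix}\begin{pmatrix} \omega_x \\ \omega_y \\ \omega_z \end{pmatrix},$$

where

$$I_{xx} = \sum_{i=1}^{N} m_i(r_i^2 - x_i^2), \qquad I_{yy} = \sum_{i=1}^{N} m_i(r_i^2 - y_i^2), \qquad I_{zz} = \sum_{i=1}^{N} m_i(r_i^2 - z_i^2),$$

$$I_{xy} = -\sum_{i=1}^{N} m_i x_i y_i, \qquad I_{xz} = -\sum_{i=1}^{N} m_i x_i z_i, \qquad I_{yz} = -\sum_{i=1}^{N} m_i y_i z_i,$$

with $I_{xy} = I_{yx}$, $I_{xz} = I_{zx}$, and $I_{yz} = I_{zy}$.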

The 3 × 3 matrix is denoted by $\mathbf{I}$ and is called the moment of inertia
matrix. It is symmetric, and Theorem 6.6.6 permits its diagonalization by an
orthogonal transformation (the counterpart of a unitary transformation in a
real vector space). But an orthogonal transformation in three dimensions is
merely a rotation of coordinates.⁷ Thus, Theorem 6.6.6 says that it is always
possible to choose coordinate systems in which the moment of inertia matrix
is diagonal. In such a coordinate system we have $L_x = I_{xx}\omega_x$, $L_y = I_{yy}\omega_y$,
and $L_z = I_{zz}\omega_z$, simplifying the equations considerably.


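⁷ This is not entirely true! There are orthogonal transformations that are composed of a
rotation followed by a reflection about the origin. See Example 5.5.3.

Similarly, the kinetic energy of the rigid rotating body,

$$T = \sum_{i=1}^{N}\tfrac{1}{2}m_i v_i^2 = \sum_{i=1}^{N}\tfrac{1}{2}m_i(\boldsymbol{\omega} \times \mathbf{r}_i)\cdot(\boldsymbol{\omega} \times \mathbf{r}_i) = \tfrac{1}{2}\sum_{i=1}^{N} m_i\,\boldsymbol{\omega}\cdot[\mathbf{r}_i \times (\boldsymbol{\omega} \times \mathbf{r}_i)] = \tfrac{1}{2}\boldsymbol{\omega}\cdot\mathbf{L} = \tfrac{1}{2}\boldsymbol{\omega}^t\,\mathbf{I}\,\boldsymbol{\omega},$$

which in general has off-diagonal terms involving $I_{xy}$ and so forth, reduces
to a simple form: $T = \tfrac{1}{2}I_{xx}\omega_x^2 + \tfrac{1}{2}I_{yy}\omega_y^2 + \tfrac{1}{2}I_{zz}\omega_z^2$.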
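
The component formulas for the moment of inertia matrix translate directly into code. The sketch below is illustrative only: the function names `inertia_matrix` and `angular_momentum` and the sample masses are assumptions made here, not part of the text. It builds $I_{ab} = \sum_i m_i(r_i^2\delta_{ab} - r_{i,a}r_{i,b})$ and applies it to an angular velocity.

```python
def inertia_matrix(masses, positions):
    """Moment of inertia matrix I_ab = sum_i m_i (r_i^2 delta_ab - r_ia r_ib).

    Hypothetical helper: `masses` is a list of floats and `positions`
    a list of (x, y, z) tuples, one per point particle.
    """
    inertia = [[0.0] * 3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = sum(c * c for c in r)
        for a in range(3):
            for b in range(3):
                delta = 1.0 if a == b else 0.0
                inertia[a][b] += m * (r2 * delta - r[a] * r[b])
    return inertia


def angular_momentum(inertia, omega):
    """Return L = I omega as a list of three components."""
    return [sum(inertia[a][b] * omega[b] for b in range(3)) for a in range(3)]


# Two unit masses placed symmetrically about the origin (sample data).
I = inertia_matrix([1.0, 1.0], [(1.0, 1.0, 0.0), (-1.0, -1.0, 0.0)])
L = angular_momentum(I, (0.0, 0.0, 1.0))
print(I)
print(L)
```

For this sample configuration the matrix is symmetric but not diagonal ($I_{xy} = -2$), which is exactly the situation in which a rotation to principal axes simplifies the relation between $\mathbf{L}$ and $\boldsymbol{\omega}$.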


Example 6.6.8 Another application of Theorem 6.6.6 is in the study of 
conic sections. The most general form of the equation of a conic section 
is 

$$a_1x^2 + a_2y^2 + a_3xy + a_4x + a_5y + a_6 = 0,$$

where $a_1, \dots, a_6$ are constants. If the coordinate axes coincide with the
principal axes of the conic section, the $xy$ term will be absent, and the equation
of the conic section takes the familiar form. On geometrical grounds we
have to be able to rotate $xy$-coordinates to coincide with the principal axes.
We shall do this using the ideas discussed in this chapter. 

First, we note that the general equation for a conic section can be written 
in matrix form as 


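$$\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} a_1 & a_3/2 \\ a_3/2 & a_2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_4 & a_5 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} + a_6 = 0.$$

The 2 × 2 matrix is symmetric and can therefore be diagonalized by means
of an orthogonal matrix $\mathbf{R}$. Then $\mathbf{R}^t\mathbf{R} = \mathbf{1}$, and we can write

$$\begin{pmatrix} x & y \end{pmatrix}\mathbf{R}^t\mathbf{R}\begin{pmatrix} a_1 & a_3/2 \\ a_3/2 & a_2 \end{pmatrix}\mathbf{R}^t\mathbf{R}\begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} a_4 & a_5 \end{pmatrix}\mathbf{R}^t\mathbf{R}\begin{pmatrix} x \\ y \end{pmatrix} + a_6 = 0.$$

Let

$$\mathbf{R}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x' \\ y' \end{pmatrix}, \qquad \mathbf{R}\begin{pmatrix} a_1 & a_3/2 \\ a_3/2 & a_2 \end{pmatrix}\mathbf{R}^t = \begin{pmatrix} a_1' & 0 \\ 0 & a_2' \end{pmatrix}, \qquad \mathbf{R}\begin{pmatrix} a_4 \\ a_5 \end{pmatrix} = \begin{pmatrix} a_4' \\ a_5' \end{pmatrix}.$$

Then we get

$$\begin{pmatrix} x' & y' \end{pmatrix}\begin{pmatrix} a_1' & 0 \\ 0 & a_2' \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix} + \begin{pmatrix} a_4' & a_5' \end{pmatrix}\begin{pmatrix} x' \\ y' \end{pmatrix} + a_6 = 0,$$

or

$$a_1'x'^2 + a_2'y'^2 + a_4'x' + a_5'y' + a_6 = 0.$$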
The cross term has disappeared. The orthogonal matrix R is simply a rota- 
tion. In fact, it rotates the original coordinate system to coincide with the 
principal axes of the conic section. 
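
As a numerical illustration of this example, the rotation that removes the cross term can be computed in closed form. The sketch below is an assumption-laden illustration, not the book's procedure: the function name `remove_cross_term` is hypothetical, the rotation is parametrized as $\mathbf{R} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}$, and the angle is chosen from $\tan 2\theta = a_3/(a_1 - a_2)$, which is the condition for the off-diagonal entry of $\mathbf{R}\mathbf{M}\mathbf{R}^t$ to vanish.

```python
import math


def remove_cross_term(a1, a2, a3):
    """Diagonalize the quadratic part a1*x^2 + a2*y^2 + a3*x*y by a rotation.

    Returns (theta, a1_new, a2_new, cross), where `cross` is the residual
    off-diagonal entry of R M R^t and should vanish up to rounding.
    """
    # Symmetric matrix M = [[a1, a3/2], [a3/2, a2]] of the quadratic form.
    m11, m12, m22 = a1, a3 / 2.0, a2
    # Off-diagonal entry of R M R^t vanishes when tan(2*theta) = a3 / (a1 - a2).
    theta = 0.5 * math.atan2(a3, a1 - a2)
    c, s = math.cos(theta), math.sin(theta)
    # R = [[c, s], [-s, c]] takes (x, y) to (x', y').
    a1_new = c * c * m11 + 2 * c * s * m12 + s * s * m22
    a2_new = s * s * m11 - 2 * c * s * m12 + c * c * m22
    cross = (c * c - s * s) * m12 - c * s * (m11 - m22)
    return theta, a1_new, a2_new, cross


# Sample quadratic form 5x^2 + 5y^2 + 6xy (chosen here for illustration).
theta, a1_new, a2_new, cross = remove_cross_term(5.0, 5.0, 6.0)
print(theta, a1_new, a2_new, cross)
```

For the sample form the angle is $\pi/4$ and the new coefficients are 8 and 2, the eigenvalues of the symmetric matrix, up to rounding.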

Example 6.6.9 In this example we investigate conditions under which a
multivariable function has a maximum or a minimum.
A point $\mathbf{a} \equiv (a_1, a_2, \dots, a_n) \in \mathbb{R}^n$ is a maximum (minimum) of a function

$$f(x_1, x_2, \dots, x_n) \equiv f(\mathbf{r})$$

if

$$\nabla f\big|_{x_i = a_i} = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_n}\right)_{x_i = a_i} = 0$$

and, for small $x_i - a_i$, the difference $f(\mathbf{r}) - f(\mathbf{a})$ is negative (positive). To relate


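this difference to the topics of this section, write the Taylor expansion of the
function around $\mathbf{a}$ keeping terms up to the second order:

$$f(\mathbf{r}) = f(\mathbf{a}) + \sum_{i=1}^{n}(x_i - a_i)\left(\frac{\partial f}{\partial x_i}\right)_{\mathbf{r}=\mathbf{a}} + \frac{1}{2}\sum_{i,j}(x_i - a_i)(x_j - a_j)\left(\frac{\partial^2 f}{\partial x_i\,\partial x_j}\right)_{\mathbf{r}=\mathbf{a}} + \cdots,$$

or, constructing a column vector $\boldsymbol{\delta}$ out of $\delta_i \equiv x_i - a_i$ and a symmetric matrix
$D_{ij}$ out of the second derivatives, we can write

$$f(\mathbf{r}) - f(\mathbf{a}) = \frac{1}{2}\sum_{i,j}\delta_i D_{ij}\delta_j + \cdots = \frac{1}{2}\boldsymbol{\delta}^t\mathbf{D}\,\boldsymbol{\delta} + \cdots$$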

because the first derivatives vanish. For $\mathbf{a}$ to be a minimum point of $f$, the
RHS of the last equation must be positive for arbitrary $\boldsymbol{\delta}$. This means that $\mathbf{D}$
must be a positive matrix.⁸ Thus, all its eigenvalues must be positive
(Corollary 6.4.10). Similarly, we can show that for $\mathbf{a}$ to be a maximum point of $f$,
$-\mathbf{D}$ must be positive definite. This means that $\mathbf{D}$ must have negative
eigenvalues.

⁸ Note that $\mathbf{D}$ is already symmetric, the real analogue of hermitian.

When we specialize the foregoing discussion to two dimensions, we obtain
results that are familiar from calculus. For the function $f(x, y)$ to have
a minimum, the eigenvalues of the matrix

$$\begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix}$$

must be positive. The characteristic polynomial

$$\det\begin{pmatrix} f_{xx} - \lambda & f_{xy} \\ f_{yx} & f_{yy} - \lambda \end{pmatrix} = 0 \;\Rightarrow\; \lambda^2 - (f_{xx} + f_{yy})\lambda + f_{xx}f_{yy} - f_{xy}^2 = 0$$

yields two eigenvalues:

$$\lambda_1 = \frac{f_{xx} + f_{yy} + \sqrt{(f_{xx} - f_{yy})^2 + 4f_{xy}^2}}{2}, \qquad \lambda_2 = \frac{f_{xx} + f_{yy} - \sqrt{(f_{xx} - f_{yy})^2 + 4f_{xy}^2}}{2}.$$

These eigenvalues will be both positive if

$$f_{xx} + f_{yy} > \sqrt{(f_{xx} - f_{yy})^2 + 4f_{xy}^2},$$

and both negative if

$$f_{xx} + f_{yy} < -\sqrt{(f_{xx} - f_{yy})^2 + 4f_{xy}^2}.$$

Squaring these inequalities and simplifying yields

$$f_{xx}f_{yy} > f_{xy}^2,$$

which shows that $f_{xx}$ and $f_{yy}$ must have the same sign. If they are both
positive (negative), we have a minimum (maximum). This is the familiar
condition for the attainment of extrema by a function of two variables.
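
The two-dimensional eigenvalue test is easy to automate. The following sketch is illustrative: the function name `classify_critical_point` and the returned labels are choices made here, and the second derivatives are assumed to have been evaluated already at a point where the gradient vanishes. It uses the closed-form eigenvalues of the symmetric matrix of second derivatives.

```python
import math


def classify_critical_point(fxx, fyy, fxy):
    """Classify a critical point of f(x, y) from its second derivatives.

    Returns (lam1, lam2, kind) with lam1 >= lam2 the eigenvalues of
    [[fxx, fxy], [fxy, fyy]].
    """
    root = math.sqrt((fxx - fyy) ** 2 + 4.0 * fxy ** 2)
    lam1 = 0.5 * (fxx + fyy + root)
    lam2 = 0.5 * (fxx + fyy - root)
    if lam2 > 0:
        kind = "minimum"          # both eigenvalues positive
    elif lam1 < 0:
        kind = "maximum"          # both eigenvalues negative
    elif lam1 > 0 > lam2:
        kind = "not an extremum"  # eigenvalues of opposite sign
    else:
        kind = "undetermined"     # a zero eigenvalue: second order is not enough
    return lam1, lam2, kind


# f = x^2 + x*y + y^2 at the origin: fxx = 2, fyy = 2, fxy = 1.
print(classify_critical_point(2.0, 2.0, 1.0))
# f = x^2 - y^2 at the origin: fxx = 2, fyy = -2, fxy = 0.
print(classify_critical_point(2.0, -2.0, 0.0))
```

The first call gives eigenvalues 3 and 1, consistent with $f_{xx}f_{yy} > f_{xy}^2$ and $f_{xx} > 0$; the second gives eigenvalues of opposite sign, where the inequality fails.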


6.6.2 The Case of Real Normal Operators


The establishment of spectral decomposition for symmetric (self-adjoint) 
operators and its diagonalization was fairly straightforward, requiring only 
the assurance that the operator had a real eigenvalue, i.e., a one-dimensional 
invariant subspace. The general case of a normal operator does not embody 
this assurance. Hence, we do not expect a full diagonalization. Nevertheless, 
we can explore the minimal invariant subspaces of a normal operator on a 
real vector space. 

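Let us start with Theorem 6.6.1 and first note that the one-dimensional
invariant subspaces of an operator $\mathbf{T}$ consist of vectors belonging to the kernel
of a polynomial of first degree in $\mathbf{T}$; i.e., these subspaces consist of vectors
$|u\rangle$ such that

$$p_\lambda(\mathbf{T})|u\rangle \equiv (\mathbf{T} - \lambda\mathbf{1})|u\rangle = |0\rangle. \tag{6.21}$$

Since $\mathbf{T}\mathbf{T}^t = \mathbf{T}^t\mathbf{T}$, a subspace labeled by $\lambda$ is invariant under both $\mathbf{T}$ and $\mathbf{T}^t$.
Next we note that the same applies to the two-dimensional case. The vectors
$|v\rangle$ in the two-dimensional invariant subspaces satisfy

$$p_{\alpha,\beta}(\mathbf{T})|v\rangle \equiv (\mathbf{T}^2 + \alpha\mathbf{T} + \beta\mathbf{1})|v\rangle = |0\rangle. \tag{6.22}$$

Again because of the commutativity of $\mathbf{T}$ and $\mathbf{T}^t$, if $|v\rangle$ is in a subspace, so
is $\mathbf{T}^t|v\rangle$, and the subspace is invariant under both $\mathbf{T}$ and $\mathbf{T}^t$.

Denote the subspace consisting of all vectors $|u\rangle$ satisfying Eq. (6.21)
by $\mathcal{M}_\lambda$, and the subspace consisting of all vectors $|v\rangle$ satisfying Eq. (6.22)
by $\mathcal{M}_{\alpha,\beta}$. We have already seen that $\mathcal{M}_\lambda \cap \mathcal{M}_{\lambda'} = \{|0\rangle\}$ if $\lambda \ne \lambda'$. We further
assume that there is no overlap between the $\mathcal{M}_\lambda$s and the $\mathcal{M}_{\alpha,\beta}$s, i.e., the latter
contain no eigenvectors. Now we show the same for two different $\mathcal{M}_{\alpha,\beta}$s. Let
$|v\rangle \in \mathcal{M}_{\alpha,\beta} \cap \mathcal{M}_{\alpha',\beta'}$. Then

$$(\mathbf{T}^2 + \alpha\mathbf{T} + \beta\mathbf{1})|v\rangle = |0\rangle,$$
$$(\mathbf{T}^2 + \alpha'\mathbf{T} + \beta'\mathbf{1})|v\rangle = |0\rangle.$$

Subtract the two equations to get

$$[(\alpha - \alpha')\mathbf{T} + (\beta - \beta')\mathbf{1}]|v\rangle = |0\rangle.$$

If $\alpha \ne \alpha'$, then dividing by $\alpha - \alpha'$ leads to an eigenvalue equation implying
that $|v\rangle$ must belong to one of the $\mathcal{M}_\lambda$s, which is a contradiction. Therefore,
$\alpha = \alpha'$, and if $\beta \ne \beta'$, then $|v\rangle = |0\rangle$.

Now consider the subspace

$$\mathcal{M} = \left(\bigoplus_{i=1}^{r}\mathcal{M}_{\lambda_i}\right) \oplus \left(\bigoplus_{j=1}^{s}\mathcal{M}_{\alpha_j,\beta_j}\right),$$

where $\{\lambda_i\}_{i=1}^{r}$ exhausts all the distinct eigenvalues and $\{(\alpha_j, \beta_j)\}_{j=1}^{s}$ exhausts
all the distinct pairs corresponding to Eq. (6.22). Both $\mathbf{T}$ and $\mathbf{T}^t$ leave
$\mathcal{M}$ invariant. Therefore, $\mathcal{M}^\perp$ is also invariant under $\mathbf{T}$. If $\mathcal{M}^\perp \ne \{|0\rangle\}$, then
it can be considered as a vector space on its own, and $\mathbf{T}$ can find either a
one-dimensional or a two-dimensional invariant subspace. This contradicts
the assumption that both of these are accounted for in the direct sums above.
Hence, we have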


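Theorem 6.6.10 Let $V$ be a real vector space and $\mathbf{T}$ a normal operator
on $V$. Let $\{\lambda_i\}_{i=1}^{r}$ be the complete set of the distinct eigenvalues of $\mathbf{T}$ and
$\{(\alpha_j, \beta_j)\}_{j=1}^{s}$ all the distinct pairs labeling the second degree polynomials
of Eq. (6.22). Let $\mathcal{M}_{\lambda_i} = \ker p_{\lambda_i}(\mathbf{T})$ and $\mathcal{M}_{\alpha_j,\beta_j} = \ker p_{\alpha_j,\beta_j}(\mathbf{T})$ as in (6.21)
and (6.22). Then

$$V = \left(\bigoplus_{i=1}^{r}\mathcal{M}_{\lambda_i}\right) \oplus \left(\bigoplus_{j=1}^{s}\mathcal{M}_{\alpha_j,\beta_j}\right),$$

where $\lambda_i, \alpha_j, \beta_j \in \mathbb{R}$ and $\alpha_j^2 < 4\beta_j$.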


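We now seek bases of $V$ with respect to which $\mathbf{T}$ has as simple a
representation as possible. Let $m_i$ denote the dimension of $\mathcal{M}_{\lambda_i}$, and $\{|a_i^{(k)}\rangle\}_{k=1}^{m_i}$ a
basis of $\mathcal{M}_{\lambda_i}$. To construct a basis for $\mathcal{M}_{\alpha_j,\beta_j}$, let $|b^{(1)}_{\alpha_j,\beta_j}\rangle$ be a vector
linearly independent from $|a_i^{(k)}\rangle$ for all $i$ and $k$. Let $|b^{(2)}_{\alpha_j,\beta_j}\rangle = \mathbf{T}|b^{(1)}_{\alpha_j,\beta_j}\rangle$, and
note that $|b^{(1)}_{\alpha_j,\beta_j}\rangle$ and $|b^{(2)}_{\alpha_j,\beta_j}\rangle$ are linearly independent from each other
and all the $|a_i^{(k)}\rangle$s (why?). Pick $|b^{(3)}_{\alpha_j,\beta_j}\rangle$ to be linearly independent from all
the previously constructed vectors and let $|b^{(4)}_{\alpha_j,\beta_j}\rangle = \mathbf{T}|b^{(3)}_{\alpha_j,\beta_j}\rangle$. Continue
this process until a basis for $\mathcal{M}_{\alpha_j,\beta_j}$ is constructed. Do this for all $j$. If we
denote the dimension of $\mathcal{M}_{\alpha_j,\beta_j}$ by $n_j$, then

$$B_V \equiv \left(\bigcup_{i=1}^{r}\{|a_i^{(k)}\rangle\}_{k=1}^{m_i}\right) \cup \left(\bigcup_{j=1}^{s}\{|b^{(k)}_{\alpha_j,\beta_j}\rangle\}_{k=1}^{n_j}\right)$$

is a basis for $V$.
What does the matrix $\mathbf{M}_T$ of $\mathbf{T}$ look like in this basis? We leave it to the
reader to verify that

$$\mathbf{M}_T = \operatorname{diag}(\lambda_1\mathbf{1}_{m_1}, \dots, \lambda_r\mathbf{1}_{m_r}, \mathbf{M}_{\alpha_1,\beta_1}, \dots, \mathbf{M}_{\alpha_s,\beta_s}), \tag{6.23}$$

where diag means a block diagonal matrix, $\mathbf{1}_k$ is a $k \times k$ identity matrix, and

$$\mathbf{M}_{\alpha_j,\beta_j} = \operatorname{diag}(\mathbf{J}_1, \dots, \mathbf{J}_{n_j}), \qquad \mathbf{J}_k = \begin{pmatrix} 0 & -\beta_j \\ 1 & -\alpha_j \end{pmatrix}, \quad k = 1, \dots, n_j. \tag{6.24}$$

In other words, $\mathbf{M}_T$ has the eigenvalues of $\mathbf{T}$ on the main diagonal up to
$m_1 + \cdots + m_r$ and then 2 × 2 matrices similar to $\mathbf{J}_k$ (possibly with different
$\alpha_j$ and $\beta_j$) for the rest of the diagonal positions.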
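
A quick check makes the structure of the 2 × 2 blocks concrete. The sketch below assumes the block has the companion form $\mathbf{J} = \begin{pmatrix} 0 & -\beta \\ 1 & -\alpha \end{pmatrix}$, which is the matrix of $\mathbf{T}$ in a basis $\{|b\rangle, \mathbf{T}|b\rangle\}$ when $\mathbf{T}^2|b\rangle = -\alpha\mathbf{T}|b\rangle - \beta|b\rangle$; the helper names are hypothetical. It verifies numerically that $\mathbf{J}^2 + \alpha\mathbf{J} + \beta\mathbf{1} = 0$, the matrix version of Eq. (6.22).

```python
def matmul2(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def quadratic_of_block(alpha, beta):
    """Return J^2 + alpha*J + beta*1 for J = [[0, -beta], [1, -alpha]]."""
    j = [[0.0, -beta], [1.0, -alpha]]
    j2 = matmul2(j, j)
    one = [[1.0, 0.0], [0.0, 1.0]]
    return [[j2[r][c] + alpha * j[r][c] + beta * one[r][c] for c in range(2)]
            for r in range(2)]


# Sample coefficients with alpha^2 < 4*beta, so the block has no real eigenvalue.
alpha, beta = 0.5, 2.0
print(quadratic_of_block(alpha, beta))   # expected: the zero matrix
print(alpha ** 2 - 4.0 * beta < 0)       # negative discriminant
```

The characteristic polynomial of this block is $\lambda^2 + \alpha\lambda + \beta$, so the condition $\alpha^2 < 4\beta$ is exactly the statement that the block cannot be reduced further over the reals.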

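Consider any eigenvector $|x_1\rangle$ of $\mathbf{T}$ (if it exists). Obviously, $\operatorname{Span}\{|x_1\rangle\}$ is
a subspace of $V$ invariant under $\mathbf{T}$. By Theorem 6.4.3, $\operatorname{Span}\{|x_1\rangle\}$ reduces $\mathbf{T}$.
Thus, we can write

$$V = \operatorname{Span}\{|x_1\rangle\} \oplus \operatorname{Span}\{|x_1\rangle\}^\perp.$$

Now pick a new eigenvector $|x_2\rangle$ in $\operatorname{Span}\{|x_1\rangle\}^\perp$ (if it exists) and write

$$V = \operatorname{Span}\{|x_1\rangle\} \oplus \operatorname{Span}\{|x_2\rangle\} \oplus \operatorname{Span}\{|x_2\rangle\}^\perp.$$

Continue this until all the eigenvectors are exhausted (there may be none).
Then, we have

$$V = \left(\bigoplus_{i=1}^{M}\operatorname{Span}\{|x_i\rangle\}\right) \oplus \left(\bigoplus_{i=1}^{M}\operatorname{Span}\{|x_i\rangle\}\right)^\perp \equiv \left(\bigoplus_{i=1}^{M}\operatorname{Span}\{|x_i\rangle\}\right) \oplus W.$$

Since a real vector space has minimal invariant subspaces of dimensions
one and two, $W$ contains only two-dimensional subspaces (if any). Let $|y_1\rangle$
be a nonzero vector in $W$. Then there is a second degree polynomial of
the type given in Eq. (6.22) whose kernel is the two-dimensional subspace
$\operatorname{Span}\{|y_1\rangle, \mathbf{T}|y_1\rangle\}$ of $W$. This subspace is invariant under $\mathbf{T}$, and by
Theorem 6.4.3, it reduces $\mathbf{T}$ in $W$. Thus,

$$W = \operatorname{Span}\{|y_1\rangle, \mathbf{T}|y_1\rangle\} \oplus \operatorname{Span}\{|y_1\rangle, \mathbf{T}|y_1\rangle\}^\perp.$$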




Continuing this process and noting that $W$ does not contain any
one-dimensional invariant subspace, we obtain

$$W = \bigoplus_{j=1}^{K}\operatorname{Span}\{|y_j\rangle, \mathbf{T}|y_j\rangle\}$$

and hence,

Theorem 6.6.11 (Real Spectral Decomposition) Let $V$ be a real vector
space and $\mathbf{T}$ a normal operator on $V$. Let $|x_i\rangle$ and $|y_j\rangle$ satisfy
Eqs. (6.21) and (6.22), respectively. Then,

$$V = \left(\bigoplus_{i=1}^{M}\operatorname{Span}\{|x_i\rangle\}\right) \oplus \left(\bigoplus_{j=1}^{K}\operatorname{Span}\{|y_j\rangle, \mathbf{T}|y_j\rangle\}\right) \tag{6.25}$$

with $\dim V = 2K + M$.


We have thus written $V$ as a direct sum of one- and two-dimensional
subspaces. Either $K$ (e.g., in the case of a real self-adjoint operator) or $M$ (e.g.,
in the case of the operator of Example 6.5.1) could be zero.

An important application of Theorem 6.6.11 is the spectral decomposition
of an orthogonal (isometric) operator. This operator has the property
that $\mathbf{O}\mathbf{O}^t = \mathbf{1}$. Taking the determinants of both sides, we obtain $(\det\mathbf{O})^2 = 1$.
Using Theorem 6.6.11 (or 6.6.10), we see that the representation of $\mathbf{O}$
consists of some 1 × 1 and some 2 × 2 matrices placed along the diagonal.
Furthermore, these matrices are orthogonal (why?). Since the eigenvalues
of an orthogonal operator have absolute value 1 (this is the real version of
the second part of Corollary 6.4.5), a 1 × 1 orthogonal matrix can be only
±1. An orthogonal 2 × 2 matrix is of the forms given in Problem 5.9, i.e.,

$$\mathbf{R}_2(\theta_j) \equiv \begin{pmatrix} \cos\theta_j & -\sin\theta_j \\ \sin\theta_j & \cos\theta_j \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} \cos\theta_j & \sin\theta_j \\ \sin\theta_j & -\cos\theta_j \end{pmatrix}, \tag{6.26}$$

in which the first has a determinant +1 and the second −1. We thus have
the following:


Theorem 6.6.12 A real orthogonal operator on a real inner product space
$V$ cannot, in general, be completely diagonalized. The closest it can get to a
diagonal form is

$$\mathbf{O}_{\text{diag}} = \operatorname{diag}(\underbrace{1, 1, \dots, 1}_{N_+}, \underbrace{-1, -1, \dots, -1}_{N_-}, \mathbf{R}_2(\theta_1), \mathbf{R}_2(\theta_2), \dots, \mathbf{R}_2(\theta_m)),$$

where $N_+ + N_- + 2m = \dim V$ and $\mathbf{R}_2(\theta_j)$ is as given in (6.26). Furthermore,
the matrix that transforms an orthogonal matrix into the form above
is itself an orthogonal matrix.

The last statement follows from Theorem 6.5.2 and the fact that an orthogonal
matrix is the real analogue of a unitary matrix.


Example 6.6.13 In this example, we illustrate an intuitive (and non-rigorous)
“proof” of the diagonalization of an orthogonal operator, which
in some sense involves the complexification of a real vector space.

Think of the orthogonal operator $\mathbf{O}$ as a unitary operator.⁹ Since the absolute
value of the eigenvalues of a unitary operator is 1, the only real possibilities
are ±1. To find the other eigenvalues we note that as a unitary operator,
$\mathbf{O}$ can be written as $e^{\mathbf{A}}$, where $\mathbf{A}$ is anti-hermitian (see Problem 6.23). Since
hermitian conjugation and transposition coincide for real vector spaces, we
conclude that $\mathbf{A} = -\mathbf{A}^t$, and $\mathbf{A}$ is antisymmetric. It is also real, because $\mathbf{O}$ is.

Let us now consider the eigenvalues of $\mathbf{A}$. If $\lambda$ is an eigenvalue of $\mathbf{A}$
corresponding to the eigenvector $|a\rangle$, then $\langle a|\mathbf{A}|a\rangle = \lambda\langle a|a\rangle$. Taking the complex
conjugate of both sides gives $\langle a|\mathbf{A}^\dagger|a\rangle = \lambda^*\langle a|a\rangle$; but $\mathbf{A}^\dagger = \mathbf{A}^t = -\mathbf{A}$,
because $\mathbf{A}$ is real and antisymmetric. We therefore have $\langle a|\mathbf{A}|a\rangle = -\lambda^*\langle a|a\rangle$,
which gives $\lambda^* = -\lambda$. It follows that if we restrict $\lambda$ to be real, then it
can only be zero; otherwise, it must be purely imaginary. Furthermore, the
reader may verify that if $\lambda$ is an eigenvalue of $\mathbf{A}$, so is $-\lambda$. Therefore, the
diagonal form of $\mathbf{A}$ looks like this:

$$\mathbf{A}_{\text{diag}} = \operatorname{diag}(0, 0, \dots, 0, i\theta_1, -i\theta_1, i\theta_2, -i\theta_2, \dots, i\theta_k, -i\theta_k),$$

which gives $\mathbf{O}$ the following diagonal form:

$$\mathbf{O}_{\text{diag}} = e^{\mathbf{A}_{\text{diag}}} = \operatorname{diag}(e^0, e^0, \dots, e^0, e^{i\theta_1}, e^{-i\theta_1}, e^{i\theta_2}, e^{-i\theta_2}, \dots, e^{i\theta_k}, e^{-i\theta_k})$$

with $\theta_1, \theta_2, \dots, \theta_k$ all real. It is clear that if $\mathbf{O}$ has −1 as an eigenvalue, then
some of the $\theta$'s must equal $\pm\pi$. Separating the $\pi$'s from the rest of the $\theta$'s and
putting all of the above arguments together, we get

$$\mathbf{O}_{\text{diag}} = \operatorname{diag}(\underbrace{1, 1, \dots, 1}_{N_+}, \underbrace{-1, -1, \dots, -1}_{N_-}, e^{i\theta_1}, e^{-i\theta_1}, \dots, e^{i\theta_m}, e^{-i\theta_m}),$$

where $N_+ + N_- + 2m = \dim\mathbf{O}$.

Getting insight from Example 6.5.1, we can argue, admittedly in a non-rigorous
way, that corresponding to each pair $e^{\pm i\theta_j}$ is a 2 × 2 matrix of the
form given in Eq. (6.26).


⁹ This can always be done by formally identifying transposition with hermitian conjugation,
an identification that holds when the underlying field of numbers is real.

We can add more rigor to the preceding example by the process of
complexification and the notion of a complex structure. Recall from Eq. (2.22)
that a real 2m-dimensional vector space can be reduced to an m-dimensional
complex space. Now consider the restriction of the orthogonal operator $\mathbf{O}$
on the 2K-dimensional vector subspace $W$ of Eq. (6.25), and let $\mathbf{J}$ be a
complex structure on that subspace. Let $\{|f_i\rangle, \mathbf{J}|f_i\rangle\}_{i=1}^{K}$ be an orthonormal
basis of $W$ and $W_1^{\mathbb{C}}$ the complexification of $W_1 \equiv \operatorname{Span}\{|f_i\rangle\}_{i=1}^{K}$. Define
the unitary operator $\mathbf{U}$ on $W_1^{\mathbb{C}}$ by

$$\mathbf{U}|f_j\rangle = \mathbf{O}|f_j\rangle,$$

and extend it by linearity and Eq. (2.22), which requires that $\mathbf{O}$ and $\mathbf{J}$ commute.
This replaces the orthogonal operator $\mathbf{O}$ on the 2K-dimensional vector
space $W$ with a unitary operator $\mathbf{U}$ on the K-dimensional vector space
$W_1^{\mathbb{C}}$. Thus, we can apply the complex spectral decomposition and replace the
$|f_i\rangle$ with $|e_i\rangle$, the eigenvectors of $\mathbf{U}$.

We now find the matrix representation of $\mathbf{O}$ in this new orthonormal basis
from that of $\mathbf{U}$. For $j = 1, \dots, K$, we have

$$\mathbf{O}|e_j\rangle = \mathbf{U}|e_j\rangle = e^{i\theta_j}|e_j\rangle = (\cos\theta_j + i\sin\theta_j)|e_j\rangle = (\cos\theta_j\,\mathbf{1} + \sin\theta_j\,\mathbf{J})|e_j\rangle = \cos\theta_j|e_j\rangle + \sin\theta_j|e_{j+1}\rangle,$$

$$\mathbf{O}|e_{j+1}\rangle = \mathbf{O}\mathbf{J}|e_j\rangle = \mathbf{J}\mathbf{O}|e_j\rangle = i\mathbf{U}|e_j\rangle = ie^{i\theta_j}|e_j\rangle = (i\cos\theta_j - \sin\theta_j)|e_j\rangle = (\cos\theta_j\,\mathbf{J} - \sin\theta_j\,\mathbf{1})|e_j\rangle = -\sin\theta_j|e_j\rangle + \cos\theta_j|e_{j+1}\rangle.$$

Thus the $j$th and $(j+1)$st columns will be of the form

$$\begin{pmatrix} 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \\ \cos\theta_j & -\sin\theta_j \\ \sin\theta_j & \cos\theta_j \\ 0 & 0 \\ \vdots & \vdots \\ 0 & 0 \end{pmatrix}.$$

Putting all the columns together reproduces the result of Theorem 6.6.12.


Example 6.6.14 An interesting application of Theorem 6.6.12 occurs in
classical mechanics, where it is shown that the motion of a rigid body
consists of a translation and a rotation. The rotation is represented by a 3 × 3
orthogonal matrix. Theorem 6.6.12 states that by an appropriate choice of
coordinate systems (i.e., by applying the same orthogonal transformation
that diagonalizes the rotation matrix of the rigid body), one can “diagonalize”
the 3 × 3 orthogonal matrix. The “diagonal” form is

$$\begin{pmatrix} \pm 1 & 0 & 0 \\ 0 & \pm 1 & 0 \\ 0 & 0 & \pm 1 \end{pmatrix} \quad\text{or}\quad \begin{pmatrix} \pm 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}.$$

Excluding the reflections (corresponding to −1's) and the trivial identity
rotation, we conclude that any rotation of a rigid body can be written as

$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix},$$

which is a rotation through the angle $\theta$ about the (new) x-axis.
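
The canonical form above gives a practical way to read off the angle and axis of any proper 3 × 3 rotation matrix. The sketch below is an illustration with hypothetical function names: it uses the fact that the trace is unchanged by an orthogonal change of basis, so $\operatorname{tr}\mathbf{R} = 1 + 2\cos\theta$, and the standard observation that the antisymmetric part of $\mathbf{R}$ points along the rotation axis when $\sin\theta \ne 0$.

```python
import math


def rotation_angle(rot):
    """Angle in [0, pi] of a proper rotation, from tr R = 1 + 2 cos(theta)."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against rounding
    return math.acos(cos_theta)


def rotation_axis(rot):
    """Unit vector along the rotation axis; assumes sin(theta) != 0."""
    v = [rot[2][1] - rot[1][2], rot[0][2] - rot[2][0], rot[1][0] - rot[0][1]]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


# A rotation by 0.3 rad about the x-axis, in the canonical form of the example.
t = 0.3
rx = [[1.0, 0.0, 0.0],
      [0.0, math.cos(t), -math.sin(t)],
      [0.0, math.sin(t), math.cos(t)]]
print(rotation_angle(rx), rotation_axis(rx))
```

The axis returned is the eigenvector with eigenvalue +1, i.e., the one-dimensional invariant subspace that Theorem 6.6.12 guarantees for a proper rotation in three dimensions.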


Combining the rotation of the example above with the translations, we 
obtain the following theorem. 


Theorem 6.6.15 (Euler) The general motion of a rigid body consists of 
the translation of one point of that body and a rotation about a single axis 
through that point. 


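Example 6.6.16 As a final example of the application of the results of this
section, let us evaluate the n-fold integral

$$I_n = \int_{-\infty}^{\infty}dx_1\int_{-\infty}^{\infty}dx_2\cdots\int_{-\infty}^{\infty}dx_n\,e^{-\sum_{i,j=1}^{n}m_{ij}x_ix_j}, \tag{6.27}$$

where the $m_{ij}$ are elements of a real, symmetric, positive definite matrix,
say $\mathbf{M}$. Because it is symmetric, $\mathbf{M}$ can be diagonalized by an orthogonal
matrix $\mathbf{R}$ so that $\mathbf{R}\mathbf{M}\mathbf{R}^t = \mathbf{D}$ is a diagonal matrix whose diagonal entries are
the eigenvalues, $\lambda_1, \lambda_2, \dots, \lambda_n$, of $\mathbf{M}$, whose positive definiteness ensures
that none of these eigenvalues is zero or negative.

The exponent in (6.27) can be written as

$$\sum_{i,j=1}^{n}m_{ij}x_ix_j = \mathbf{x}^t\mathbf{M}\mathbf{x} = \mathbf{x}^t\mathbf{R}^t\mathbf{R}\mathbf{M}\mathbf{R}^t\mathbf{R}\mathbf{x} = \mathbf{x}'^t\mathbf{D}\mathbf{x}' = \lambda_1x_1'^2 + \cdots + \lambda_nx_n'^2,$$

where

$$\mathbf{x}' = \begin{pmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{pmatrix} = \mathbf{R}\mathbf{x} = \mathbf{R}\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix},$$

or, in component form, $x_i' = \sum_{j=1}^{n}r_{ij}x_j$ for $i = 1, 2, \dots, n$. Similarly, since
$\mathbf{x} = \mathbf{R}^t\mathbf{x}'$, it follows that $x_i = \sum_{j=1}^{n}r_{ji}x_j'$ for $i = 1, 2, \dots, n$.

The “volume element” $dx_1\cdots dx_n$ is related to the primed volume element
as follows:

$$dx_1\cdots dx_n = \left|\det\frac{\partial(x_1, x_2, \dots, x_n)}{\partial(x_1', x_2', \dots, x_n')}\right|dx_1'\cdots dx_n' \equiv |\det\mathbf{J}|\,dx_1'\cdots dx_n',$$

where $\mathbf{J}$ is the Jacobian matrix whose $ij$th element is $\partial x_i/\partial x_j'$. But

$$\frac{\partial x_i}{\partial x_j'} = r_{ji} \;\Rightarrow\; \mathbf{J} = \mathbf{R}^t \;\Rightarrow\; |\det\mathbf{J}| = |\det\mathbf{R}^t| = 1.$$

Therefore, in terms of $\mathbf{x}'$, the integral $I_n$ becomes

$$\begin{aligned} I_n &= \int_{-\infty}^{\infty}dx_1'\int_{-\infty}^{\infty}dx_2'\cdots\int_{-\infty}^{\infty}dx_n'\,e^{-\lambda_1x_1'^2 - \lambda_2x_2'^2 - \cdots - \lambda_nx_n'^2} \\ &= \left(\int_{-\infty}^{\infty}dx_1'\,e^{-\lambda_1x_1'^2}\right)\left(\int_{-\infty}^{\infty}dx_2'\,e^{-\lambda_2x_2'^2}\right)\cdots\left(\int_{-\infty}^{\infty}dx_n'\,e^{-\lambda_nx_n'^2}\right) \\ &= \sqrt{\frac{\pi}{\lambda_1}}\sqrt{\frac{\pi}{\lambda_2}}\cdots\sqrt{\frac{\pi}{\lambda_n}} = \frac{\pi^{n/2}}{\sqrt{\lambda_1\lambda_2\cdots\lambda_n}} = \pi^{n/2}(\det\mathbf{M})^{-1/2}, \end{aligned}$$

because the determinant of a matrix is the product of its eigenvalues. This
result can be written as

$$\int_{-\infty}^{\infty}d^nx\,e^{-\mathbf{x}^t\mathbf{M}\mathbf{x}} = \pi^{n/2}(\det\mathbf{M})^{-1/2},$$

which gives an analytic definition of the determinant.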


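Proposition 6.6.17 The determinant of a positive definite matrix $\mathbf{M}$ is
given by

$$\det\mathbf{M} = \frac{\pi^n}{\left(\int_{-\infty}^{\infty}d^nx\,e^{-\mathbf{x}^t\mathbf{M}\mathbf{x}}\right)^2}.$$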
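
The Gaussian-integral formula can be checked numerically in the simplest nontrivial case, $n = 2$. The sketch below is only a sanity check under stated assumptions: the function name `gaussian_integral_2d`, the sample matrix, the truncation of the plane to a finite square, and the midpoint rule are all choices made here for illustration.

```python
import math


def gaussian_integral_2d(m, half_width=8.0, steps=400):
    """Midpoint-rule estimate of the integral of exp(-x^t M x) over the plane.

    `m` is a 2 x 2 positive definite matrix as nested lists. The plane is
    truncated to the square [-half_width, half_width]^2, where the
    integrand is negligible outside for the matrices used here.
    """
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        x = -half_width + (i + 0.5) * h
        for j in range(steps):
            y = -half_width + (j + 0.5) * h
            quad = m[0][0] * x * x + (m[0][1] + m[1][0]) * x * y + m[1][1] * y * y
            total += math.exp(-quad)
    return total * h * h


m = [[2.0, 1.0], [1.0, 2.0]]                 # sample matrix, det M = 3
det_m = m[0][0] * m[1][1] - m[0][1] * m[1][0]
numeric = gaussian_integral_2d(m)
exact = math.pi / math.sqrt(det_m)           # pi^{n/2} (det M)^{-1/2} with n = 2
print(numeric, exact)
print(math.pi ** 2 / numeric ** 2, det_m)    # determinant recovered from the integral
```

The last line evaluates the right-hand side of Proposition 6.6.17 with the numerical integral and compares it with the determinant computed directly.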

We have seen many similarities between operators and complex numbers.
For instance, hermitian operators behave very much like the real numbers:
they have real eigenvalues; their squares are positive; every operator can be
written as $\mathbf{X} + i\mathbf{Y}$, where both $\mathbf{X}$ and $\mathbf{Y}$ are hermitian; and so forth. Also,
unitary operators can be written as $\exp(i\mathbf{H})$, where $\mathbf{H}$ is hermitian. So unitary
operators are the analogue of complex numbers of unit magnitude such
as $e^{i\theta}$.

A general complex number $z$ can be written as $re^{i\theta}$, where $r = \sqrt{z^*z}$.
Can we write an arbitrary operator $\mathbf{T}$ in an analogous way? Perhaps as
$\sqrt{\mathbf{T}^\dagger\mathbf{T}}\exp(i\mathbf{H})$, with $\mathbf{H}$ hermitian? The following theorem provides the
answer.


Theorem 6.7.1 (Polar Decomposition) An operator $\mathbf{T}$ on a (real or complex)
finite-dimensional inner product space can be written as $\mathbf{T} = \mathbf{U}\mathbf{R}$ where
$\mathbf{R}$ is a positive operator and $\mathbf{U}$ an isometry (a unitary or orthogonal
operator).


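Proof With insight from the complex number theory, let $\mathbf{R} = \sqrt{\mathbf{T}^\dagger\mathbf{T}}$, where
the right-hand side is understood as the positive square root. Now note that

$$\langle Ta|Ta\rangle = \langle a|\mathbf{T}^\dagger\mathbf{T}|a\rangle = \langle a|\mathbf{R}^2|a\rangle = \langle a|\mathbf{R}^\dagger\mathbf{R}|a\rangle = \langle Ra|Ra\rangle$$

because $\mathbf{R}$ is positive, and therefore self-adjoint. This shows that $\mathbf{T}|a\rangle$ and
$\mathbf{R}|a\rangle$ are connected by an isometry. Since $\mathbf{T}|a\rangle$ and $\mathbf{R}|a\rangle$ belong to the ranges
of the two operators, this isometry can be defined only on those ranges.

Define the linear (reader, verify!) isometry $\mathbf{U} : \mathbf{R}(V) \to \mathbf{T}(V)$ by $\mathbf{U}\mathbf{R}|x\rangle =
\mathbf{T}|x\rangle$ for $|x\rangle \in V$, and note that by its very definition, $\mathbf{U}$ is surjective. First
we have to make sure that $\mathbf{U}$ is well defined, i.e., it does not map the same
vector onto two different vectors. This is a legitimate concern, because $\mathbf{R}$
may not be injective, and two different vectors of $V$ may be mapped by $\mathbf{R}$
onto the same vector. So, assume that $\mathbf{R}|a_1\rangle = \mathbf{R}|a_2\rangle$. Then

$$\mathbf{U}\mathbf{R}|a_1\rangle = \mathbf{U}\mathbf{R}|a_2\rangle \;\Rightarrow\; \mathbf{T}|a_1\rangle = \mathbf{T}|a_2\rangle.$$

Hence, $\mathbf{U}$ is well defined.

Next note that any linear isometry is injective (Theorem 2.3.12). Therefore,
$\mathbf{U}$ is invertible, so $\mathbf{R}(V)$ and $\mathbf{T}(V)$ have the same dimension, and hence so do
$\mathbf{R}(V)^\perp$ and $\mathbf{T}(V)^\perp$. To complete the proof, let $\{|e_i\rangle\}_{i=1}^{m}$
be an orthonormal basis of $\mathbf{R}(V)^\perp$ and $\{|f_i\rangle\}_{i=1}^{m}$ an orthonormal basis of
$\mathbf{T}(V)^\perp$ and extend $\mathbf{U}$ by setting $\mathbf{U}|e_i\rangle = |f_i\rangle$.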


We note that if $\mathbf{T}$ is injective, then $\mathbf{R}$ is invertible, and therefore, unique.
However, $\mathbf{U}$ is not unique, because for any isometry $\mathbf{S} : \mathbf{T}(V) \to \mathbf{T}(V)$, the
operator $\mathbf{S} \circ \mathbf{U}$ works just as well in the proof.

It is interesting to note that the positivity of $\mathbf{R}$ and the nonuniqueness of
$\mathbf{U}$ are the analogue of the positivity of $r$ and the nonuniqueness of $e^{i\theta}$ in the
polar representation of complex numbers:

$$z = re^{i\theta} = re^{i(\theta + 2n\pi)} \quad \forall n \in \mathbb{Z}.$$

In practice, $\mathbf{R}$ is found by spectrally decomposing $\mathbf{T}^\dagger\mathbf{T}$ and taking its
positive square root.¹⁰ Once $\mathbf{R}$ is found, $\mathbf{U}$ can be calculated from the definition
$\mathbf{T} = \mathbf{U}\mathbf{R}$. This last step is especially simple if $\mathbf{T}$ is injective.


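¹⁰ It is important to pay attention to the order of the two operators: One decomposes $\mathbf{T}^\dagger\mathbf{T}$,
not $\mathbf{T}\mathbf{T}^\dagger$.

Example 6.7.2 Let us find the polar decomposition of

$$\mathbf{A} = \begin{pmatrix} -2i & \sqrt{7} \\ 0 & 3 \end{pmatrix}.$$

We have

$$\mathbf{R}^2 = \mathbf{A}^\dagger\mathbf{A} = \begin{pmatrix} 2i & 0 \\ \sqrt{7} & 3 \end{pmatrix}\begin{pmatrix} -2i & \sqrt{7} \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 4 & 2i\sqrt{7} \\ -2i\sqrt{7} & 16 \end{pmatrix}.$$

The eigenvalues and eigenvectors of $\mathbf{R}^2$ are routinely found to be

$$\lambda_1 = 18, \quad \lambda_2 = 2, \quad |e_1\rangle = \frac{1}{2\sqrt{2}}\begin{pmatrix} i \\ \sqrt{7} \end{pmatrix}, \quad |e_2\rangle = \frac{1}{2\sqrt{2}}\begin{pmatrix} \sqrt{7} \\ i \end{pmatrix}.$$

The projection matrices are

$$\mathbf{P}_1 = |e_1\rangle\langle e_1| = \frac{1}{8}\begin{pmatrix} 1 & i\sqrt{7} \\ -i\sqrt{7} & 7 \end{pmatrix}, \qquad \mathbf{P}_2 = |e_2\rangle\langle e_2| = \frac{1}{8}\begin{pmatrix} 7 & -i\sqrt{7} \\ i\sqrt{7} & 1 \end{pmatrix}.$$

Thus,

$$\mathbf{R} = \sqrt{\lambda_1}\,\mathbf{P}_1 + \sqrt{\lambda_2}\,\mathbf{P}_2 = \frac{1}{2\sqrt{2}}\begin{pmatrix} 5 & i\sqrt{7} \\ -i\sqrt{7} & 11 \end{pmatrix}.$$

To find $\mathbf{U}$, we note that $\det\mathbf{A}$ is nonzero. Hence, $\mathbf{A}$ is invertible, which
implies that $\mathbf{R}$ is also invertible. The inverse of $\mathbf{R}$ is

$$\mathbf{R}^{-1} = \frac{1}{24}\begin{pmatrix} 11\sqrt{2} & -i\sqrt{14} \\ i\sqrt{14} & 5\sqrt{2} \end{pmatrix}.$$

The unitary matrix is simply

$$\mathbf{U} = \mathbf{A}\mathbf{R}^{-1} = \frac{1}{24}\begin{pmatrix} -15i\sqrt{2} & 3\sqrt{14} \\ 3i\sqrt{14} & 15\sqrt{2} \end{pmatrix}.$$

It is left for the reader to verify that $\mathbf{U}$ is indeed unitary.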


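Example 6.7.3 Let us decompose the following real matrix into its polar
form:

$$\mathbf{A} = \begin{pmatrix} 2 & 0 \\ 3 & -2 \end{pmatrix}.$$

The procedure is the same as in the complex case. We have

$$\mathbf{R}^2 = \mathbf{A}^t\mathbf{A} = \begin{pmatrix} 2 & 3 \\ 0 & -2 \end{pmatrix}\begin{pmatrix} 2 & 0 \\ 3 & -2 \end{pmatrix} = \begin{pmatrix} 13 & -6 \\ -6 & 4 \end{pmatrix}$$

with eigenvalues $\lambda_1 = 1$ and $\lambda_2 = 16$ and normalized eigenvectors

$$|e_1\rangle = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \end{pmatrix} \quad\text{and}\quad |e_2\rangle = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ -1 \end{pmatrix}.$$

The projection operators are

$$\mathbf{P}_1 = |e_1\rangle\langle e_1| = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}, \qquad \mathbf{P}_2 = |e_2\rangle\langle e_2| = \frac{1}{5}\begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix}.$$

Thus, we have

$$\mathbf{R} = \sqrt{\mathbf{R}^2} = \sqrt{\lambda_1}\,\mathbf{P}_1 + \sqrt{\lambda_2}\,\mathbf{P}_2 = \frac{1}{5}\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix} + \frac{4}{5}\begin{pmatrix} 4 & -2 \\ -2 & 1 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 17 & -6 \\ -6 & 8 \end{pmatrix}.$$

We note that $\mathbf{A}$ is invertible. Thus, $\mathbf{R}$ is also invertible, and

$$\mathbf{R}^{-1} = \frac{1}{20}\begin{pmatrix} 8 & 6 \\ 6 & 17 \end{pmatrix}.$$

This gives $\mathbf{O} = \mathbf{A}\mathbf{R}^{-1}$, or

$$\mathbf{O} = \frac{1}{5}\begin{pmatrix} 4 & 3 \\ 3 & -4 \end{pmatrix}.$$

It is readily verified that $\mathbf{O}$ is indeed orthogonal.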
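
For real invertible 2 × 2 matrices the whole procedure fits in a few lines. The sketch below is an illustration, not the spectral-decomposition route used in the example: it takes the positive square root with the closed form $\mathbf{R} = (\mathbf{R}^2 + \sqrt{\det\mathbf{R}^2}\,\mathbf{1})/\sqrt{\operatorname{tr}\mathbf{R}^2 + 2\sqrt{\det\mathbf{R}^2}}$, which follows from the Cayley-Hamilton theorem for 2 × 2 matrices. The function name `polar_2x2` is hypothetical, and the input is assumed invertible.

```python
import math


def polar_2x2(a):
    """Polar decomposition A = O R of a real invertible 2 x 2 matrix.

    Returns (O, R) with R = sqrt(A^t A) positive and O = A R^{-1} orthogonal.
    """
    (a11, a12), (a21, a22) = a
    # Entries of the symmetric matrix R^2 = A^t A = [[p, q], [q, r]].
    p = a11 * a11 + a21 * a21
    q = a11 * a12 + a21 * a22
    r = a12 * a12 + a22 * a22
    det_r = math.sqrt(p * r - q * q)          # det R = sqrt(det R^2) = |det A|
    tr_r = math.sqrt(p + r + 2.0 * det_r)     # (tr R)^2 = tr R^2 + 2 det R
    big_r = [[(p + det_r) / tr_r, q / tr_r],
             [q / tr_r, (r + det_r) / tr_r]]
    # Inverse of the 2 x 2 matrix R.
    r_inv = [[big_r[1][1] / det_r, -big_r[0][1] / det_r],
             [-big_r[1][0] / det_r, big_r[0][0] / det_r]]
    big_o = [[a[i][0] * r_inv[0][j] + a[i][1] * r_inv[1][j] for j in range(2)]
             for i in range(2)]
    return big_o, big_r


O, R = polar_2x2([[2.0, 0.0], [3.0, -2.0]])
print(R)   # compare with (1/5) [[17, -6], [-6, 8]]
print(O)   # compare with (1/5) [[4, 3], [3, -4]]
```

Applied to the matrix with rows (2, 0) and (3, −2), this reproduces $\mathbf{R} = \frac{1}{5}\begin{pmatrix} 17 & -6 \\ -6 & 8 \end{pmatrix}$ and $\mathbf{O} = \frac{1}{5}\begin{pmatrix} 4 & 3 \\ 3 & -4 \end{pmatrix}$ up to rounding.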


6.8 Problems 


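6.1 Let $\mathbf{P}$ be the (hermitian) projection operator onto a subspace $\mathcal{M}$. Show
that $\mathbf{1} - \mathbf{P}$ projects onto $\mathcal{M}^\perp$. Hint: You need to show that $\langle m|\mathbf{P}|a\rangle = \langle m|a\rangle$
for arbitrary $|a\rangle \in V$ and $|m\rangle \in \mathcal{M}$; therefore, consider $\langle m|\mathbf{P}|a\rangle^*$, and use the
hermiticity of $\mathbf{P}$.


6.2 Show that a subspace $\mathcal{M}$ of an inner product space $V$ is invariant under
the linear operator $\mathbf{A}$ if and only if $\mathcal{M}^\perp$ is invariant under $\mathbf{A}^\dagger$.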


6.3 Show that the intersection of two invariant subspaces of an operator is 
also an invariant subspace. 


6.4 Let $\pi$ be a permutation of the integers $\{1, 2, \dots, n\}$. Find the spectrum
of $\mathbf{A}_\pi$, if for $|x\rangle = (\alpha_1, \alpha_2, \dots, \alpha_n) \in \mathbb{C}^n$, we define

$$\mathbf{A}_\pi|x\rangle = (\alpha_{\pi(1)}, \dots, \alpha_{\pi(n)}).$$


6.5 Let $|a_1\rangle \equiv \mathbf{a}_1 = (1, 1, -1)$ and $|a_2\rangle \equiv \mathbf{a}_2 = (-2, 1, -1)$.


(a) Construct (in the form of a matrix) the projection operators $\mathbf{P}_1$ and $\mathbf{P}_2$
that project onto the directions of $|a_1\rangle$ and $|a_2\rangle$, respectively. Verify
that they are indeed projection operators.

(b) Construct (in the form of a matrix) the operator $\mathbf{P} = \mathbf{P}_1 + \mathbf{P}_2$ and verify
directly that it is a projection operator.

(c) Let $\mathbf{P}$ act on an arbitrary vector $(x, y, z)$. What is the dot product of
the resulting vector with the vector $\mathbf{a}_1 \times \mathbf{a}_2$? Is that what you expect?


6.6 Show that 


(a) the coefficient of $\lambda^N$ in the characteristic polynomial of any linear
operator is $(-1)^N$, where $N = \dim V$, and

(b) the constant in the characteristic polynomial of an operator is its de- 
terminant. 




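6.7 Operators $\mathbf{A}$ and $\mathbf{B}$ satisfy the commutation relation $[\mathbf{A}, \mathbf{B}] = \mathbf{1}$. Let $|b\rangle$
be an eigenvector of $\mathbf{B}$ with eigenvalue $\lambda$. Show that $e^{-\tau\mathbf{A}}|b\rangle$ is also an
eigenvector of $\mathbf{B}$, but with eigenvalue $\lambda + \tau$. This is why $e^{-\tau\mathbf{A}}$ is called the
translation operator for $\mathbf{B}$. Hint: First find $[\mathbf{B}, e^{-\tau\mathbf{A}}]$.


6.8 Find the eigenvalues of an involutive operator, that is, an operator $\mathbf{A}$
with the property $\mathbf{A}^2 = \mathbf{1}$.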


6.9 Assume that A and A’ are similar matrices. Show that they have the 
same eigenvalues. 


6.10 In each of the following cases, determine the counterclockwise rotation
of the $xy$-axes that brings the conic section into the standard form and
determine the conic section.


(a) $11x^2 + 3y^2 + 6xy - 12 = 0$,
(b) $5x^2 - 3y^2 + 6xy + 6 = 0$,
(c) $2x^2 - y^2 - 4xy - 3 = 0$,
(d) $6x^2 + 3y^2 - 4xy - 7 = 0$,
(e) $2x^2 + 5y^2 - 4xy - 36 = 0$.


6.11 Show that if $\mathbf{A}$ is invertible, then the eigenvectors of $\mathbf{A}^{-1}$ are the same
as those of $\mathbf{A}$ and the eigenvalues of $\mathbf{A}^{-1}$ are the reciprocals of those of $\mathbf{A}$.


6.12 Find all eigenvalues and eigenvectors of the following matrices: 


2 —2-1 
A, = ( ) By = G 3 C;=]-13 1 
2-4-1 
101 1 1 0 -1 1 
As={0 1 O Bo=j;1 0 1 Co=]} 1 -!l 
101 0 1 1 1 -l 
1 1 1 01 1 
A3=|0 1 B3 = 1 1 Cz3=]1 0 1 
0 0 1 1 1 1 1 0 


6.13 Show that a 2 × 2 rotation matrix does not have a real eigenvalue (and,
therefore, eigenvector) when the rotation angle is not an integer multiple
of $\pi$. What is the physical interpretation of this?


6.14 Three equal point masses are located at (a,a,0), (a,0,a), and 
(0, a,a). Find the moment of inertia matrix as well as its eigenvalues and 
the corresponding eigenvectors. 


6.15 Consider $(\alpha_1, \alpha_2, \dots, \alpha_n) \in \mathbb{C}^n$ and define $\mathbf{E}_{ij}$ as the operator that
interchanges $\alpha_i$ and $\alpha_j$. Find the eigenvalues of this operator.


6.16 Find the eigenvalues and eigenvectors of the operator $-i\,d/dx$ acting
in the vector space of differentiable functions $C^1(-\infty, \infty)$.


6.17 Show that a hermitian operator is positive iff its eigenvalues are posi- 
tive. 


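6.18 Show that $\|\mathbf{A}x\| = \|\mathbf{A}^\dagger x\|$ if and only if $\mathbf{A}$ is normal.


6.19 What are the spectral decompositions of $\mathbf{A}^\dagger$, $\mathbf{A}^{-1}$, and $\mathbf{A}\mathbf{A}^\dagger$ for an
invertible normal operator $\mathbf{A}$?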


6.20 Consider the matrix 


(a) Find the eigenvalues and the orthonormal eigenvectors of A. 
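(b) Calculate the projection operators (matrices) $\mathbf{P}_1$ and $\mathbf{P}_2$ and verify that
$\sum_i \mathbf{P}_i = \mathbf{1}$ and $\sum_i \lambda_i\mathbf{P}_i = \mathbf{A}$.
(c) Find the matrices $\sqrt{\mathbf{A}}$, $\sin(\theta\mathbf{A})$, and $\cos(\theta\mathbf{A})$ and show directly that
$\sin^2(\theta\mathbf{A}) + \cos^2(\theta\mathbf{A}) = \mathbf{1}$.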
(d) Is A invertible? If so, find A~! using spectral decomposition of A. 


6.21 Consider the matrix

$$\mathbf{A} = \begin{pmatrix} 4 & i & 1 \\ -i & 4 & -i \\ 1 & i & 4 \end{pmatrix}.$$

(a) Find the eigenvalues of $\mathbf{A}$. Hint: Try $\lambda = 3$ in the characteristic polynomial
of $\mathbf{A}$.

(b) For each $\lambda$, find a basis for $\mathcal{M}_\lambda$, the eigenspace associated with the
eigenvalue $\lambda$.

(c) Use the Gram-Schmidt process to orthonormalize the above basis vectors.

(d) Calculate the projection operators (matrices) $\mathbf{P}_i$ for each subspace and
verify that $\sum_i \mathbf{P}_i = \mathbf{1}$ and $\sum_i \lambda_i\mathbf{P}_i = \mathbf{A}$.

(e) Find the matrices $\sqrt{\mathbf{A}}$, $\sin(\pi\mathbf{A}/2)$, and $\cos(\pi\mathbf{A}/2)$.

(f) Is $\mathbf{A}$ invertible? If so, find the eigenvalues and eigenvectors of $\mathbf{A}^{-1}$.


6.22 Show that if two hermitian matrices have the same set of eigenvalues, 
then they are unitarily related. 


6.23 Prove that corresponding to every unitary operator U acting on a finite- 
dimensional vector space, there is a hermitian operator H such that U = 
exp(iH). 


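6.24 Prove Lemma 6.6.4 by showing that

$$\langle a|\mathbf{T}^2 + \alpha\mathbf{T} + \beta\mathbf{1}|a\rangle \ge \|\mathbf{T}a\|^2 - |\alpha|\,\|\mathbf{T}a\|\,\|a\| + \beta\langle a|a\rangle,$$

which can be obtained from the Schwarz inequality in the form $\langle a|b\rangle \ge
-\|a\|\,\|b\|$. Now complete the square on the right-hand side.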


6.25 Show that a normal operator T on a real vector space can be diagonal- 
ized as in Eqs. (6.23) and (6.24). 


6.26 Show that an arbitrary matrix $\mathbf{A}$ can be “diagonalized” as $\mathbf{D} = \mathbf{U}\mathbf{A}\mathbf{V}$,
where $\mathbf{U}$ is unitary and $\mathbf{D}$ is a real diagonal matrix with only nonnegative
eigenvalues. Hint: There exists a unitary matrix that diagonalizes $\mathbf{A}\mathbf{A}^\dagger$.


6.27 Find the polar decomposition of the following matrices: 


1 
2% 0 41 —12i 
a=(% A? a= (55, re = ; 


6.28 Show that for an arbitrary matrix $\mathbf{A}$, both $\mathbf{A}\mathbf{A}^\dagger$ and $\mathbf{A}^\dagger\mathbf{A}$ have the same
set of eigenvalues. Hint: Use the polar decomposition theorem.


-i 


0 1 
1 
i 0 


6.29 Show that 


(a) if $\lambda$ is an eigenvalue of an antisymmetric operator, then so is $-\lambda$, and
(b) antisymmetric operators (matrices) of odd dimension cannot be invert- 
ible. 


6.30 Find the unitary matrices that diagonalize the following hermitian ma- 
trices: 


i 2 ie a ae a 
a=(_2, =1 ) m=(2, ) a= (i al 


Warning! You may have to resort to numerical approximations for some of 
these. 


6.31 Let A= (. where $\alpha \in \mathbb{C}$ and $\alpha \ne 0$. Show that it is impossible to
find an invertible 2 × 2 matrix $\mathbf{R}$ such that $\mathbf{R}\mathbf{A}\mathbf{R}^{-1}$ is diagonal. Now show
that $\mathbf{A}$ is not normal as expected from Proposition 6.4.11.

Part II
Infinite-Dimensional Vector Spaces 


Hilbert Spaces 


The basic concepts of finite-dimensional vector spaces introduced in Chap. 2 
can readily be generalized to infinite dimensions. The definition of a vector 
space and concepts of linear combination, linear independence, subspace, 
span, and so forth all carry over to infinite dimensions. However, one thing 
is crucially different in the new situation, and this difference makes the study 
of infinite-dimensional vector spaces both richer and more nontrivial: In a 
finite-dimensional vector space we dealt with finite sums; in infinite dimen- 
sions we encounter infinite sums. Thus, we have to investigate the conver- 
gence of such sums. 


7.1 The Question of Convergence 


The intuitive notion of convergence acquired in calculus makes use of the 
idea of closeness. This, in turn, requires the notion of distance.¹ We considered
such a notion in Chap. 2 in the context of a norm, and saw that the inner
product had an associated norm. However, it is possible to introduce a norm 
on a vector space without an inner product. 

One such norm, applicable to $\mathbb{C}^n$ and $\mathbb{R}^n$, was

$$\|a\|_p \equiv \Bigl(\sum_{i=1}^{n} |\alpha_i|^p\Bigr)^{1/p},$$

where p is an integer. The “natural” norm, i.e., that induced on $\mathbb{C}^n$ (or $\mathbb{R}^n$)
by the usual inner product, corresponds to p = 2. The distance between
two points depends on the particular norm used. For example, consider the
“point” (or vector) $|b\rangle = (0.1, 0.1, \ldots, 0.1)$ in a 1000-dimensional space
(n = 1000). One can easily check that the distance of this vector from the
origin varies considerably with p: $\|b\|_1 = 100$, $\|b\|_2 = 3.16$, $\|b\|_{10} = 0.2$.
This variation may give the impression that there is no such thing as “close- 
ness”, and it all depends on how one defines the norm. This is not true, 


¹It is possible to introduce the idea of closeness abstractly, without resort to the notion
of distance, as is done in topology. However, distance, as applied in vector spaces, is as 
abstract as we want to get. 
because closeness is a relative concept: One always compares distances.
A norm with large p shrinks all distances of a space, and a norm with small
p stretches them. Thus, although it is impossible (and meaningless) to say
that “$|a\rangle$ is close to $|b\rangle$” because of the dependence of distance on p, one
can always say “$|a\rangle$ is closer to $|b\rangle$ than $|c\rangle$ is to $|d\rangle$”, regardless of the value
of p.
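
The dependence of distance on p can be checked directly. The following is a minimal sketch, assuming plain Python floats; the helper name `p_norm` is introduced here only for illustration. It evaluates the p-norm of the vector (0.1, ..., 0.1) with n = 1000 for p = 1, 2, and 10.

```python
def p_norm(components, p):
    """Return the p-norm (sum_i |alpha_i|^p)^(1/p) of a finite sequence."""
    return sum(abs(c) ** p for c in components) ** (1.0 / p)

# The vector |b> = (0.1, 0.1, ..., 0.1) in a 1000-dimensional space.
b = [0.1] * 1000

# Distance from the origin for the three values of p quoted in the text.
norms = {p: p_norm(b, p) for p in (1, 2, 10)}
for p, value in norms.items():
    print(f"p = {p:2d}: ||b||_p = {value:.4f}")
```

Up to rounding, the three numbers agree with the values 100, 3.16, and 0.2 quoted above.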

Now that we have a way of telling whether vectors are close together 
or far apart, we can talk about limits and the convergence of sequences of 
vectors. Let us begin by recalling the definition of a Cauchy sequence (see 
Definition 1.3.4): 


Definition 7.1.1 An infinite sequence of vectors $\{|a_i\rangle\}_{i=1}^{\infty}$ in a normed linear
space V is called a Cauchy sequence if $\lim_{i,j\to\infty} \|a_i - a_j\| = 0$.


A convergent sequence is necessarily Cauchy. This can be shown using 
the triangle inequality (see Problem 7.2). However, there may be Cauchy 
sequences in a given vector space that do not converge to any vector in 
that space (see the example below). Such a convergence requires additional 
properties of a vector space summarized in the following definition. 


Definition 7.1.2 A complete vector space V is a normed linear space for
which every Cauchy sequence of vectors in V has a limit vector in V. In
other words, if $\{|a_i\rangle\}_{i=1}^{\infty}$ is a Cauchy sequence, then there exists a vector
$|a\rangle \in V$ such that $\lim_{i\to\infty} \|a_i - a\| = 0$.


Example 7.1.3 (1) $\mathbb{R}$ is complete with respect to the absolute-value norm
$\|\alpha\| = |\alpha|$. In other words, every Cauchy sequence of real numbers has a
limit in $\mathbb{R}$. This is proved in real analysis.

(2) $\mathbb{C}$ is complete with respect to the norm $\|\alpha\| = |\alpha| = \sqrt{(\mathrm{Re}\,\alpha)^2 + (\mathrm{Im}\,\alpha)^2}$.
Using $|\alpha| \le |\mathrm{Re}\,\alpha| + |\mathrm{Im}\,\alpha|$, one can show that the completeness of $\mathbb{C}$ follows
from that of $\mathbb{R}$. Details are left as an exercise for the reader.

(3) The set of rational numbers $\mathbb{Q}$ is not complete with respect to the
absolute-value norm. In fact, $\{(1 + 1/k)^k\}_{k=1}^{\infty}$ is a sequence of rational numbers
that is Cauchy but does not converge to a rational number; it converges
to e, the base of the natural logarithm, which is known to be an irrational
number. (See also the discussion after Definition 1.3.4.)
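
Part (3) can be illustrated with exact rational arithmetic. The sketch below assumes Python's standard `fractions` module; the helper name `term` is hypothetical. Every term of the sequence is an exact rational number, yet the gap to the irrational number e keeps shrinking.

```python
from fractions import Fraction
import math

def term(k):
    """Exact rational value of (1 + 1/k)^k."""
    return (1 + Fraction(1, k)) ** k

ks = [1, 10, 100, 1000]
terms = [term(k) for k in ks]            # each entry is an exact Fraction
gaps = [math.e - float(t) for t in terms]
for k, t, gap in zip(ks, terms, gaps):
    print(f"k = {k:4d}: term = {float(t):.6f}, e - term = {gap:.2e}")
```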


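Let $\{|a_i\rangle\}_{i=1}^{\infty}$ be a Cauchy sequence of vectors in a finite-dimensional
vector space $V_N$. Choose an orthonormal basis $\{|e_k\rangle\}_{k=1}^{N}$ in $V_N$ such that²
$|a_i\rangle = \sum_{k=1}^{N} \alpha_k^{(i)} |e_k\rangle$ and $|a_j\rangle = \sum_{k=1}^{N} \alpha_k^{(j)} |e_k\rangle$. Then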


²Recall that one can always define an inner product on a finite-dimensional vector space.
So, the existence of orthonormal bases is guaranteed. 




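$$
\begin{aligned}
\|a_i - a_j\|^2 &= \langle a_i - a_j | a_i - a_j \rangle = \Bigl\| \sum_{k=1}^{N} \bigl(\alpha_k^{(i)} - \alpha_k^{(j)}\bigr) |e_k\rangle \Bigr\|^2 \\
&= \sum_{k,l=1}^{N} \bigl(\alpha_k^{(i)} - \alpha_k^{(j)}\bigr)^{*} \bigl(\alpha_l^{(i)} - \alpha_l^{(j)}\bigr) \langle e_k | e_l \rangle = \sum_{k=1}^{N} \bigl|\alpha_k^{(i)} - \alpha_k^{(j)}\bigr|^2 .
\end{aligned}
$$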


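The LHS goes to zero, because the sequence is assumed Cauchy. Furthermore,
all terms on the RHS are nonnegative. Thus, they too must go to zero
as $i, j \to \infty$; that is, each component sequence $\{\alpha_k^{(i)}\}_{i=1}^{\infty}$ is Cauchy in $\mathbb{C}$.
By the completeness of $\mathbb{C}$, there must exist $\alpha_k \in \mathbb{C}$ such that
$\lim_{i\to\infty} \alpha_k^{(i)} = \alpha_k$ for k = 1, 2, ..., N. Now consider $|a\rangle \in V_N$ given by
$|a\rangle = \sum_{k=1}^{N} \alpha_k |e_k\rangle$. We claim that $|a\rangle$ is the limit of the above sequence of
vectors in $V_N$. Indeed,

$$\lim_{i\to\infty} \|a_i - a\|^2 = \lim_{i\to\infty} \sum_{k=1}^{N} \bigl|\alpha_k^{(i)} - \alpha_k\bigr|^2 = \sum_{k=1}^{N} \lim_{i\to\infty} \bigl|\alpha_k^{(i)} - \alpha_k\bigr|^2 = 0.$$

We have proved the following: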


Proposition 7.1.4 Every Cauchy sequence in a finite-dimensional in- 
ner product space over C (or R) is convergent. In other words, every 
finite-dimensional complex (or real) inner product space is complete 
with respect to the norm induced by its inner product. 


The next example shows how important the word “finite” is. 


Example 7.1.5 Consider $\{f_k\}_{k=1}^{\infty}$, the infinite sequence of continuous functions
defined in the interval [−1, +1] by

$$f_k(x) = \begin{cases} 1 & \text{if } 1/k \le x \le 1, \\ (kx + 1)/2 & \text{if } -1/k \le x \le 1/k, \\ 0 & \text{if } -1 \le x \le -1/k. \end{cases}$$


This sequence belongs to $\mathcal{C}^0(-1, 1)$, the inner product space of continuous
functions with its usual inner product: $\langle f|g\rangle = \int_{-1}^{1} f^*(x) g(x)\,dx$. It is
straightforward to verify that

$$\|f_k - f_j\|^2 = \int_{-1}^{1} |f_k(x) - f_j(x)|^2\,dx \xrightarrow[k,j\to\infty]{} 0.$$

Therefore, the sequence is Cauchy. However, the limit of this sequence is
(see Fig. 7.1)

$$f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{if } -1 < x < 0, \end{cases}$$


which is discontinuous at x = 0 and therefore does not belong to the space 
in which the original sequence lies. 
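
The Cauchy property claimed in Example 7.1.5 can be checked numerically. The sketch below assumes a simple midpoint rule for the integral; the helper names `f` and `dist_sq` and the chosen index pairs are illustrative only. It estimates $\|f_k - f_j\|^2$ for growing k and j.

```python
def f(k, x):
    """The k-th function of Example 7.1.5 on [-1, 1]."""
    if x >= 1.0 / k:
        return 1.0
    if x <= -1.0 / k:
        return 0.0
    return (k * x + 1.0) / 2.0

def dist_sq(k, j, steps=200_000):
    """Midpoint-rule estimate of the integral of |f_k - f_j|^2 over [-1, 1]."""
    h = 2.0 / steps
    total = 0.0
    for n in range(steps):
        x = -1.0 + (n + 0.5) * h
        total += (f(k, x) - f(j, x)) ** 2
    return total * h

pairs = [(10, 20), (100, 200), (1000, 2000)]
distances = [dist_sq(k, j) for k, j in pairs]
for (k, j), d in zip(pairs, distances):
    print(f"||f_{k} - f_{j}||^2 ~ {d:.3e}")
```

A direct integration, carried out here as a cross-check rather than taken from the text, gives $\|f_k - f_j\|^2 = (j-k)^2/(6kj^2)$ for $j \ge k$, so the squared distance for the pair (k, 2k) is 1/(24k) and shrinks as k grows.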


We see that infinite-dimensional vector spaces are not generally com- 
plete. It is a nontrivial task to show whether or not a given infinite- 
dimensional vector space is complete. 


Fig. 7.1 The limit of the sequence of the continuous functions $f_k$ is a discontinuous
function that is 1 for x > 0 and 0 for x < 0


Any vector space (finite- or infinite-dimensional) contains all finite linear
combinations of the form $\sum_{i=1}^{n} \alpha_i |a_i\rangle$ when it contains all the $|a_i\rangle$’s. This
follows from the very definition of a vector space. However, the situation
is different when n goes to infinity. For the vector space to contain the infinite
sum, firstly, the meaning of such a sum has to be clarified, i.e., a norm
and an associated convergence criterion need to be put in place. Secondly,
the vector space has to be complete with respect to that norm. A complete 
normed vector space is called a Banach space. We shall not deal with a gen- 
eral Banach space, but only with those spaces whose norms arise naturally 
from an inner product. This leads to the following definition: 


Definition 7.1.6 A complete inner product space, commonly denoted by 
H, is called a Hilbert space. 


Thus, all finite-dimensional real or complex vector spaces are Hilbert 
spaces. However, when we speak of a Hilbert space, we shall usually assume 
that it is infinite-dimensional. 

It is convenient to use orthonormal vectors in studying Hilbert spaces.
So, let us consider an infinite sequence $\{|e_i\rangle\}_{i=1}^{\infty}$ of orthonormal vectors all
belonging to a Hilbert space $\mathcal{H}$. Next, take any vector $|f\rangle \in \mathcal{H}$, construct
the complex numbers $f_i = \langle e_i|f\rangle$, and form the sequence of vectors³

$$|f_n\rangle = \sum_{i=1}^{n} f_i |e_i\rangle \quad \text{for } n = 1, 2, \ldots \tag{7.1}$$


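For the pair of vectors $|f\rangle$ and $|f_n\rangle$, the Schwarz inequality gives

$$|\langle f|f_n\rangle|^2 \le \langle f|f\rangle \langle f_n|f_n\rangle = \langle f|f\rangle \Bigl(\sum_{i=1}^{n} |f_i|^2\Bigr), \tag{7.2}$$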


³We can consider $|f_n\rangle$ as an “approximation” to $|f\rangle$, because both share the same components
along the same set of orthonormal vectors. The sequence of orthonormal vectors
acts very much as a basis. However, to be a basis, an extra condition must be met. We 
shall discuss this condition shortly. 
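where Eq. (7.1) has been used to evaluate $\langle f_n|f_n\rangle$. On the other hand, taking
the inner product of (7.1) with $\langle f|$ yields

$$\langle f|f_n\rangle = \sum_{i=1}^{n} f_i \langle f|e_i\rangle = \sum_{i=1}^{n} f_i f_i^{*} = \sum_{i=1}^{n} |f_i|^2 .$$

Substitution of this in Eq. (7.2), followed by division of both sides by
$\sum_{i=1}^{n} |f_i|^2$ when this sum is nonzero, yields the Parseval inequality:

$$\sum_{i=1}^{n} |f_i|^2 \le \langle f|f\rangle . \tag{7.3}$$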


This conclusion is true for arbitrarily large n and can be stated as follows: 


Proposition 7.1.7 Let $\{|e_i\rangle\}_{i=1}^{\infty}$ be an infinite set of orthonormal vectors in
a Hilbert space $\mathcal{H}$. Let $|f\rangle \in \mathcal{H}$ and define complex numbers $f_i = \langle e_i|f\rangle$.
Then the Bessel inequality holds: $\sum_{i=1}^{\infty} |f_i|^2 \le \langle f|f\rangle$.


The Bessel inequality shows that the vector

$$\sum_{i=1}^{\infty} f_i |e_i\rangle \equiv \lim_{n\to\infty} \sum_{i=1}^{n} f_i |e_i\rangle$$

converges; that is, it has a finite norm. However, the inequality does not say
whether the vector converges to $|f\rangle$. To make such a statement we need
completeness:


Definition 7.1.8 A sequence of orthonormal vectors $\{|e_i\rangle\}_{i=1}^{\infty}$ in a Hilbert
space $\mathcal{H}$ is called complete if the only vector in $\mathcal{H}$ that is orthogonal to all
the $|e_i\rangle$ is the zero vector, in which case $\{|e_i\rangle\}_{i=1}^{\infty}$ is called a basis for $\mathcal{H}$.

The notion of completeness does not enter the discussion of an N-dimensional
vector space, because any N orthonormal vectors form a basis.
If you take away some of the vectors, you don’t have a basis, because you
have fewer than N vectors. The situation is different in infinite dimensions. If
you start with a basis and take away some of the vectors, you still have an 
infinite number of orthonormal vectors. The notion of completeness ensures 
that no orthonormal vector is taken out of a basis. This completeness prop- 
erty is the extra condition alluded to (in the footnote) above, and is what is 
required to make a basis. 
In mathematics literature, one distinguishes between a general and a sep- 
arable Hilbert space. The latter is characterized by having a countable basis. 
Thus, in the definition above, the Hilbert space is actually a separable one, 
and from now on, by Hilbert space we shall mean a separable Hilbert space. 


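Proposition 7.1.9 Let $\{|e_i\rangle\}_{i=1}^{\infty}$ be an orthonormal sequence in $\mathcal{H}$. Then
the following statements are equivalent:

1. $\{|e_i\rangle\}_{i=1}^{\infty}$ is complete.
2. $|f\rangle = \sum_{i=1}^{\infty} |e_i\rangle\langle e_i|f\rangle \quad \forall\, |f\rangle \in \mathcal{H}$.
3. $\sum_{i=1}^{\infty} |e_i\rangle\langle e_i| = \mathbf{1}$.
4. $\langle f|g\rangle = \sum_{i=1}^{\infty} \langle f|e_i\rangle\langle e_i|g\rangle \quad \forall\, |f\rangle, |g\rangle \in \mathcal{H}$.
5. $\|f\|^2 = \sum_{i=1}^{\infty} |\langle e_i|f\rangle|^2 \quad \forall\, |f\rangle \in \mathcal{H}$.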


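Proof We shall prove the implications 1 ⇒ 2 ⇒ 3 ⇒ 4 ⇒ 5 ⇒ 1.

1 ⇒ 2: It is sufficient to show that the vector $|\psi\rangle \equiv |f\rangle - \sum_{i=1}^{\infty} |e_i\rangle\langle e_i|f\rangle$
is orthogonal to all the $|e_j\rangle$:

$$\langle e_j|\psi\rangle = \langle e_j|f\rangle - \sum_{i=1}^{\infty} \underbrace{\langle e_j|e_i\rangle}_{=\delta_{ij}} \langle e_i|f\rangle = 0.$$

2 ⇒ 3: Since $|f\rangle = \mathbf{1}|f\rangle = \sum_{i=1}^{\infty} (|e_i\rangle\langle e_i|)|f\rangle$ is true for all $|f\rangle \in \mathcal{H}$, we
must have $\mathbf{1} = \sum_{i=1}^{\infty} |e_i\rangle\langle e_i|$.

3 ⇒ 4: $\langle f|g\rangle = \langle f|\mathbf{1}|g\rangle = \langle f|\bigl(\sum_{i=1}^{\infty} |e_i\rangle\langle e_i|\bigr)|g\rangle = \sum_{i=1}^{\infty} \langle f|e_i\rangle\langle e_i|g\rangle$.

4 ⇒ 5: Let $|g\rangle = |f\rangle$ in statement 4 and recall that $\langle f|e_i\rangle = \langle e_i|f\rangle^{*}$.

5 ⇒ 1: Let $|f\rangle$ be orthogonal to all the $|e_i\rangle$. Then all the terms in the sum
are zero, implying that $\|f\|^2 = 0$, which in turn gives $|f\rangle = 0$,
because only the zero vector has a zero norm.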


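The equality

$$\|f\|^2 = \langle f|f\rangle = \sum_{i=1}^{\infty} |\langle e_i|f\rangle|^2 = \sum_{i=1}^{\infty} |f_i|^2, \qquad f_i = \langle e_i|f\rangle, \tag{7.4}$$

is called the Parseval equality, and the complex numbers $f_i$ are called generalized
Fourier coefficients. The relation

$$\mathbf{1} = \sum_{i=1}^{\infty} |e_i\rangle\langle e_i| \tag{7.5}$$

is called the completeness relation.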
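
The difference between the Bessel inequality and the Parseval equality shows up already in a finite-dimensional analog. The sketch below works in $\mathbb{C}^3$; the vectors `e1`, `e2`, `e3`, and `f` are arbitrary illustrative choices, not taken from the text. Dropping one orthonormal vector leaves a strict inequality, while the full set restores equality.

```python
import math

def inner(u, v):
    """Inner product <u|v>, conjugate-linear in the first slot."""
    return sum(a.conjugate() * b for a, b in zip(u, v))

s = 1 / math.sqrt(2)
e1 = [s, s, 0.0]
e2 = [s, -s, 0.0]
e3 = [0.0, 0.0, 1.0]
f = [1 + 2j, -1j, 3.0]

def coefficient_sum(basis, vec):
    """Sum of |<e_i|f>|^2 over the given orthonormal vectors."""
    return sum(abs(inner(e, vec)) ** 2 for e in basis)

norm_sq = inner(f, f).real                    # <f|f>
partial = coefficient_sum([e1, e2], f)        # incomplete set: Bessel inequality
full = coefficient_sum([e1, e2, e3], f)       # complete set: Parseval equality
print(norm_sq, partial, full)
```

The incomplete set misses the component of `f` along `e3`, which is exactly the vector orthogonal to all retained $|e_i\rangle$ that Definition 7.1.8 rules out for a complete sequence.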


Historical Notes 

David Hilbert (1862-1943), the greatest mathematician of the twentieth century, received
his Ph.D. from the University of Königsberg and was a member of the staff there
from 1886 to 1895. In 1895 he was appointed to the chair of mathematics at the University
of Göttingen, where he continued to teach for the rest of his life.

Hilbert is one of that rare breed of late 19th-century mathematicians whose spectrum 
of expertise covered a wide range, with formal set theory at one end and mathemati- 
cal physics at the other. He did superb work in geometry, algebraic geometry, algebraic 
number theory, integral equations, and operator theory. The seminal two-volume book 
Methoden der mathematischen Physik by R. Courant, still one of the best books on the
subject, was greatly influenced by Hilbert. 

Hilbert’s work in geometry had the greatest influence in that area since Euclid. A system- 
atic study of the axioms of Euclidean geometry led Hilbert to propose 21 such axioms, and 
he analyzed their significance. He published Grundlagen der Geometrie in 1899, putting 
geometry on a formal axiomatic foundation. His famous 23 Paris problems challenged 
(and still today challenge) mathematicians to solve fundamental questions. 

It was late in his career that Hilbert turned to the subject for which he is most famous 
among physicists. A lecture by Erik Holmgren in 1901 on Fredholm’s work on integral 
equations, which had already been published in Sweden, aroused Hilbert’s interest in 
the subject. David Hilbert, having established himself as the leading mathematician of
his time by his work on algebraic numbers, algebraic invariants, and the foundations of 
geometry, now turned his attention to integral equations. He says that an investigation 
of the subject showed him that it was important for the theory of definite integrals, for 
the development of arbitrary functions in series (of special functions or trigonometric 
functions), for the theory of linear differential equations, for potential theory, and for the 
calculus of variations. He wrote a series of six papers from 1904 to 1910 and reproduced 
them in his book Grundzüge einer allgemeinen Theorie der linearen Integralgleichungen
(1912). During the latter part of this work he applied integral equations to problems of 
mathematical physics. 

It is said that Hilbert discovered the correct field equation for general relativity in 1915 
(one year before Einstein) using the variational principle, but never claimed priority. 
Hilbert claimed that he worked best out-of-doors. He accordingly attached an 18-foot 
blackboard to his neighbor’s wall and built a covered walkway there so that he could work 
outside in any weather. He would intermittently interrupt his pacing and his blackboard 
computations with a few turns around the rest of the yard on his bicycle, or he would pull 
some weeds, or do some garden trimming. Once, when a visitor called, the maid sent him 
to the backyard and advised that if the master wasn’t readily visible at the blackboard to 
look for him up in one of the trees. 

Highly gifted and highly versatile, David Hilbert radiated over mathematics a catching 
optimism and a stimulating vitality that can only be called “the spirit of Hilbert.” Engraved 
on a stone marker set over Hilbert’s grave in Göttingen are the master’s own optimistic
words: “Wir müssen wissen. Wir werden wissen.” (“We must know. We shall know.”)


7.2 The Space of Square-Integrable Functions


Chapter 2 showed that the collection of all continuous functions defined on 
an interval [a,b] forms a linear vector space. Example 7.1.5 showed that 
this space is not complete. Can we enlarge this space to make it complete? 
Since we are interested in an inner product as well, and since a natural inner 
product for functions is defined in terms of integrals, we want to make sure 
that our functions are integrable. However, integrability does not require
continuity; it only requires piecewise continuity. In this section we shall discuss
conditions under which the space of functions becomes complete. An
important class of functions has already been mentioned in Chap. 2. These 
functions satisfy the inner product given by 


$$\langle g|f\rangle = \int_a^b g^*(x) f(x) w(x)\,dx.$$

If $g(x) = f(x)$, we obtain

$$\langle f|f\rangle = \int_a^b |f(x)|^2 w(x)\,dx. \tag{7.6}$$


Functions for which such an integral is defined are said to be square- 
integrable. 

The space of square-integrable functions over the interval [a, b] is denoted
by $\mathcal{L}^2_w(a, b)$. In this notation $\mathcal{L}$ stands for Lebesgue, who generalized
the notion of the ordinary Riemann integral to cases for which the integrand
could be highly discontinuous; 2 stands for the power of f(x) in the integral;
a and b denote the limits of integration; and w refers to the weight function
(a strictly positive real-valued function). When w(x) = 1, we use the notation
$\mathcal{L}^2(a, b)$. The significance of $\mathcal{L}^2_w(a, b)$ lies in the following theorem
(for a proof, see [Reed 80, Chap. I]):


Theorem 7.2.1 (Riesz-Fischer theorem) The space $\mathcal{L}^2_w(a, b)$ is complete.


A complete infinite-dimensional inner product space was earlier defined 
to be a Hilbert space. The following theorem shows that the number of 
(separable) Hilbert spaces is severely restricted. (For a proof, see [Frie 82, 
p. 216].) 


Theorem 7.2.2 All complete inner product spaces with countable bases are
isomorphic to $\mathcal{L}^2_w(a, b)$.


$\mathcal{L}^2_w(a, b)$ is defined in terms of functions that satisfy Eq. (7.6). Yet an
inner product involves integrals of the form $\int_a^b g^*(x) f(x) w(x)\,dx$. Are such
integrals well-defined and finite? Using the Schwarz inequality, which holds 
for any inner product space, finite or infinite, one can show that the integral 
is defined. 

The isomorphism of Theorem 7.2.2 makes the Hilbert space more tan- 
gible, because it identifies the space with a space of functions, objects that 
are more familiar than abstract vectors. Nonetheless, a faceless function is 
very little improvement over an abstract vector. What is desirable is a set 
of concrete functions with which we can calculate. The following theorem 
provides such functions (for a proof, see [Simm 83, pp. 154—161]). 


Theorem 7.2.3 (Stone-Weierstrass approximation theorem) The sequence
of monomials $\{x^k\}_{k=0}^{\infty}$ forms a basis of $\mathcal{L}^2_w(a, b)$.


Thus, any square-integrable function f can be written as
$f(x) = \sum_{k=0}^{\infty} \alpha_k x^k$. This theorem shows that $\mathcal{L}^2_w(a, b)$ is indeed a separable Hilbert
space as expected in Theorem 7.2.2.


7.2.1 Orthogonal Polynomials 


The monomials $\{x^k\}_{k=0}^{\infty}$ are not orthonormal but are linearly independent.
If we wish to obtain an orthonormal (or simply orthogonal) linear combination
of these vectors, we can use the Gram-Schmidt process. The result
will be certain polynomials, denoted by $C_n(x)$, that are orthogonal to one
another and span $\mathcal{L}^2_w(a, b)$.

Such orthogonal polynomials satisfy very useful recurrence relations,
which we now derive. In the following discussion $p_{\le k}(x)$ denotes a generic
polynomial of degree less than or equal to k. For example, $3x^5 - 4x^2 + 5$,
$2x + 1$, $-2.4x^4 + 3x^3 - x^2 + 6$, and 2 are all denoted by $p_{\le 5}(x)$ or
$p_{\le 8}(x)$ or $p_{\le 59}(x)$ because they all have degrees less than or equal to 5, 8,
and 59. Since a polynomial of degree less than n can be written as a linear
combination of $C_k(x)$ with k < n, we have the obvious property

$$\int_a^b C_n(x)\, p_{\le n-1}(x)\, w(x)\,dx = 0. \tag{7.7}$$


Let $k_n^{(n)}$ and $k_n^{(n-1)}$ denote, respectively, the coefficients of $x^n$ and $x^{n-1}$
in $C_n(x)$, and let

$$h_n = \int_a^b [C_n(x)]^2 w(x)\,dx. \tag{7.8}$$

The polynomial $C_{n+1}(x) - (k_{n+1}^{(n+1)}/k_n^{(n)})\,x C_n(x)$ has degree less than or
equal to n, and therefore can be expanded as a linear combination of the
$C_j(x)$:

$$C_{n+1}(x) - \frac{k_{n+1}^{(n+1)}}{k_n^{(n)}}\, x C_n(x) = \sum_{j=0}^{n} a_j C_j(x). \tag{7.9}$$


Take the inner product of both sides of this equation with $C_m(x)$:

$$\int_a^b C_{n+1}(x) C_m(x) w(x)\,dx - \frac{k_{n+1}^{(n+1)}}{k_n^{(n)}} \int_a^b x C_n(x) C_m(x) w(x)\,dx = \sum_{j=0}^{n} a_j \int_a^b C_j(x) C_m(x) w(x)\,dx.$$

The first integral on the LHS vanishes as long as $m \le n$; the second integral
vanishes if $m \le n - 2$ [if $m \le n - 2$, then $x C_m(x)$ is a polynomial of degree
at most n − 1]. Thus, we have

$$\sum_{j=0}^{n} a_j \int_a^b C_j(x) C_m(x) w(x)\,dx = 0 \quad \text{for } m \le n - 2.$$

The integral in the sum is zero unless j = m, by orthogonality. Therefore,
the sum reduces to

$$a_m \int_a^b [C_m(x)]^2 w(x)\,dx = 0 \quad \text{for } m \le n - 2.$$

Since the integral is nonzero, we conclude that $a_m = 0$ for m = 0, 1, 2, ...,
n − 2, and Eq. (7.9) reduces to

$$C_{n+1}(x) - \frac{k_{n+1}^{(n+1)}}{k_n^{(n)}}\, x C_n(x) = a_{n-1} C_{n-1}(x) + a_n C_n(x). \tag{7.10}$$


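It can be shown that if we define

$$\alpha_n = \frac{k_{n+1}^{(n+1)}}{k_n^{(n)}}, \qquad \beta_n = \alpha_n \Bigl(\frac{k_{n+1}^{(n)}}{k_{n+1}^{(n+1)}} - \frac{k_n^{(n-1)}}{k_n^{(n)}}\Bigr), \qquad \gamma_n = -\frac{h_n}{h_{n-1}} \frac{\alpha_n}{\alpha_{n-1}}, \tag{7.11}$$

then Eq. (7.10) can be expressed as

$$C_{n+1}(x) = (\alpha_n x + \beta_n) C_n(x) + \gamma_n C_{n-1}(x), \tag{7.12}$$

or

$$x C_n(x) = \frac{1}{\alpha_n} C_{n+1}(x) - \frac{\beta_n}{\alpha_n} C_n(x) - \frac{\gamma_n}{\alpha_n} C_{n-1}(x). \tag{7.13}$$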


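Other recurrence relations, involving higher powers of x, can be obtained
from the one above. For example, a recurrence relation involving $x^2$ can be
obtained by multiplying both sides of Eq. (7.13) by x and expanding each
term of the RHS using that same equation. The result will be

$$
\begin{aligned}
x^2 C_n(x) ={}& \frac{1}{\alpha_n \alpha_{n+1}} C_{n+2}(x) - \Bigl(\frac{\beta_{n+1}}{\alpha_n \alpha_{n+1}} + \frac{\beta_n}{\alpha_n^2}\Bigr) C_{n+1}(x) \\
&+ \Bigl(\frac{\beta_n^2}{\alpha_n^2} - \frac{\gamma_{n+1}}{\alpha_n \alpha_{n+1}} - \frac{\gamma_n}{\alpha_n \alpha_{n-1}}\Bigr) C_n(x) \\
&+ \Bigl(\frac{\beta_n \gamma_n}{\alpha_n^2} + \frac{\beta_{n-1} \gamma_n}{\alpha_n \alpha_{n-1}}\Bigr) C_{n-1}(x) + \frac{\gamma_{n-1} \gamma_n}{\alpha_n \alpha_{n-1}} C_{n-2}(x).
\end{aligned} \tag{7.14}
$$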
ay AnAn—1 


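Example 7.2.4 As an application of the recurrence relations above, let us
evaluate

$$I \equiv \int_a^b x C_m(x) C_n(x) w(x)\,dx.$$

Substituting (7.13) in the integral gives

$$I = \frac{1}{\alpha_n} \int_a^b C_m(x) C_{n+1}(x) w(x)\,dx - \frac{\beta_n}{\alpha_n} \int_a^b C_m(x) C_n(x) w(x)\,dx - \frac{\gamma_n}{\alpha_n} \int_a^b C_m(x) C_{n-1}(x) w(x)\,dx.$$

We now use the orthogonality relations among the $C_k(x)$ to obtain

$$
\begin{aligned}
I &= \frac{1}{\alpha_n} \delta_{m,n+1} \underbrace{\int_a^b C_m^2(x) w(x)\,dx}_{=h_m} - \frac{\beta_n}{\alpha_n} \delta_{mn} \int_a^b C_m^2(x) w(x)\,dx - \frac{\gamma_n}{\alpha_n} \delta_{m,n-1} \int_a^b C_m^2(x) w(x)\,dx \\
&= \Bigl(\frac{1}{\alpha_{m-1}} \delta_{m,n+1} - \frac{\beta_m}{\alpha_m} \delta_{mn} - \frac{\gamma_{m+1}}{\alpha_{m+1}} \delta_{m,n-1}\Bigr) h_m,
\end{aligned}
$$

or

$$I = \begin{cases} h_m/\alpha_{m-1} & \text{if } m = n + 1, \\ -\beta_m h_m/\alpha_m & \text{if } m = n, \\ -\gamma_{m+1} h_m/\alpha_{m+1} & \text{if } m = n - 1, \\ 0 & \text{otherwise.} \end{cases}$$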
Example 7.2.5 Let us find the orthogonal polynomials forming a basis of
$\mathcal{L}^2(-1, +1)$, which we denote by $P_n(x)$, where n is the degree of the polynomial.
Let $P_0(x) = 1$. To find $P_1(x)$, write $P_1(x) = ax + b$, and determine
a and b in such a way that $P_1(x)$ is orthogonal to $P_0(x)$:

$$0 = \int_{-1}^{1} P_1(x) P_0(x)\,dx = \int_{-1}^{1} (ax + b)\,dx = \tfrac{1}{2} a x^2 \Big|_{-1}^{1} + 2b = 2b.$$

So one of the coefficients, b, is zero. To find the other one, we need
some standardization procedure. We “standardize” $P_n(x)$ by requiring that
$P_n(1) = 1$ ∀n. For n = 1 this yields a × 1 = 1, or a = 1, so that $P_1(x) = x$.

We can calculate $P_2(x)$ similarly: Write $P_2(x) = ax^2 + bx + c$, impose
the condition that it be orthogonal to both $P_1(x)$ and $P_0(x)$, and enforce the
standardization procedure. All this will yield

$$0 = \int_{-1}^{1} P_2(x) P_0(x)\,dx = \tfrac{2}{3} a + 2c, \qquad 0 = \int_{-1}^{1} P_2(x) P_1(x)\,dx = \tfrac{2}{3} b,$$

and $P_2(1) = a + b + c = 1$. These three equations have the unique solution
a = 3/2, b = 0, c = −1/2. Thus, $P_2(x) = \tfrac{1}{2}(3x^2 - 1)$. These are the first
three Legendre polynomials, which are part of a larger group of polynomials
to be discussed in Chap. 8.
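
The construction of Example 7.2.5 can be automated. The sketch below applies the Gram-Schmidt process to the monomials with w(x) = 1 on [−1, 1] and then imposes the standardization $P_n(1) = 1$; the function names `inner` and `legendre` are illustrative, and exact rational arithmetic is an implementation choice made here, not something the text prescribes.

```python
from fractions import Fraction

def inner(p, q):
    """Exact integral of p(x) q(x) over [-1, 1]; coefficients are lowest degree first."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:          # odd powers integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total

def legendre(count):
    """Gram-Schmidt on 1, x, x^2, ... followed by the standardization P_n(1) = 1."""
    polys = []
    for n in range(count):
        p = [Fraction(0)] * n + [Fraction(1)]        # the monomial x^n
        for q in polys:
            factor = inner(p, q) / inner(q, q)
            padded = q + [Fraction(0)] * (len(p) - len(q))
            p = [a - factor * b for a, b in zip(p, padded)]
        value_at_one = sum(p)                         # p(1) is the sum of the coefficients
        polys.append([c / value_at_one for c in p])
    return polys

P = legendre(4)
for n, coeffs in enumerate(P):
    print(n, [str(c) for c in coeffs])
```

The first three outputs reproduce $P_0 = 1$, $P_1 = x$, and $P_2 = \tfrac12(3x^2 - 1)$ found above; the fourth is the next polynomial of the same family.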


7.2.2 Orthogonal Polynomials and Least Squares


The method of least squares is no doubt familiar to the reader. In the simplest 
procedure, one tries to find a linear function that most closely fits a set of 
data. By definition, “most closely” means that the sum of the squares of 
the differences between the data points and the corresponding values of the 
linear function is minimum. More generally, one seeks the best polynomial 
fit to the data. 

We shall consider a related topic, namely least-square fitting of a given
function with polynomials. Suppose f(x) is a function defined on (a, b). We
want to find a polynomial that most closely approximates f. Write such a
polynomial as $p(x) = \sum_{k=0}^{n} a_k x^k$, where the $a_k$’s are to be determined such
that

$$S(a_0, a_1, \ldots, a_n) \equiv \int_a^b \bigl[f(x) - a_0 - a_1 x - \cdots - a_n x^n\bigr]^2\,dx$$

is a minimum. Differentiating S with respect to the $a_k$’s and setting the result
equal to zero gives

$$0 = \frac{\partial S}{\partial a_j} = \int_a^b 2(-x^j) \Bigl[f(x) - \sum_{k=0}^{n} a_k x^k\Bigr] dx,$$

or

$$\sum_{k=0}^{n} a_k \int_a^b x^{j+k}\,dx = \int_a^b f(x) x^j\,dx.$$
One can rewrite this in matrix form as Ba = c, where a is a column vector
with components $a_k$, and B and c are a matrix and a column vector whose
components are

$$B_{kj} = \frac{b^{j+k+1} - a^{j+k+1}}{j + k + 1} \quad \text{and} \quad c_j = \int_a^b f(x) x^j\,dx. \tag{7.15}$$

By solving this matrix equation, one finds the $a_k$’s, which in turn give the
best fit.

A drawback of the procedure above is that the desire for a higher-degree 
polynomial fit entails the implementation of the procedure from scratch and 
the solution of a completely new matrix equation. One way to overcome this 
difficulty is to use orthogonal polynomials. Then we would have 


$$S(a_0, a_1, \ldots, a_n) \equiv \int_a^b \Bigl[f(x) - \sum_{k=0}^{n} a_k C_k(x)\Bigr]^2 w(x)\,dx,$$

where we have introduced a weight function w(x) for convenience. The
derivative equation becomes

$$0 = \frac{\partial S}{\partial a_j} = \int_a^b 2\bigl[-C_j(x)\bigr] \Bigl[f(x) - \sum_{k=0}^{n} a_k C_k(x)\Bigr] w(x)\,dx,$$

or

$$\sum_{k=0}^{n} a_k \underbrace{\int_a^b C_j(x) C_k(x) w(x)\,dx}_{=0 \text{ unless } j = k} = \int_a^b C_j(x) f(x) w(x)\,dx.$$

It follows that

$$a_j = \frac{\int_a^b C_j(x) f(x) w(x)\,dx}{\int_a^b [C_j(x)]^2 w(x)\,dx}, \qquad j = 0, 1, \ldots, n, \tag{7.16}$$

which is true regardless of the number of polynomials in the sum. Hence,
once we find $\{a_j\}_{j=0}^{m}$, we can add the (m + 1)st polynomial and determine
$a_{m+1}$ from Eq. (7.16) without altering the previous coefficients.


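Example 7.2.6 Let us find the least-square fit to $f(x) = \cos(\tfrac{1}{2}\pi x)$ in the
interval (−1, +1) using polynomials of second degree. First we use a single
polynomial whose coefficients are determined by Eq. (7.15). We can easily
calculate the column vector c:

$$c_0 = \frac{4}{\pi}, \qquad c_1 = 0, \qquad c_2 = \frac{4}{\pi^3}(\pi^2 - 8).$$

The elements of the matrix B can also be calculated easily. To find the unknown
$a_k$’s, we need to solve

$$\begin{pmatrix} 2 & 0 & \tfrac{2}{3} \\ 0 & \tfrac{2}{3} & 0 \\ \tfrac{2}{3} & 0 & \tfrac{2}{5} \end{pmatrix} \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix} = \begin{pmatrix} 4/\pi \\ 0 \\ (4/\pi^3)(\pi^2 - 8) \end{pmatrix}.$$

The solution is

$$a_0 = \frac{60}{\pi^3} - \frac{3}{\pi}, \qquad a_1 = 0, \qquad a_2 = \frac{15}{\pi} - \frac{180}{\pi^3}.$$

Therefore,

$$\cos\bigl(\tfrac{1}{2}\pi x\bigr) \approx \Bigl(\frac{60}{\pi^3} - \frac{3}{\pi}\Bigr) + \Bigl(\frac{15}{\pi} - \frac{180}{\pi^3}\Bigr) x^2 .$$

If we wish to use orthogonal polynomials with w(x) = 1, we can employ
the polynomials found in Example 7.2.5. Then

$$a_j = \frac{\int_{-1}^{1} P_j(x) \cos(\tfrac{1}{2}\pi x)\,dx}{\int_{-1}^{1} [P_j(x)]^2\,dx}, \qquad j = 0, 1, 2,$$

which yields

$$a_0 = \frac{2}{\pi}, \qquad a_1 = 0, \qquad a_2 = -\frac{120}{\pi^3} + \frac{10}{\pi}.$$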

7.3 Continuous Index 


Once we allow the number of dimensions to be infinite, we open the door 
for numerous possibilities that are not present in the finite case. One such 
possibility arises because of the variety of infinities. We have encountered 
two types of infinity in Chap. 1, the countable infinity and the uncountable 
infinity. The paradigm of the former is the “number” of integers, and that of 
the latter is the “number” of real numbers. The nature of dimensionality of 
the vector space is reflected in the components of a general vector, which has 
a finite number of components in a finite-dimensional vector space, a count- 
ably infinite number of components in an infinite-dimensional vector space 
with a countable basis, and an uncountably infinite number of components 
in an infinite-dimensional vector space with no countable basis. 

To gain an understanding of the nature of, and differences between, the 
three types of vector spaces mentioned above, it is convenient to think of 
components as functions of a “counting set”. Thus, the components $f_i$ of
a vector $|f\rangle$ in an N-dimensional vector space can be thought of as values
of a function f defined on the finite set {1, 2, ..., N}, and to emphasize
such functional dependence, we write f(i) instead of $f_i$. Similarly, the
components $f_i$ of a vector $|f\rangle$ in a Hilbert space with the countable basis
$B = \{|e_i\rangle\}_{i=1}^{\infty}$ can be thought of as values of a function $f : \mathbb{N} \to \mathbb{C}$, where $\mathbb{N}$
is the (infinite) set of natural numbers. The next step is to allow the counting
set to be uncountable, i.e., a continuum such as the real numbers or an interval
thereof. This leads to a “component” of the form f(x) corresponding
to a function $f : \mathbb{R} \to \mathbb{C}$. What about the vectors themselves? What sort of
a basis gives rise to such components?

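Because of the isomorphism of Theorem 7.2.2, we shall concentrate on
$\mathcal{L}^2_w(a, b)$. In keeping with our earlier notation, let $\{|e_x\rangle\}_{x\in\mathbb{R}}$ be a set of vectors
and interpret f(x) as $\langle e_x|f\rangle$. The inner product of $\mathcal{L}^2_w(a, b)$ can now
be written as

$$
\begin{aligned}
\langle g|f\rangle &= \int_a^b g^*(x) f(x) w(x)\,dx = \int_a^b \langle g|e_x\rangle \langle e_x|f\rangle w(x)\,dx \\
&= \langle g| \Bigl(\int_a^b |e_x\rangle w(x) \langle e_x|\,dx\Bigr) |f\rangle .
\end{aligned}
$$

The last line suggests writing

$$\int_a^b |e_x\rangle w(x) \langle e_x|\,dx = \mathbf{1}.$$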
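In the physics literature the “e” is ignored, and one writes $|x\rangle$ for $|e_x\rangle$.
Hence, we obtain the completeness relation for a continuous index:

$$\int_a^b |x\rangle w(x) \langle x|\,dx = \mathbf{1}, \quad \text{or} \quad \int_a^b |x\rangle \langle x|\,dx = \mathbf{1}, \tag{7.17}$$

where in the second integral, w(x) is set equal to unity. We also have

$$|f\rangle = \Bigl(\int_a^b |x\rangle w(x) \langle x|\,dx\Bigr) |f\rangle = \int_a^b f(x) w(x) |x\rangle\,dx, \tag{7.18}$$

which shows how to expand a vector $|f\rangle$ in terms of the $|x\rangle$’s.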

Fig. 7.2 The Gaussian bell-shaped curve approaches the Dirac delta function as the
width of the curve approaches zero. The value of ε is 1 for the dashed curve, 0.25 for
the heavy curve and 0.05 for the light curve

Take the inner product of (7.18) with $\langle x'|$ to obtain

$$\langle x'|f\rangle \equiv f(x') = \int_a^b f(x) w(x) \langle x'|x\rangle\,dx,$$

where x′ is assumed to lie in the interval (a, b), otherwise f(x′) = 0 by
definition. This equation, which holds for arbitrary f, tells us immediately
that $w(x)\langle x'|x\rangle$ is no ordinary function of x and x′. For instance, suppose
f(x′) = 0. Then, the result of integration is always zero, regardless
of the behavior of f at other points. Clearly, there is an infinitude of functions
that vanish at x′, yet all of them give the same integral! Pursuing this
line of argument more quantitatively, one can show that $w(x)\langle x'|x\rangle = 0$
if $x \ne x'$, $w(x)\langle x|x\rangle = \infty$, $w(x)\langle x'|x\rangle$ is an even function of x − x′, and
$\int_a^b w(x)\langle x'|x\rangle\,dx = 1$. The proof is left as a problem. The reader may recognize
this as the Dirac delta function

$$\delta(x - x') = w(x)\langle x'|x\rangle, \tag{7.19}$$


which, for a function f defined on the interval (a, b), has the following
property:⁴

$$\int_a^b f(x)\,\delta(x - x')\,dx = \begin{cases} f(x') & \text{if } x' \in (a, b), \\ 0 & \text{if } x' \notin (a, b). \end{cases} \tag{7.20}$$


Written in the form $\langle x'|x\rangle = \delta(x - x')/w(x)$, Eq. (7.19) is the generalization
of the orthonormality relation of vectors to the case of a continuous index. 

The Dirac delta function is anything but a “function”. Nevertheless, there 
is a well-developed branch of mathematics, called generalized function the- 
ory or functional analysis, studying it and many other functions like it in a 
highly rigorous fashion. We shall only briefly explore this territory of math- 
ematics in the next section. At this point we simply mention the fact that 
the Dirac delta function can be represented as the limit of certain sequences 
of ordinary functions. The following three examples illustrate some of these 
representations. 


⁴For an elementary discussion of the Dirac delta function with many examples of its
application, see [Hass 08].

Fig. 7.3 The function sin Tx/x also approaches the Dirac delta function as the width of
the curve approaches zero. The value of T is 0.5 for the dashed curve, 2 for the heavy
curve, and 15 for the light curve


Example 7.3.1 Consider a Gaussian curve whose width approaches zero at
the same time that its height approaches infinity in such a way that its area
remains constant. In the infinite limit, we obtain the Dirac delta function. In
fact, we have

$$\delta(x - x') = \lim_{\epsilon \to 0} \frac{1}{\sqrt{\epsilon\pi}}\, e^{-(x - x')^2/\epsilon}.$$

In the limit of ε → 0, the height of this Gaussian goes to infinity while its
width goes to zero (see Fig. 7.2). Furthermore, for any nonzero value of ε,
we can easily verify that

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{\epsilon\pi}}\, e^{-(x - x')^2/\epsilon}\,dx = 1.$$

This relation is independent of ε and therefore still holds in the limit ε → 0.
The limit of the Gaussian behaves like the Dirac delta function.
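
The sifting property (7.20) can be watched emerging from the Gaussian of Example 7.3.1. The sketch below assumes a midpoint rule on a finite window around x′; the test function cos x, the point x′ = 0.3, the window width, and the helper names `nascent_delta` and `smeared` are all illustrative choices. It integrates f(x) against the Gaussian for shrinking ε.

```python
import math

def nascent_delta(x, x0, eps):
    """Gaussian of Example 7.3.1: exp(-(x - x0)^2 / eps) / sqrt(eps * pi)."""
    return math.exp(-((x - x0) ** 2) / eps) / math.sqrt(eps * math.pi)

def smeared(f, x0, eps, half_width=1.0, steps=200_000):
    """Midpoint-rule estimate of the integral of f(x) * nascent_delta(x, x0, eps)
    over the window [x0 - half_width, x0 + half_width]."""
    h = 2.0 * half_width / steps
    total = 0.0
    for n in range(steps):
        x = x0 - half_width + (n + 0.5) * h
        total += f(x) * nascent_delta(x, x0, eps)
    return total * h

x0 = 0.3
eps_values = (1e-1, 1e-2, 1e-4)
values = {eps: smeared(math.cos, x0, eps) for eps in eps_values}
areas = {eps: smeared(lambda x: 1.0, x0, eps) for eps in eps_values}
for eps in eps_values:
    print(f"eps = {eps:.0e}: area = {areas[eps]:.6f}, "
          f"integral = {values[eps]:.6f}, f(x') = {math.cos(x0):.6f}")
```

As ε decreases, the area stays at 1 (up to the truncated tails) while the integral approaches f(x′), which is the behavior expected of the delta function.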


Example 7.3.2 Consider the function $D_T(x - x')$ defined as

$$D_T(x - x') \equiv \frac{1}{2\pi} \int_{-T}^{T} e^{i(x - x')t}\,dt.$$

The integral is easily evaluated, with the result

$$D_T(x - x') = \frac{1}{2\pi} \frac{e^{i(x - x')t}}{i(x - x')} \bigg|_{-T}^{T} = \frac{1}{\pi} \frac{\sin T(x - x')}{x - x'}.$$

The graph of $D_T(x - 0)$ as a function of x for various values of T is shown
in Fig. 7.3. Note that the width of the curve decreases as T increases. The
area under the curve can be calculated:

$$\int_{-\infty}^{\infty} D_T(x - x')\,dx = \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{\sin T(x - x')}{x - x'}\,dx = \frac{1}{\pi} \underbrace{\int_{-\infty}^{\infty} \frac{\sin y}{y}\,dy}_{=\pi} = 1.$$

Figure 7.3 shows that $D_T(x - x')$ becomes more and more like the Dirac
delta function as T gets larger and larger. In fact, we have

$$\delta(x - x') = \lim_{T \to \infty} \frac{1}{\pi} \frac{\sin T(x - x')}{x - x'}. \tag{7.21}$$

To see this, we note that for any finite T we can write

$$D_T(x - x') = \frac{T}{\pi} \left[\frac{\sin T(x - x')}{T(x - x')}\right].$$
Fig. 7.4 The step function, or θ-function, shown in the figure has the Dirac delta function
as its derivative


Furthermore, for values of $x$ that are very close to $x'$,

\[ T(x - x') \to 0 \quad \text{and} \quad \frac{\sin T(x - x')}{T(x - x')} \to 1. \]

Thus, for such values of $x$ and $x'$, we have $D_T(x - x') \approx T/\pi$, which is
large when $T$ is large. This is as expected of a delta function: $\delta(0) = \infty$. On
the other hand, the width of $D_T(x - x')$ around $x'$ is given, roughly, by the
distance between the points at which $D_T(x - x')$ drops to zero: $T(x - x') =
\pm\pi$, or $x - x' = \pm\pi/T$. This width is roughly $\Delta x = 2\pi/T$, which goes to
zero as $T$ grows. Again, this is as expected of the delta function.
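The three properties used in this example (peak height T/π, first zeros at x' ± π/T, and unit area) can be checked numerically. This is a rough sketch under stated assumptions: the infinite integration range is replaced by a large finite window (so the area is only approximately 1), the midpoint rule is used, and the function names are illustrative.

```python
import math

def D(x, x0, T):
    """D_T(x - x0) = sin(T (x - x0)) / (pi (x - x0)); equals T/pi at x = x0."""
    u = x - x0
    if u == 0.0:
        return T / math.pi
    return math.sin(T * u) / (math.pi * u)

def area(T, x0=0.0, half_width=200.0, n=400_000):
    """Midpoint-rule integral of D_T over [x0 - half_width, x0 + half_width].

    The finite window is an assumption; the neglected tails decay
    only like 1/(T * half_width), so the result is approximate.
    """
    h = 2.0 * half_width / n
    a = x0 - half_width
    return h * sum(D(a + (k + 0.5) * h, x0, T) for k in range(n))

if __name__ == "__main__":
    for T in (0.5, 2.0, 15.0):  # the three values of T shown in Fig. 7.3
        print(f"T={T}: peak={D(0.0, 0.0, T):.4f} (T/pi={T / math.pi:.4f}), "
              f"width={2 * math.pi / T:.4f}, area={area(T):.4f}")
```

As T grows the peak rises and the width 2π/T shrinks while the area stays near 1.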


The preceding example suggests another representation of the Dirac delta 
function: 


\[ \delta(x - x') = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i(x - x')t}\, dt. \tag{7.22} \]


Example 7.3.3 A third representation of the Dirac delta function involves
the step function $\theta(x - x')$, which is defined as

\[
\theta(x - x') \equiv
\begin{cases}
0 & \text{if } x < x', \\
1 & \text{if } x > x',
\end{cases}
\]


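and is discontinuous at $x = x'$. We can approximate this step function by a
variety of continuous functions. One such function is $T_\epsilon(x - x')$ defined by

\[
T_\epsilon(x - x') \equiv
\begin{cases}
0 & \text{if } x \le x' - \epsilon, \\
\dfrac{1}{2\epsilon}(x - x' + \epsilon) & \text{if } x' - \epsilon \le x \le x' + \epsilon, \\
1 & \text{if } x \ge x' + \epsilon,
\end{cases}
\]

where $\epsilon$ is a small positive number as shown in Fig. 7.4. It is clear that

\[ \theta(x - x') = \lim_{\epsilon \to 0} T_\epsilon(x - x'). \]

Now let us consider the derivative of $T_\epsilon(x - x')$ with respect to $x$:

\[
\frac{dT_\epsilon}{dx}(x - x') =
\begin{cases}
0 & \text{if } x < x' - \epsilon, \\
\dfrac{1}{2\epsilon} & \text{if } x' - \epsilon < x < x' + \epsilon, \\
0 & \text{if } x > x' + \epsilon.
\end{cases}
\]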


We note that the derivative is not defined at $x = x' - \epsilon$ and $x = x' + \epsilon$, and
that $dT_\epsilon/dx$ is zero everywhere except when $x$ lies in the interval
$(x' - \epsilon, x' + \epsilon)$, where it is equal to $1/(2\epsilon)$ and goes to infinity as $\epsilon \to 0$. Here
again we see signs of the delta function. In fact, we also note that

\[ \int_{-\infty}^{\infty} \left( \frac{dT_\epsilon}{dx} \right) dx = \int_{x' - \epsilon}^{x' + \epsilon} \left( \frac{dT_\epsilon}{dx} \right) dx = \int_{x' - \epsilon}^{x' + \epsilon} \frac{1}{2\epsilon}\, dx = 1. \]

It is not surprising, then, to find that $\lim_{\epsilon \to 0} \frac{dT_\epsilon}{dx}(x - x') = \delta(x - x')$.
Assuming that the interchange of the order of differentiation and the limiting
process is justified, we obtain the important identity

\[ \frac{d}{dx}\, \theta(x - x') = \delta(x - x'). \tag{7.23} \]


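Now that we have some understanding of one continuous index, we can
generalize the results to several continuous indices. In the earlier discussion
we looked at $f(x)$ as the $x$th component of some abstract vector $|f\rangle$. For
functions of $n$ variables, we can think of $f(x_1, \dots, x_n)$ as the component
of an abstract vector $|f\rangle$ along a basis vector $|x_1, \dots, x_n\rangle$.⁵ This basis is
a direct generalization of one continuous index to $n$. Then $f(x_1, \dots, x_n)$
is defined as $f(x_1, \dots, x_n) = \langle x_1, \dots, x_n | f \rangle$. If the region of integration is
denoted by $\Omega$, and we use the abbreviations

\[ \mathbf{r} \equiv (x_1, x_2, \dots, x_n), \qquad d^n x = dx_1\, dx_2 \cdots dx_n, \]

\[ |x_1, x_2, \dots, x_n\rangle = |\mathbf{r}\rangle, \qquad \delta(x_1 - x_1') \cdots \delta(x_n - x_n') = \delta(\mathbf{r} - \mathbf{r}'), \]

then we can write

\[
\begin{aligned}
|f\rangle &= \int_\Omega d^n x\, f(\mathbf{r})\, w(\mathbf{r})\, |\mathbf{r}\rangle, &
\int_\Omega d^n x\, |\mathbf{r}\rangle\, w(\mathbf{r})\, \langle\mathbf{r}| &= \mathbf{1}, \\
f(\mathbf{r}') &= \int_\Omega d^n x\, f(\mathbf{r})\, w(\mathbf{r})\, \langle\mathbf{r}'|\mathbf{r}\rangle, &
\langle\mathbf{r}'|\mathbf{r}\rangle\, w(\mathbf{r}) &= \delta(\mathbf{r} - \mathbf{r}'),
\end{aligned}
\tag{7.24}
\]

where $d^n x$ is the “volume” element and $\Omega$ is the region of integration of
interest.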

For instance, if the region of definition of the functions under consideration
is the surface of the unit sphere, then [with $w(\mathbf{r}) = 1$], one gets

\[ \int_0^{2\pi} d\varphi \int_0^{\pi} \sin\theta\, d\theta\, |\theta, \varphi\rangle \langle\theta, \varphi| = \mathbf{1}. \tag{7.25} \]


⁵ Do not confuse this with an $n$-dimensional vector. In fact, the dimension is $n$-fold infinite:
each $x_i$ counts one infinite set of numbers!

This will be used in our discussion of spherical harmonics in Chap. 13.

An important identity using the three-dimensional Dirac delta function
comes from potential theory. This is (see [Hass 08] for a discussion of this
equation)

\[ \nabla^2 \left( \frac{1}{|\mathbf{r} - \mathbf{r}'|} \right) = -4\pi\, \delta(\mathbf{r} - \mathbf{r}'). \tag{7.26} \]


7.4 Generalized Functions 


Paul Adrien Maurice Dirac discovered the delta function in the late 1920s 
while investigating scattering problems in quantum mechanics. This “func- 
tion” seemed to violate most properties of other functions known to mathe- 
maticians at the time. However, later, when mathematicians found a rigorous 
way of studying this and other functions similar to it, a new vista in higher 
mathematics opened up. 

The derivative of the delta function, $\delta'(x - a)$, is such that for any ordinary
function $f(x)$,

\[ \int_{-\infty}^{\infty} f(x)\, \delta'(x - a)\, dx = -\int_{-\infty}^{\infty} f'(x)\, \delta(x - a)\, dx = -f'(a). \]


We can define $\delta'(x - a)$ by this relation. In addition, we can define the
derivative of any function, including discontinuous functions, at any point
(including points of discontinuity, where the usual definition of derivative
fails) by this relation. That is, if $\varphi(x)$ is a “bad” function whose derivative is
not defined at some point(s), and $f(x)$ is a “good” function, we can define
the derivative of $\varphi(x)$ by

\[ \int_{-\infty}^{\infty} f(x)\, \varphi'(x)\, dx \equiv -\int_{-\infty}^{\infty} f'(x)\, \varphi(x)\, dx. \]


The integral on the RHS is well-defined. 

Functions such as the Dirac delta function and its derivatives of all orders 
are not functions in the traditional sense. What is common among all of them 
is that in most applications they appear inside an integral, and we saw in 
Chap. 2 that integration can be considered as a linear functional on the space 
of continuous functions. It is therefore natural to describe such functions in 
terms of linear functionals. This idea was picked up by Laurent Schwartz 
in the 1950s who developed it into a new branch of mathematics called 
generalized functions, or distributions. 

A distribution is a mathematical entity that appears inside an integral in 
conjunction with a well-behaved test function—which we assume to de- 
pend on n variables—such that the result of integration is a well-defined 
number. Depending on the type of test function used, different kinds of dis- 
tributions can be defined. If we want to include the Dirac delta function and 
its derivatives of all orders, then the test functions must be infinitely differentiable,
that is, they must be $C^\infty$ functions on $\mathbb{R}^n$ (or $\mathbb{C}^n$). Moreover, in
order for the theory of distributions to be mathematically feasible, all the
test functions must be of compact support, i.e., they must vanish outside
a finite “volume” of $\mathbb{R}^n$ (or $\mathbb{C}^n$). One common notation for such functions
is $C_F^\infty(\mathbb{R}^n)$ or $C_F^\infty(\mathbb{C}^n)$ ($F$ stands for “finite”). The definitive property of
distributions concerns the way they combine with test functions to give a 
number. The test functions used clearly form a vector space over R or C. In 
this vector-space language, distributions are linear functionals. The linearity 
is a simple consequence of the properties of the integral. We therefore have 
the following definition of a distribution. 

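Definition 7.4.1 A distribution, or generalized function, is a continuous⁶
linear functional on the space $C_F^\infty(\mathbb{R}^n)$ or $C_F^\infty(\mathbb{C}^n)$. If $f \in C_F^\infty$ and $\varphi$ is a
distribution, then $\varphi[f] = \int_{-\infty}^{\infty} \varphi(\mathbf{r}) f(\mathbf{r})\, d^n x$.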


Another notation used in place of $\varphi[f]$ is $\langle\varphi, f\rangle$. This is more appealing
not only because $\varphi$ is linear, in the sense that $\varphi[\alpha f + \beta g] = \alpha\varphi[f] + \beta\varphi[g]$,
but also because the set of all such linear functionals forms a vector space;
that is, the linear combination of the $\varphi$'s is also defined. Thus, $\langle\varphi, f\rangle$ suggests
a mutual “democracy” for both $f$'s and $\varphi$'s.

We now have a shorthand way of writing integrals. For instance, if $\delta_a$
represents the Dirac delta function $\delta(x - a)$, with an integration over $x$ understood,
then $\langle\delta_a, f\rangle = f(a)$. Similarly, $\langle\delta_a', f\rangle = -f'(a)$, and for linear
combinations, $\langle\alpha\delta_a + \beta\delta_a', f\rangle = \alpha f(a) - \beta f'(a)$.
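The functional point of view can be mirrored directly in code. The sketch below is only an illustration: it represents a distribution as a Python callable that takes a test function and returns a number. The names `delta`, `delta_prime`, and `combine` are hypothetical, and the derivative f'(a) is estimated by a central difference, which is a numerical assumption rather than part of the definition.

```python
import math

def delta(a):
    """The functional f -> <delta_a, f> = f(a)."""
    return lambda f: f(a)

def delta_prime(a, h=1e-5):
    """The functional f -> <delta'_a, f> = -f'(a).

    f'(a) is approximated by a central difference with step h
    (an assumption made for this sketch).
    """
    return lambda f: -(f(a + h) - f(a - h)) / (2.0 * h)

def combine(alpha, phi, beta, psi):
    """The linear combination alpha*phi + beta*psi of two functionals."""
    return lambda f: alpha * phi(f) + beta * psi(f)

if __name__ == "__main__":
    a = 0.5
    phi = combine(2.0, delta(a), 3.0, delta_prime(a))
    # <2 delta_a + 3 delta'_a, sin> should equal 2 sin(a) - 3 cos(a)
    print(phi(math.sin), 2.0 * math.sin(a) - 3.0 * math.cos(a))
```

Treating both the test functions and the functionals as objects that can be added and scaled is one way to see the "democracy" between f's and φ's mentioned above.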


Example 7.4.2 An ordinary (continuous) function $g$ can be thought of as
a special case of a distribution. The linear functional $g : C_F^\infty(\mathbb{R}) \to \mathbb{R}$ is
simply defined by $\langle g, f\rangle \equiv g[f] = \int_{-\infty}^{\infty} g(x) f(x)\, dx$.


Example 7.4.3 An interesting application of distributions (generalized 
functions) occurs when the notion of density is generalized to include not 
only (smooth) volume densities, but also point-like, linear, and surface den- 
sities. 

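A point charge $q$ located at $\mathbf{r}_0$ can be thought of as having a charge
density $\rho(\mathbf{r}) = q\delta(\mathbf{r} - \mathbf{r}_0)$. In the language of linear functionals, we interpret
$\rho$ as a distribution, $\rho : C_F^\infty(\mathbb{R}^3) \to \mathbb{R}$, which for an arbitrary function $f$ gives

\[ \rho[f] = \langle\rho, f\rangle = q f(\mathbf{r}_0). \tag{7.27} \]

The delta function character of $\rho$ can be detected from this equation by
recalling that the LHS is

\[ \int \rho(\mathbf{r}) f(\mathbf{r})\, d^3x = \lim_{\substack{N \to \infty \\ \Delta V_i \to 0}} \sum_{i=1}^{N} \rho(\mathbf{r}_i) f(\mathbf{r}_i)\, \Delta V_i. \]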


⁶ See [Zeid 95, pp. 27, 156-160] for a formal definition of continuity for linear functionals.




On the RHS of this equation, the only volume element that contributes is
the one that contains the point $\mathbf{r}_0$; all the rest contribute zero. As $\Delta V_i \to 0$,
the only way that the RHS can give a nonzero number is for $\rho(\mathbf{r}_0) f(\mathbf{r}_0)$
to be infinite. Since $f$ is a well-behaved function, $\rho(\mathbf{r}_0)$ must be infinite,
implying that $\rho(\mathbf{r})$ acts as a delta function. This shows that the definition of
Eq. (7.27) leads to a delta-function behavior for $\rho$. Similarly for linear and
surface densities.


The example above and Problems 7.12 and 7.13 suggest that a distribu- 
tion that confines an integral to a lower-dimensional space must have a delta 
function in its definition. 

We have seen that the delta function can be thought of as the limit of an 
ordinary function. This idea can be generalized. 


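Definition 7.4.4 Let $\{\varphi_n(x)\}$ be a sequence of functions such that

\[ \lim_{n \to \infty} \int_{-\infty}^{\infty} \varphi_n(x) f(x)\, dx \]

exists for all $f \in C_F^\infty(\mathbb{R})$. Then the sequence is said to converge to the
distribution $\varphi$, defined by

\[ \langle\varphi, f\rangle = \lim_{n \to \infty} \int_{-\infty}^{\infty} \varphi_n(x) f(x)\, dx \quad \forall f. \]

This convergence is denoted by $\varphi_n \to \varphi$.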


For example, it can be verified that

\[ \frac{n}{\sqrt{\pi}}\, e^{-n^2 x^2} \to \delta(x) \quad \text{and} \quad \frac{1 - \cos nx}{n\pi x^2} \to \delta(x) \]

and so on. The proofs are left as exercises.
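A numerical experiment is no substitute for the proofs, but it can make the convergence of Definition 7.4.4 concrete. In the sketch below the test function is the standard smooth bump exp(−1/(1 − x²)) supported on [−1, 1]; this choice, the midpoint rule, and the function names are assumptions made for illustration. The pairing ⟨φ_n, f⟩ is computed for the two sequences quoted above and compared with f(0).

```python
import math

def bump(x):
    """Infinitely differentiable test function with compact support [-1, 1]."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def gaussian_seq(n, x):
    """(n / sqrt(pi)) * exp(-n^2 x^2)."""
    return n / math.sqrt(math.pi) * math.exp(-(n * x) ** 2)

def cosine_seq(n, x):
    """(1 - cos(n x)) / (n pi x^2), with its limiting value n/(2 pi) at x = 0."""
    if x == 0.0:
        return n / (2.0 * math.pi)
    return (1.0 - math.cos(n * x)) / (n * math.pi * x * x)

def pair(phi_n, n, f=bump, N=100_000):
    """Midpoint-rule value of the integral of phi_n(x) f(x) over [-1, 1].

    Restricting to [-1, 1] loses nothing because f vanishes outside it.
    """
    h = 2.0 / N
    total = 0.0
    for k in range(N):
        x = -1.0 + (k + 0.5) * h
        total += phi_n(n, x) * f(x)
    return h * total

if __name__ == "__main__":
    print("f(0) =", bump(0.0))
    for n in (10, 50, 400):
        print(n, pair(gaussian_seq, n), pair(cosine_seq, n))
```

Both columns should approach f(0) = 1/e as n grows. The second sequence converges more slowly because its tails decay only like 1/x².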


Historical Notes 

“Physical Laws should have mathematical beauty.” This statement was Dirac’s response 
to the question of his philosophy of physics, posed to him in Moscow in 1955. He wrote 
it on a blackboard that is still preserved today. 

Paul Adrien Maurice Dirac (1902-1984), was born in 1902 in Bristol, England, of a 
Swiss, French-speaking father and an English mother. His father, a taciturn man who 
refused to receive friends at home, enforced young Paul’s silence by requiring that only 
French be spoken at the dinner table. Perhaps this explains Dirac’s later disinclination 
toward collaboration and his general tendency to be a loner in most aspects of his life. 
The fundamental nature of his work made the involvement of students difficult, so perhaps 
Dirac’s personality was well-suited to his extraordinary accomplishments. 

Dirac went to Merchant Venturer’s School, the public school where his father taught 
French, and while there displayed great mathematical abilities. Upon graduation, he fol- 
lowed in his older brother’s footsteps and went to Bristol University to study electrical 
engineering. He was 19 when he graduated Bristol University in 1921. Unable to find 
a suitable engineering position due to the economic recession that gripped post-World 
War I England, Dirac accepted a fellowship to study mathematics at Bristol University. 
This fellowship, together with a grant from the Department of Scientific and Industrial 
Research, made it possible for Dirac to go to Cambridge as a research student in 1923. 
At Cambridge Dirac was exposed to the experimental activities of the Cavendish Laboratory,
and he became a member of the intellectual circle over which Rutherford and Fowler
presided. He took his Ph.D. in 1926 and was elected in 1927 as a fellow. His appointment
as university lecturer came in 1929. He assumed the Lucasian professorship following
Joseph Larmor in 1932 and retired from it in 1969. Two years later he accepted a position
at Florida State University where he lived out his remaining years. The FSU library now
carries his name.

“The amount of theoretical ground one has to cover before being able to solve
problems of real practical value is rather large, but this circumstance is an
inevitable consequence of the fundamental part played by transformation theory
and is likely to become more pronounced in the theoretical physics of the
future.” P.A.M. Dirac (1930)

In the late 1920s the relentless march of ideas and discoveries had carried physics to 
a generally accepted relativistic theory of the electron. Dirac, however, was dissatisfied 
with the prevailing ideas and, somewhat in isolation, sought for a better formulation. By 
1928 he succeeded in finding an equation, the Dirac equation, that accorded with his own 
ideas and also fit most of the established principles of the time. Ultimately, this equation, 
and the physical theory behind it, proved to be one of the great intellectual achievements 
of the period. It was particularly remarkable for the internal beauty of its mathemati- 
cal structure, which not only clarified previously mysterious phenomena such as spin 
and the Fermi-Dirac statistics associated with it, but also predicted the existence of an 
electron-like particle of negative energy, the antielectron, or positron, and, more recently, 
it has come to play a role of great importance in modern mathematics, particularly in the 
interrelations between topology, geometry, and analysis. Heisenberg characterized the 
discovery of antimatter by Dirac as “the most decisive discovery in connection with the 
properties or the nature of elementary particles.... This discovery of particles and an- 
tiparticles by Dirac ... changed our whole outlook on atomic physics completely.” One 
of the interesting implications of his work that predicted the positron was the prediction 
of a magnetic monopole. Dirac won the Nobel Prize in 1933 for this work. 

Dirac is not only one of the chief authors of quantum mechanics, but he is also the cre- 
ator of quantum electrodynamics and one of the principal architects of quantum field 
theory. While studying the scattering theory of quantum particles, he invented the (Dirac) 
delta function; in his attempt at quantizing the general theory of relativity, he founded 
constrained Hamiltonian dynamics, which is one of the most active areas of theoretical 
physics research today. One of his greatest contributions is the invention of bra $\langle\,|$ and
ket $|\,\rangle$.

While at Cambridge, Dirac did not accept many research students. Those who worked 
with him generally thought that he was a good supervisor, but one who did not spend 
much time with his students. A student needed to be extremely independent to work 
under Dirac. One such student was Dennis Sciama, who later became the supervisor of 
Stephen Hawking, the current holder of the Lucasian chair. Salam and Wigner, in their 
preface to the Festschrift that honors Dirac on his seventieth birthday and commemorates 
his contributions to quantum mechanics succinctly assessed the man: 


Dirac is one of the chief creators of quantum mechanics.... Posterity will rate 
Dirac as one of the greatest physicists of all time. The present generation values 
him as one of its greatest teachers. ... On those privileged to know him, Dirac has 
left his mark ... by his human greatness. He is modest, affectionate, and sets the 
highest possible standards of personal and scientific integrity. He is a legend in his 
own lifetime and rightly so. 


(Taken from Schweber, S.S. “Some chapters for a history of quantum field theory: 1938- 
1952”, in Relativity, Groups, and Topology II vol. 2, B.S. DeWitt and R. Stora, eds., 
North-Holland, Amsterdam, 1984.) 


Definition 7.4.5 The derivative of a distribution $\varphi$ is another distribution
$\varphi'$ defined by $\langle\varphi', f\rangle = -\langle\varphi, f'\rangle \quad \forall f \in C_F^\infty$.


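Example 7.4.6 We can combine the last two definitions to show that if the
functions $\theta_n$ are defined as

\[
\theta_n(x) \equiv
\begin{cases}
0 & \text{if } x \le -\frac{1}{n}, \\
(nx + 1)/2 & \text{if } -\frac{1}{n} \le x \le \frac{1}{n}, \\
1 & \text{if } x \ge \frac{1}{n},
\end{cases}
\]

then $\theta_n'(x) \to \delta(x)$.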


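We write the definition of the derivative, $\langle\theta_n', f\rangle = -\langle\theta_n, f'\rangle$, in terms of
integrals:

\[
\begin{aligned}
\int_{-\infty}^{\infty} \theta_n'(x) f(x)\, dx
&= -\int_{-\infty}^{\infty} \theta_n(x) \frac{df}{dx}\, dx = -\int_{-\infty}^{\infty} \theta_n(x)\, df \\
&= -\left( \int_{-\infty}^{-1/n} \theta_n(x)\, df + \int_{-1/n}^{1/n} \theta_n(x)\, df + \int_{1/n}^{\infty} \theta_n(x)\, df \right) \\
&= -\left( 0 + \int_{-1/n}^{1/n} \frac{nx + 1}{2}\, df + \int_{1/n}^{\infty} df \right) \\
&= -\frac{n}{2} \int_{-1/n}^{1/n} x\, df - \frac{1}{2} \int_{-1/n}^{1/n} df - \int_{1/n}^{\infty} df \\
&= -\frac{n}{2} \left( x f(x) \Big|_{-1/n}^{1/n} - \int_{-1/n}^{1/n} f(x)\, dx \right)
   - \frac{1}{2} \bigl( f(1/n) - f(-1/n) \bigr) - f(\infty) + f(1/n).
\end{aligned}
\]

For large $n$, we have $1/n \approx 0$ and $f(\pm 1/n) \approx f(0)$. Thus,

\[
\int_{-\infty}^{\infty} \theta_n'(x) f(x)\, dx
\approx -\frac{n}{2} \left( \frac{1}{n} f\!\left(\frac{1}{n}\right) + \frac{1}{n} f\!\left(-\frac{1}{n}\right) - \frac{2}{n} f(0) \right) + f(0)
\approx f(0).
\]

The approximation becomes equality in the limit $n \to \infty$. Thus,

\[ \lim_{n \to \infty} \int_{-\infty}^{\infty} \theta_n'(x) f(x)\, dx = f(0) = \langle\delta_0, f\rangle \quad \Rightarrow \quad \theta_n' \to \delta. \]

Note that $f(\infty) = 0$ because of the assumption that all functions must vanish
outside a finite volume.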
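The limit in this example can also be observed numerically by evaluating −⟨θ_n, f'⟩ directly, without ever differentiating θ_n. The sketch below assumes a particular test function (the smooth bump exp(−1/(1 − x²)) on [−1, 1], whose derivative is known in closed form) and uses the midpoint rule; neither choice comes from the text.

```python
import math

def bump(x):
    """Smooth test function supported on [-1, 1]."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def bump_prime(x):
    """Derivative of bump: f'(x) = -2x / (1 - x^2)^2 * f(x) inside (-1, 1)."""
    if abs(x) >= 1.0:
        return 0.0
    return -2.0 * x / (1.0 - x * x) ** 2 * bump(x)

def theta_n(n, x):
    """The ramp approximation to the step function used in Example 7.4.6."""
    if x <= -1.0 / n:
        return 0.0
    if x >= 1.0 / n:
        return 1.0
    return (n * x + 1.0) / 2.0

def paired_with_derivative(n, N=100_000):
    """-<theta_n, f'> computed with the midpoint rule on the support of f."""
    h = 2.0 / N
    total = 0.0
    for k in range(N):
        x = -1.0 + (k + 0.5) * h
        total += theta_n(n, x) * bump_prime(x)
    return -h * total

if __name__ == "__main__":
    print("f(0) =", bump(0.0))
    for n in (5, 50, 200):
        print(n, paired_with_derivative(n))
```

The printed values should approach f(0) = 1/e, in agreement with ⟨θ_n', f⟩ → ⟨δ_0, f⟩.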


7.5 Problems 


7.1 Show that $\bigl|\, \|a\| - \|b\| \,\bigr| \le \|a \pm b\| \le \|a\| + \|b\|$.

7.2 Show that a convergent sequence is necessarily Cauchy.

7.3 Verify that the sequence of functions $\{f_k(x)\}$ defined in Example 7.1.5
is a Cauchy sequence.

7.4 Prove the completeness of $\mathbb{C}$, using the completeness of $\mathbb{R}$.

7.5 Let $\mathcal{L}^1(\mathbb{R})$ be the set of all functions $f$ such that $\|f\| \equiv \int_{-\infty}^{\infty} |f(x)|\, dx$
is finite. This is clearly a normed vector space. Let $f$ and $g$ be nonzero
functions such that at no $x$ are $f(x)$ and $g(x)$ both nonzero. Verify that

(a) $\|f \pm g\| = \|f\| + \|g\|$.

(b) $\|f + g\|^2 + \|f - g\|^2 = 2(\|f\| + \|g\|)^2$.

(c) Using parts (a), (b), and Theorem 2.2.9, show that $\mathcal{L}^1(\mathbb{R})$ is not an
inner product space.


This construction shows that not all norms arise from an inner product. 


7.6 Use Eq. (7.10) to derive Eq. (7.12). Hint: To find $a_n$, equate the coefficients
of $x^n$ on both sides of Eq. (7.10). To find $a_{n-1}$, multiply both sides of
Eq. (7.10) by $C_{n-1} w(x)$ and integrate, using the definitions of $k_m$, $k_m'$,
and $h_m$.

7.7 Evaluate the integral $\int_a^b x^2 C_m(x) C_n(x) w(x)\, dx$.

7.8 Write a density function for two point charges $q_1$ and $q_2$ located at
$\mathbf{r} = \mathbf{r}_1$ and $\mathbf{r} = \mathbf{r}_2$, respectively.

7.9 Write a density function for four point charges $q_1 = q$, $q_2 = -q$, $q_3 = q$
and $q_4 = -q$, located at the corners of a square of side $2a$, lying in the $xy$-plane,
whose center is at the origin and whose first corner is at $(a, a)$.


7.10 Show that $\delta(f(x)) = \frac{1}{|f'(x_0)|}\, \delta(x - x_0)$, where $x_0$ is a root of $f$ and $x$ is
confined to values close to $x_0$. Hint: Make a change of variable to $y = f(x)$.

7.11 Show that

\[ \delta(f(x)) = \sum_{k=1}^{m} \frac{1}{|f'(x_k)|}\, \delta(x - x_k), \]

where the $x_k$'s are all the roots of $f$ in the interval on which $f$ is defined.


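7.12 Define the distribution $\rho : C_F^\infty(\mathbb{R}^3) \to \mathbb{R}$ by

\[ \langle\rho, f\rangle = \int_S \sigma(\mathbf{r}) f(\mathbf{r})\, da(\mathbf{r}), \]

where $\sigma(\mathbf{r})$ is a smooth function on a smooth surface $S$ in $\mathbb{R}^3$. Show that
$\rho(\mathbf{r})$ is zero if $\mathbf{r}$ is not on $S$ and infinite if $\mathbf{r}$ is on $S$.

7.13 Define the distribution $\rho : C_F^\infty(\mathbb{R}^3) \to \mathbb{R}$ by

\[ \langle\rho, f\rangle = \int_C \lambda(\mathbf{r}) f(\mathbf{r})\, d\ell(\mathbf{r}), \]

where $\lambda(\mathbf{r})$ is a smooth function on a smooth curve $C$ in $\mathbb{R}^3$. Show that $\rho(\mathbf{r})$
is zero if $\mathbf{r}$ is not on $C$ and infinite if $\mathbf{r}$ is on $C$.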




7.14 Express the three-dimensional Dirac delta function as a product of 
three one-dimensional delta functions involving the coordinates in 


(a) cylindrical coordinates, 
(b) spherical coordinates, 
(c) general curvilinear coordinates. 


Hint: The Dirac delta function in $\mathbb{R}^3$ satisfies $\iiint \delta(\mathbf{r})\, d^3x = 1$.

7.15 Show that $\int_{-\infty}^{\infty} \delta'(x) f(x)\, dx = -f'(0)$ where $\delta'(x) \equiv \frac{d}{dx}\delta(x)$.


7.16 Evaluate the following integrals:

(a) $\int_{-\infty}^{\infty} \delta(x^2 - 5x + 6)(3x^2 - 7x + 2)\, dx$.

(b) $\int_{-\infty}^{\infty} \delta(x^2 - \pi^2) \cos x\, dx$.

(c) $\int_{0.5}^{\infty} \delta(\sin \pi x) \left( \frac{2}{3} \right)^x dx$.

(d) $\int_{-\infty}^{\infty} \delta(e^{-x^2}) \ln x\, dx$.

Hint: Use the result of Problem 7.11.

7.17 Consider $|x|$ as a generalized function and find its derivative.

7.18 Let $\eta \in C^\infty(\mathbb{R}^n)$ be a smooth function on $\mathbb{R}^n$, and let $\varphi$ be a distribution.
Show that $\eta\varphi$ is also a distribution. What is the natural definition for
$\eta\varphi$? What is $(\eta\varphi)'$, the derivative of $\eta\varphi$?


7.19 Show that each of the following sequences of functions approaches
$\delta(x)$ in the sense of Definition 7.4.4.

(a) $\dfrac{n}{\sqrt{\pi}}\, e^{-n^2 x^2}$.

(b) $\dfrac{1 - \cos nx}{\pi n x^2}$.

(c) $\dfrac{n}{\pi}\, \dfrac{1}{1 + n^2 x^2}$.

(d) $\dfrac{\sin nx}{\pi x}$.

Hint: Approximate $\varphi_n(x)$ for large $n$ and $x \approx 0$, and then evaluate the
appropriate integral.


7.20 Show that $\frac{1}{2}(1 + \tanh nx) \to \theta(x)$ as $n \to \infty$.


7.21 Show that $x\delta'(x) = -\delta(x)$.


8 Classical Orthogonal Polynomials


Example 7.2.5 discussed only one of the many types of the so-called classical
orthogonal polynomials. Historically, these polynomials were discovered
as solutions to differential equations arising in various physical problems.
Such polynomials can be produced by starting with $1, x, x^2, \dots$ and
employing the Gram-Schmidt process. However, there is a more elegant, albeit
less general, approach that simultaneously studies most polynomials of
interest to physicists. We will employ this approach.¹


8.1 General Properties 
Most relevant properties of the polynomials of interest are contained in 


Theorem 8.1.1 Consider the functions

\[ F_n(x) = \frac{1}{w(x)}\, \frac{d^n}{dx^n} \bigl( w s^n \bigr) \quad \text{for } n = 0, 1, 2, \dots, \tag{8.1} \]

where

1. $F_1(x)$ is a first-degree polynomial in $x$,

2. $s(x)$ is a polynomial in $x$ of degree less than or equal to 2 with only
real roots,

3. $w(x)$ is a strictly positive function, integrable in the interval $(a, b)$, that
satisfies the boundary conditions $w(a)s(a) = 0 = w(b)s(b)$.

Then $F_n(x)$ is a polynomial of degree $n$ in $x$ and is orthogonal, on the
interval $(a, b)$ with weight $w(x)$, to any polynomial $p_k(x)$ of degree $k < n$,
i.e.,

\[ \int_a^b p_k(x) F_n(x) w(x)\, dx = 0 \quad \text{for } k < n. \]

These polynomials are collectively called classical orthogonal polynomials.


¹ This approach is due to F.G. Tricomi [Tric 55]. See also [Denn 67].


Before proving the theorem, we need two lemmas: 


Lemma 8.1.2 The following identity holds:

\[ \frac{d^m}{dx^m} \bigl( w s^n p_{\le k} \bigr) = w s^{n-m} p_{\le k+m}, \quad m \le n. \]

Proof See Problem 8.1.


Lemma 8.1.3 All the derivatives $d^m/dx^m (w s^n)$ vanish at $x = a$ and $x = b$,
for all values of $m < n$.


Proof Set $k = 0$ in the identity of the previous lemma and let $p_{\le 0} = 1$. Then
we have $\frac{d^m}{dx^m}(w s^n) = w s^{n-m} p_{\le m}$. The RHS vanishes at $x = a$ and $x = b$
due to the third condition stated in the theorem.


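Proof of the theorem We prove the orthogonality first. The proof involves
multiple use of integration by parts:

\[
\begin{aligned}
\int_a^b p_k(x) F_n(x) w(x)\, dx
&= \int_a^b p_k(x)\, \frac{1}{w} \left[ \frac{d^n}{dx^n}(w s^n) \right] w\, dx
 = \int_a^b p_k(x)\, \frac{d}{dx} \left[ \frac{d^{n-1}}{dx^{n-1}}(w s^n) \right] dx \\
&= \underbrace{p_k(x)\, \frac{d^{n-1}}{dx^{n-1}}(w s^n) \Big|_a^b}_{=0 \text{ by Lemma 8.1.3}}
 - \int_a^b \frac{dp_k}{dx}\, \frac{d^{n-1}}{dx^{n-1}}(w s^n)\, dx.
\end{aligned}
\]

This shows that each integration by parts transfers one differentiation from
$w s^n$ to $p_k$ and introduces a minus sign. Thus, after $k$ integrations by parts,
we get

\[
\begin{aligned}
\int_a^b p_k(x) F_n(x) w(x)\, dx
&= (-1)^k \int_a^b \frac{d^k p_k}{dx^k}\, \frac{d^{n-k}}{dx^{n-k}}(w s^n)\, dx \\
&= C \int_a^b \frac{d}{dx} \left[ \frac{d^{n-k-1}}{dx^{n-k-1}}(w s^n) \right] dx \\
&= C\, \frac{d^{n-k-1}}{dx^{n-k-1}}(w s^n) \Big|_a^b = 0,
\end{aligned}
\]

where we have used the fact that the $k$th derivative of a polynomial of degree
$k$ is a constant. Note that $n - k - 1 \ge 0$ because $k < n$, so that the
last line of the equation is well-defined. The last equality follows from
Lemma 8.1.3.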


² Recall that $p_{\le k}$ is a generic polynomial with degree less than or equal to $k$.

To prove the first part of the theorem, we use Lemma 8.1.2 with $k = 0$,
$p_{\le 0} \equiv p_0 = 1$, and $m = n$ to get

\[ \frac{d^n}{dx^n}(w s^n) = w\, p_{\le n}, \quad \text{or} \quad F_n(x) = \frac{1}{w}\, \frac{d^n}{dx^n}(w s^n) = p_{\le n}. \]


To prove that $F_n(x)$ is a polynomial of degree precisely equal to $n$, we write
$F_n(x) = p_{\le n-1}(x) + k_n^{(n)} x^n$, multiply both sides by $w(x) F_n(x)$, and integrate
over $(a, b)$:

\[
\int_a^b [F_n(x)]^2 w(x)\, dx
= \int_a^b p_{\le n-1} F_n(x) w(x)\, dx + k_n^{(n)} \int_a^b x^n F_n(x) w(x)\, dx.
\]

The LHS is a positive quantity because both $w(x)$ and $[F_n(x)]^2$ are positive,
and the first integral on the RHS vanishes by the first part of the
proof. Therefore, the second term on the RHS cannot be zero. In particular,
$k_n^{(n)} \ne 0$, and $F_n(x)$ is of degree $n$.


It is customary to introduce a normalization constant in the definition of
$F_n(x)$, and write

\[ F_n(x) = \frac{1}{K_n w}\, \frac{d^n}{dx^n} \bigl( w s^n \bigr). \tag{8.2} \]

This equation is called the generalized Rodriguez formula. For historical
reasons, different polynomial functions are normalized differently, which is
why $K_n$ is introduced here.

From Theorem 8.1.1 it is clear that the sequence $\{F_n(x)\}_{n=0}^{\infty}$ forms an
orthogonal set of polynomials on $[a, b]$ with weight function $w(x)$.
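To see Eq. (8.2) at work, the sketch below applies it to one concrete case using exact rational arithmetic. It assumes the Legendre data w = 1, s = 1 − x², (a, b) = (−1, 1) and the normalization K_n = (−1)^n 2^n n!, which is the usual convention for Legendre polynomials but is not stated in this section. Polynomials are stored as coefficient lists [c_0, c_1, ...], and all function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_diff(p):
    """Derivative of a polynomial given as a coefficient list."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]

def poly_integrate(p, a, b):
    """Exact integral of the polynomial p over [a, b]."""
    return sum(c * (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)
               for k, c in enumerate(p))

def legendre_rodrigues(n):
    """F_n = (1 / K_n) d^n/dx^n (1 - x^2)^n with the assumed K_n = (-1)^n 2^n n!."""
    p = [Fraction(1)]
    for _ in range(n):
        p = poly_mul(p, [Fraction(1), Fraction(0), Fraction(-1)])  # multiply by s(x)
    for _ in range(n):
        p = poly_diff(p)
    K = (-1) ** n * 2 ** n * factorial(n)
    return [c / K for c in p]

def inner(m, n):
    """Exact integral of F_m F_n over (-1, 1) with weight w = 1."""
    return poly_integrate(poly_mul(legendre_rodrigues(m), legendre_rodrigues(n)), -1, 1)

if __name__ == "__main__":
    print(legendre_rodrigues(2))       # coefficients of (3x^2 - 1)/2
    print(inner(2, 3), inner(3, 3))    # orthogonality and a norm
```

Because the arithmetic is exact, the off-diagonal integrals come out as exactly zero, as Theorem 8.1.1 requires.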

All the varieties of classical orthogonal polynomials were discovered as
solutions of differential equations. Here, we give a single generic differential
equation satisfied by all the $F_n$'s. The proof is outlined in Problem 8.4.


Proposition 8.1.4 Let $k_1^{(1)}$ be the coefficient of $x$ in $F_1(x)$ and $\sigma_2$ the coefficient
of $x^2$ in $s(x)$. Then the orthogonal polynomials $F_n$ satisfy the differential
equation

\[ \frac{d}{dx} \left( w s\, \frac{dF_n}{dx} \right) = w \lambda_n F_n(x) \quad \text{where } \lambda_n = K_1 k_1^{(1)} n + \sigma_2 n(n - 1). \]
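The proposition can be spot-checked with exact polynomial arithmetic. Expanding the derivative and using Eq. (8.2) with n = 1, i.e. d(ws)/dx = K_1 w F_1, the equation becomes s F_n'' + K_1 F_1 F_n' = λ_n F_n, which involves polynomials only. The sketch below tests this form for two standard families. The normalizations are assumptions taken from the usual conventions rather than from this section: Legendre (w = 1, s = 1 − x², K_n = (−1)^n 2^n n!) and Hermite (w = exp(−x²), s = 1, K_n = (−1)^n). The Hermite polynomials are generated by H_{k+1} = 2x H_k − H_k', which is equivalent to Eq. (8.2) with that K_n.

```python
from fractions import Fraction
from math import factorial

def trim(p):
    """Drop trailing zero coefficients (keep at least one entry)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return trim([a + b for a, b in zip(p, q)])

def mul(p, q):
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)

def diff(p):
    return trim([k * c for k, c in enumerate(p)][1:] or [Fraction(0)])

def scale(c, p):
    return trim([c * a for a in p])

def legendre(n):
    """P_n from Eq. (8.2) with s = 1 - x^2 and the assumed K_n = (-1)^n 2^n n!."""
    p = [Fraction(1)]
    for _ in range(n):
        p = mul(p, [1, 0, -1])
    for _ in range(n):
        p = diff(p)
    return scale(Fraction(1, (-1) ** n * 2 ** n * factorial(n)), p)

def hermite(n):
    """H_n via H_{k+1} = 2x H_k - H_k' (assumed convention K_n = (-1)^n)."""
    h = [Fraction(1)]
    for _ in range(n):
        h = add(mul([0, 2], h), scale(-1, diff(h)))
    return h

def eigenvalue(K1, k11, sigma2, n):
    """lambda_n = K_1 k_1^(1) n + sigma_2 n (n - 1)."""
    return K1 * k11 * n + sigma2 * n * (n - 1)

def residual(F, s, K1F1, lam):
    """s F'' + K_1 F_1 F' - lam F; the zero polynomial if the proposition holds."""
    return add(add(mul(s, diff(diff(F))), mul(K1F1, diff(F))), scale(-lam, F))

if __name__ == "__main__":
    for n in range(6):
        # Legendre: K_1 = -2, F_1 = x, sigma_2 = -1; Hermite: K_1 = -1, F_1 = 2x, sigma_2 = 0
        print(n,
              residual(legendre(n), [1, 0, -1], [0, -2], eigenvalue(-2, 1, -1, n)),
              residual(hermite(n), [1], [0, -2], eigenvalue(-1, 2, 0, n)))
```

With these inputs λ_n reduces to −n(n + 1) for Legendre and −2n for Hermite, the familiar eigenvalues of the corresponding differential equations.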


We shall study the differential equation above in the context of the Sturm-Liouville
problem (see Chap. 19), which is an eigenvalue problem involving
differential operators.

Let us now investigate the consequences of various choices of $s(x)$. We start
with $F_1(x)$, and note that it satisfies Eq. (8.2) with $n = 1$:

\[ F_1(x) = \frac{1}{K_1 w}\, \frac{d}{dx}(ws), \quad \text{or} \quad \frac{1}{ws}\, \frac{d}{dx}(ws) = \frac{K_1 F_1(x)}{s}, \]

which can be integrated to yield $ws = A \exp\bigl( \int K_1 F_1(x)\, dx / s \bigr)$ where $A$ is
a constant. On the other hand, being a polynomial of degree 1, $F_1(x)$ can be
written as $F_1(x) = k_1^{(1)} x + k_1^{(0)}$. It follows that

\[
w(x)s(x) = A \exp\left( \int \frac{K_1 (k_1^{(1)} x + k_1^{(0)})}{s}\, dx \right),
\qquad w(a)s(a) = 0 = w(b)s(b).
\tag{8.3}
\]


Next we look at the three choices for $s(x)$: a constant, a polynomial of
degree 1, and a polynomial of degree 2. For a constant $s(x)$, Eq. (8.3) can
be easily integrated:

\[
\begin{aligned}
w(x)s(x) &= A \exp\left( \int \frac{K_1 (k_1^{(1)} x + k_1^{(0)})}{s}\, dx \right)
 \equiv A \exp\left( \int (2\alpha x + \beta)\, dx \right) \\
&= A e^{\alpha x^2 + \beta x + C} = B e^{\alpha x^2 + \beta x},
\end{aligned}
\]

\[ 2\alpha \equiv K_1 k_1^{(1)}/s, \qquad \beta \equiv K_1 k_1^{(0)}/s, \qquad B \equiv A e^{C}. \]

The interval $(a, b)$ is determined by $w(a)s(a) = 0 = w(b)s(b)$, which yields

\[ B e^{\alpha a^2 + \beta a} = 0 = B e^{\alpha b^2 + \beta b}. \]

For nonzero $B$, the only way that this equality can hold is for $\alpha$ to be negative
and for $a$ and $b$ to be infinite. Since $a < b$, we must take $a = -\infty$ and
$b = +\infty$. With $y = \sqrt{|\alpha|}\,(x + \beta/(2\alpha))$ and choosing $B = s \exp(\beta^2/(4\alpha))$,
we obtain $w(y) = \exp(-y^2)$. We also take the constant $s$ to be 1. This is
always possible by a proper choice of constants such as $B$.
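If the degree of $s$ is 1, then $s(x) = \sigma_1 x + \sigma_0$ and

\[
\begin{aligned}
w(x)(\sigma_1 x + \sigma_0)
&= A \exp\left( \int \frac{K_1 (k_1^{(1)} x + k_1^{(0)})}{\sigma_1 x + \sigma_0}\, dx \right) \\
&= A \exp\left[ \int \left( \frac{K_1 k_1^{(1)}}{\sigma_1}
   + \frac{K_1 k_1^{(0)} - K_1 k_1^{(1)} \sigma_0/\sigma_1}{\sigma_1 x + \sigma_0} \right) dx \right] \\
&\equiv B (\sigma_1 x + \sigma_0)^{\rho}\, e^{\gamma x},
\end{aligned}
\]

where $\gamma = K_1 k_1^{(1)}/\sigma_1$, $\rho = K_1 k_1^{(0)}/\sigma_1 - K_1 k_1^{(1)} \sigma_0/\sigma_1^2$, and $B$ is $A$ modified
by the constant of integration. The last equation above must satisfy the
boundary conditions at $a$ and $b$:

\[ B(\sigma_1 a + \sigma_0)^{\rho}\, e^{\gamma a} = 0 = B(\sigma_1 b + \sigma_0)^{\rho}\, e^{\gamma b}, \]

which give $a = -\sigma_0/\sigma_1$, $\rho > 0$, $\gamma < 0$, and $b = +\infty$. With appropriate
redefinition of variables and parameters, we can write

\[ w(y) = y^{\nu} e^{-y}, \quad \nu > -1, \quad s(x) = x, \quad a = 0, \quad b = +\infty. \]

Table 8.1 Special cases of Jacobi polynomials

$\mu$             $\nu$             $w(x)$                    Polynomial
0                 0                 1                         Legendre, $P_n(x)$
$\lambda - 1/2$   $\lambda - 1/2$   $(1 - x^2)^{\lambda - 1/2}$   Gegenbauer, $C_n^{\lambda}(x)$, $\lambda > -1/2$
$-1/2$            $-1/2$            $(1 - x^2)^{-1/2}$        Chebyshev of the first kind, $T_n(x)$
$1/2$             $1/2$             $(1 - x^2)^{1/2}$         Chebyshev of the second kind, $U_n(x)$

Similarly, we can obtain the weight function and the interval of integration
for the case when $s(x)$ is of degree 2. This result, as well as the results
obtained above, are collected in the following proposition.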


Proposition 8.2.1 If the conditions of Theorem 8.1.1 prevail, then

(a) For $s(x)$ of degree zero we get $w(x) = e^{-x^2}$ with $s(x) = 1$, $a = -\infty$,
and $b = +\infty$. The resulting polynomials are called Hermite polynomials
and are denoted by $H_n(x)$.

(b) For $s(x)$ of degree 1, we obtain $w(x) = x^{\nu} e^{-x}$ with $\nu > -1$, $s(x) = x$,
$a = 0$, and $b = +\infty$. The resulting polynomials are called Laguerre
polynomials and are denoted by $L_n^{\nu}(x)$.

(c) For $s(x)$ of degree 2, we get $w(x) = (1 + x)^{\mu} (1 - x)^{\nu}$ with $\mu, \nu > -1$,
$s(x) = 1 - x^2$, $a = -1$, and $b = +1$. The resulting polynomials are
called Jacobi polynomials and are denoted by $P_n^{\mu,\nu}(x)$.


Jacobi polynomials are themselves divided into other subcategories depending
on the values of $\mu$ and $\nu$. The most common and widely used of
these are collected in Table 8.1. Note that the definition of each of the preceding
polynomials involves a “standardization,” which boils down to a particular
choice of $K_n$ in the generalized Rodriguez formula.


8.3 Recurrence Relations 


Besides the recurrence relations obtained in Sect. 7.2, we can use the differential
equation of Proposition 8.1.4 to construct new recurrence relations involving
derivatives. These relations apply only to classical orthogonal polynomials,
and not to general ones. We start with Eq. (7.12)

\[ F_{n+1}(x) = (\alpha_n x + \beta_n) F_n(x) + \gamma_n F_{n-1}(x), \tag{8.4} \]

differentiate both sides twice, and substitute for the second derivative from
the differential equation of Proposition 8.1.4. This will yield

\[
2 w s \alpha_n \frac{dF_n}{dx}
+ \left[ \alpha_n \frac{d}{dx}(ws) + w \lambda_n (\alpha_n x + \beta_n) \right] F_n
- w \lambda_{n+1} F_{n+1} + w \gamma_n \lambda_{n-1} F_{n-1} = 0.
\tag{8.5}
\]

Historical Notes 

Karl Gustav Jacob Jacobi (1804-1851) was the second son born to a well-to-do Jewish 
banking family in Potsdam. An obviously bright young man, Jacobi was soon moved 
to the highest class in spite of his youth and remained at the gymnasium for four years 
only because he could not enter the university until he was sixteen. He excelled at the 
University of Berlin in all the classical subjects as well as mathematical studies, the topic 
he soon chose as his career. He passed the examination to become a secondary school 
teacher, then later the examination that allowed university teaching, and joined the faculty 
at Berlin at the age of twenty. Since promotion there appeared unlikely, he moved in 1826 
to the University of Königsberg in search of a more permanent position. He was known as 
a lively and creative lecturer who often injected his latest research topics into the lectures. 
He began what is now a common practice at most universities—the research seminar—for 
the most advanced students and his faculty collaborators. The Jacobi “school”, together 
with the influence of Bessel and Neumann (also at Königsberg), sparked a renewal of 
mathematical excellence in Germany. 

In 1843 Jacobi fell gravely ill with diabetes. After seeing his condition, Dirichlet, with 
the help of von Humboldt, secured a donation to enable Jacobi to spend several months 
in Italy, a therapy recommended by his doctor. The friendly atmosphere and healthful cli- 
mate there soon improved his condition. Jacobi was later given royal permission to move 
from Königsberg to Berlin so that his health would not be affected by the harsh winters 
in the former location. A salary bonus given to Jacobi to offset the higher cost of living in 
the capital was revoked after he made some politically sensitive remarks in an impromptu 
speech. A permanent position at Berlin was also refused, and the reduced salary and lack 
of security caused considerable hardship for Jacobi and his family. Only after he accepted 
a position in Vienna did the Prussian government recognize the desirability of keeping 
the distinguished mathematician within its borders, offering him special concessions that 
together with his love for his homeland convinced Jacobi to stay. In 1851 Jacobi died after 
contracting both influenza and smallpox. 

Jacobi’s mathematical reputation began largely with his heated competition with Abel in 
the study of elliptic functions. Legendre, formerly the star of such studies, wrote Jacobi 
of his happiness at having “lived long enough to witness these magnanimous contests 
between two young athletes equally strong”. Although Jacobi and Abel could reasonably 
be considered contemporary researchers who arrived at many of the same results inde- 
pendently, Jacobi suggested the names “Abelian functions” and “Abelian theorem” in a 
review he wrote for Crelle’s Journal. Jacobi also extended his discoveries in elliptic func- 
tions to number theory and the theory of integration. He also worked in other areas of 
number theory, such as the theory of quadratic forms and the representation of integers as 
sums of squares and cubes. He presented the well-known Jacobian, or functional deter- 
minant, in 1841. To physicists, Jacobi is probably best known for his work in dynamics 
with the form introduced by Hamilton. Although elegant and quite general, Hamiltonian 
dynamics did not lend itself to easy solution of many practical problems in mechanics. In 
the spirit of Lagrange, Poisson, and others, Jacobi investigated transformations of Hamil- 
ton’s equations that preserved their canonical nature (loosely speaking, that preserved the 
Poisson brackets in each representation). After much work and a little simplification, the 
resulting equations of motion, now known as Hamilton-Jacobi equations, allowed Jacobi 
to solve several important problems in ordinary and celestial mechanics. Clebsch and 
later Helmholtz amplified their use in other areas of physics. 


8.3. Recurrence Relations 


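We can get another recurrence relation involving derivatives by substituting (8.4) in (8.5) and simplifying:

$$2ws\alpha_n \frac{dF_n}{dx} + \left[\alpha_n \frac{d}{dx}(ws) + w(\lambda_n - \lambda_{n+1})(\alpha_n x + \beta_n)\right]F_n + w\gamma_n(\lambda_{n-1} - \lambda_{n+1})F_{n-1} = 0. \tag{8.6}$$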


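Two other recurrence relations can be obtained by differentiating equations (8.6) and (8.5), respectively, and using the differential equation for $F_n$. Now solve the first equation so obtained for $\gamma_n (d/dx)(wF_{n-1})$ and substitute the result in the second equation. After simplification, the result will be

$$2w\alpha_n\lambda_n F_n + \frac{d}{dx}\left\{\left[\alpha_n \frac{d}{dx}(ws) + w(\lambda_n - \lambda_{n-1})(\alpha_n x + \beta_n)\right]F_n\right\} + (\lambda_{n-1} - \lambda_{n+1})\frac{d}{dx}(wF_{n+1}) = 0. \tag{8.7}$$

Finally, we record one more useful recurrence relation:

$$A_n(x)F_n - \lambda_{n+1}(\alpha_n x + \beta_n)\frac{dw}{dx}F_{n+1} + \gamma_n\lambda_{n-1}(\alpha_n x + \beta_n)\frac{dw}{dx}F_{n-1} + B_n(x)F_{n+1}' + \gamma_n D_n(x)F_{n-1}' = 0, \tag{8.8}$$

where

$$A_n(x) = (\alpha_n x + \beta_n)\left[2w\alpha_n\lambda_n + \alpha_n\frac{d^2}{dx^2}(ws) + \lambda_n(\alpha_n x + \beta_n)\frac{dw}{dx}\right] - \alpha_n^2\frac{d}{dx}(ws),$$

$$B_n(x) = \alpha_n\frac{d}{dx}(ws) - w(\alpha_n x + \beta_n)(\lambda_{n+1} - \lambda_n),$$

$$D_n(x) = w(\alpha_n x + \beta_n)(\lambda_{n-1} - \lambda_n) - \alpha_n\frac{d}{dx}(ws).$$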

Details of the derivation of this relation are left for the reader. All these 
recurrence relations seem to be very complicated. However, complexity is 
the price we pay for generality. When we work with specific orthogonal 
polynomials, the equations simplify considerably. For instance, for Hermite 
and Legendre polynomials Eq. (8.6) yields, respectively, 


$$H_n' = 2nH_{n-1}, \quad\text{and}\quad (1 - x^2)P_n' + nxP_n - nP_{n-1} = 0. \tag{8.9}$$

Also, applying Eq. (8.7) to Legendre polynomials gives

$$P_{n+1}' - xP_n' - (n+1)P_n = 0, \tag{8.10}$$

and Eq. (8.8) yields

$$P_{n+1}' - P_{n-1}' - (2n+1)P_n = 0. \tag{8.11}$$

It is possible to find many more recurrence relations by manipulating the 
existing recurrence relations. 

Before studying specific orthogonal polynomials, let us pause for a mo- 
ment to appreciate the generality and elegance of the preceding discussion. 
With a few assumptions and a single defining equation we have severely 
restricted the choice of the weight function and with it the choice of the 
interval (a, b). We have nevertheless exhausted the list of the so-called clas- 
sical orthogonal polynomials. 


8.4 Details of Specific Examples 


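We now construct the specific polynomials used frequently in physics. We have seen that the four parameters $K_n$, $k_n^{(n)}$, $k_n^{(n-1)}$, and $h_n$ determine all the properties of the polynomials. Once $K_n$ is fixed by some standardization, we can determine all the other parameters: $k_n^{(n)}$ and $k_n^{(n-1)}$ will be given by the generalized Rodriguez formula, and $h_n$ can be calculated as follows:

$$\begin{aligned}
h_n &= \int_a^b F_n^2(x)\,w(x)\,dx = \int_a^b \left(k_n^{(n)}x^n + \cdots\right)F_n(x)\,w(x)\,dx \\
&= k_n^{(n)}\int_a^b w\,x^n\,\frac{1}{K_n w}\frac{d^n}{dx^n}(ws^n)\,dx = \frac{k_n^{(n)}}{K_n}\int_a^b x^n\frac{d}{dx}\left[\frac{d^{n-1}}{dx^{n-1}}(ws^n)\right]dx \\
&= \frac{k_n^{(n)}}{K_n}\,x^n\frac{d^{n-1}}{dx^{n-1}}(ws^n)\Big|_a^b - \frac{k_n^{(n)}}{K_n}\int_a^b \frac{d}{dx}(x^n)\,\frac{d^{n-1}}{dx^{n-1}}(ws^n)\,dx.
\end{aligned}$$

In the second line only the leading term $k_n^{(n)}x^n$ survives, because $F_n$ is orthogonal to every polynomial of degree lower than $n$. The first term of the last line is zero by Lemma 8.1.3. It is clear that each integration by parts introduces a minus sign and shifts one differentiation from $ws^n$ to $x^n$. Thus, after $n$ integrations by parts and noting that $d^0/dx^0(ws^n) = ws^n$ and $d^n/dx^n(x^n) = n!$, we obtain

$$h_n = \frac{(-1)^n k_n^{(n)}\,n!}{K_n}\int_a^b ws^n\,dx. \tag{8.12}$$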


8.4.1 Hermite Polynomials 


The Hermite polynomials are standardized such that $K_n = (-1)^n$. Thus, the generalized Rodriguez formula (8.2) and Proposition 8.2.1 give

$$H_n(x) = (-1)^n e^{x^2}\frac{d^n}{dx^n}\left(e^{-x^2}\right). \tag{8.13}$$

It is clear that each time $e^{-x^2}$ is differentiated, a factor of $-2x$ is introduced. The highest power of $x$ is obtained when we differentiate $e^{-x^2}$ $n$ times. This yields $(-1)^n e^{x^2}(-2x)^n e^{-x^2} = 2^n x^n \Rightarrow k_n^{(n)} = 2^n$.

To obtain $k_n^{(n-1)}$, we find it helpful to see whether the polynomial is even or odd. We substitute $-x$ for $x$ in Eq. (8.13) and get $H_n(-x) = (-1)^n H_n(x)$, which shows that if $n$ is even (odd), $H_n$ is an even (odd) polynomial, i.e., it can have only even (odd) powers of $x$. In either case, the next-highest power of $x$ in $H_n(x)$ is not $n-1$ but $n-2$. Thus, the coefficient of $x^{n-1}$ is zero for $H_n(x)$, and we have $k_n^{(n-1)} = 0$. For $h_n$, we use (8.12) to obtain $h_n = \sqrt{\pi}\,2^n n!$.

Next we calculate the recurrence relation of Eq. (7.12). We can readily calculate the constants needed: $\alpha_n = 2$, $\beta_n = 0$, $\gamma_n = -2n$. Then substitute these in Eq. (7.12) to obtain

$$H_{n+1}(x) = 2xH_n(x) - 2nH_{n-1}(x). \tag{8.14}$$


Other recurrence relations can be obtained similarly. 
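
The statements above can be checked mechanically. The following is a minimal sketch, not part of the original text: it builds $H_0,\dots,H_6$ from the recurrence (8.14), assuming the starting values $H_0 = 1$ and $H_1 = 2x$ that follow from (8.13), and then confirms $k_n^{(n)} = 2^n$, $k_n^{(n-1)} = 0$, and the Hermite part of (8.9). The coefficient-list representation and the helper names `hermite` and `derivative` are illustrative choices only.

```python
# Polynomials are stored as integer coefficient lists [c0, c1, ...],
# meaning c0 + c1*x + c2*x^2 + ... (an illustrative representation).

def hermite(n_max):
    """Return [H_0, ..., H_n_max] built from H_{n+1} = 2x H_n - 2n H_{n-1}."""
    polys = [[1], [0, 2]]                         # H_0 = 1, H_1 = 2x
    for n in range(1, n_max):
        two_x_hn = [0] + [2 * c for c in polys[n]]            # 2x * H_n
        prev = polys[n - 1] + [0] * (len(two_x_hn) - len(polys[n - 1]))
        polys.append([a - 2 * n * b for a, b in zip(two_x_hn, prev)])
    return polys[: n_max + 1]

def derivative(p):
    """Coefficient list of dp/dx."""
    return [k * c for k, c in enumerate(p)][1:] or [0]

H = hermite(6)
for n in range(1, 7):
    assert H[n][-1] == 2 ** n                     # leading coefficient 2^n
    assert H[n][-2] == 0                          # no x^(n-1) term (definite parity)
    assert derivative(H[n]) == [2 * n * c for c in H[n - 1]]   # H_n' = 2n H_{n-1}

print(H[3])   # coefficients of H_3(x) = 8x^3 - 12x, lowest power first
```

Exact integer arithmetic is used so that the comparisons are equalities rather than floating-point approximations.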

Finally, the differential equation of $H_n(x)$ is obtained by first noting that $K_1 = -1$, $\sigma_2 = 0$, $F_1(x) = 2x \Rightarrow k_1^{(1)} = 2$. All of this gives $\lambda_n = -2n$, which can be used in the equation of Proposition 8.1.4 to get

$$\frac{d^2H_n}{dx^2} - 2x\frac{dH_n}{dx} + 2nH_n = 0. \tag{8.15}$$


8.4.2 Laguerre Polynomials 


For Laguerre polynomials, the standardization is $K_n = n!$. Thus, the generalized Rodriguez formula (8.2) and Proposition 8.2.1 give

$$L_n^\nu(x) = \frac{1}{n!\,x^\nu e^{-x}}\frac{d^n}{dx^n}\left(x^\nu e^{-x}x^n\right) = \frac{1}{n!}\,x^{-\nu}e^{x}\frac{d^n}{dx^n}\left(x^{n+\nu}e^{-x}\right). \tag{8.16}$$

To find $k_n^{(n)}$ we note that differentiating $e^{-x}$ does not introduce any new powers of $x$ but only a factor of $-1$. Thus, the highest power of $x$ is obtained by leaving $x^{n+\nu}$ alone and differentiating $e^{-x}$ $n$ times. This gives

$$\frac{1}{n!}\,x^{-\nu}e^{x}x^{n+\nu}(-1)^n e^{-x} = \frac{(-1)^n}{n!}\,x^n \quad\Rightarrow\quad k_n^{(n)} = \frac{(-1)^n}{n!}.$$

We may try to check the evenness or oddness of $L_n^\nu(x)$; however, this will not be helpful because changing $x$ to $-x$ distorts the RHS of Eq. (8.16). In fact, $k_n^{(n-1)} \neq 0$ in this case, and it can be calculated by noticing that the next-highest power of $x$ is obtained by differentiating $x^{n+\nu}$ once and $e^{-x}$ the remaining $n-1$ times. This can be done in $n$ ways, so the first derivative of $x^{n+\nu}$ is added $n$ times and the result is multiplied by $(-1)^{n-1}$, which comes from differentiating $e^{-x}$. We obtain

$$\frac{1}{n!}\,x^{-\nu}e^{x}\left[(-1)^{n-1}n(n+\nu)x^{n+\nu-1}e^{-x}\right] = \frac{(-1)^{n-1}(n+\nu)}{(n-1)!}\,x^{n-1},$$

and therefore $k_n^{(n-1)} = (-1)^{n-1}(n+\nu)/(n-1)!$.

Finally, for $h_n$ we get

$$h_n = \frac{(-1)^n[(-1)^n/n!]\,n!}{n!}\int_0^\infty x^\nu e^{-x}x^n\,dx = \frac{1}{n!}\int_0^\infty x^{n+\nu}e^{-x}\,dx.$$

If $\nu$ is not an integer (and it need not be), the integral on the RHS cannot be evaluated by elementary methods. In fact, this integral occurs so frequently in mathematical applications that it is given a special name, the gamma function. A detailed discussion of this function can be found in Chap. 12. At this point, we simply note that

$$\Gamma(z+1) \equiv \int_0^\infty x^z e^{-x}\,dx, \qquad \Gamma(n+1) = n! \quad\text{for } n \in \mathbb{N}, \tag{8.17}$$

and write $h_n$ as

$$h_n = \frac{\Gamma(n+\nu+1)}{n!} = \frac{\Gamma(n+\nu+1)}{\Gamma(n+1)}.$$


The relevant parameters for the recurrence relation can be easily calculated:

$$\alpha_n = -\frac{1}{n+1}, \qquad \beta_n = \frac{2n+\nu+1}{n+1}, \qquad \gamma_n = -\frac{n+\nu}{n+1}.$$

Substituting these in Eq. (7.12) and simplifying yields

$$(n+1)L_{n+1}^\nu = (2n+\nu+1-x)L_n^\nu - (n+\nu)L_{n-1}^\nu.$$

With $k_1^{(1)} = -1$ and $\sigma_2 = 0$, we get $\lambda_n = -n$, and the differential equation of Proposition 8.1.4 becomes

$$x\frac{d^2L_n^\nu}{dx^2} + (\nu+1-x)\frac{dL_n^\nu}{dx} + nL_n^\nu = 0. \tag{8.18}$$
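
As a cross-check of the Laguerre results, here is a small sketch (not from the original text) that generates $L_n^\nu$ from the three-term recurrence above, assuming $L_0^\nu = 1$ and $L_1^\nu = \nu + 1 - x$ as obtained from (8.16), and then tests the two leading coefficients and the differential equation (8.18) in exact rational arithmetic. The value $\nu = 1/2$ and the helper names `laguerre` and `de_residual` are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import factorial

def laguerre(n_max, nu):
    """[L_0^nu, ..., L_{n_max}^nu] as coefficient lists, lowest power first."""
    nu = Fraction(nu)
    polys = [[Fraction(1)], [nu + 1, Fraction(-1)]]   # L_0 = 1, L_1 = nu + 1 - x
    for n in range(1, n_max):
        cur, prev = polys[n], polys[n - 1]
        nxt = [Fraction(0)] * (n + 2)
        for k, c in enumerate(cur):
            nxt[k] += (2 * n + nu + 1) * c            # (2n + nu + 1) L_n
            nxt[k + 1] -= c                           # -x L_n
        for k, c in enumerate(prev):
            nxt[k] -= (n + nu) * c                    # -(n + nu) L_{n-1}
        polys.append([c / (n + 1) for c in nxt])
    return polys[: n_max + 1]

def de_residual(p, n, nu):
    """Coefficients of x p'' + (nu + 1 - x) p' + n p; all zero if (8.18) holds."""
    c = p + [Fraction(0)]
    return [(j + 1) * (j + nu + 1) * c[j + 1] + (n - j) * c[j]
            for j in range(len(p))]

nu = Fraction(1, 2)          # arbitrary sample value; it need not be an integer
L = laguerre(6, nu)
for n in range(1, 7):
    assert L[n][-1] == Fraction((-1) ** n, factorial(n))               # k_n^(n)
    assert L[n][-2] == (-1) ** (n - 1) * (n + nu) / factorial(n - 1)   # k_n^(n-1)
    assert all(r == 0 for r in de_residual(L[n], n, nu))               # Eq. (8.18)
```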


8.4.3. Legendre Polynomials 


Instead of discussing the Jacobi polynomials as a whole, we will discuss 
a special case of them, the Legendre polynomials P,,(x), which are more 
widely used in physics. 

With $\mu = 0 = \nu$, corresponding to the Legendre polynomials, the weight function for the Jacobi polynomials reduces to $w(x) = 1$. The standardization is $K_n = (-1)^n 2^n n!$. Thus, the generalized Rodriguez formula reads

$$P_n(x) = \frac{(-1)^n}{2^n n!}\frac{d^n}{dx^n}\left[(1-x^2)^n\right]. \tag{8.19}$$


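To find $k_n^{(n)}$, we expand the expression in square brackets using the binomial theorem and take the $n$th derivative of the highest power of $x$. This yields

$$k_n^{(n)}x^n = \frac{(-1)^n}{2^n n!}\frac{d^n}{dx^n}\left[(-1)^n x^{2n}\right] = \frac{1}{2^n n!}\frac{d^n}{dx^n}\left(x^{2n}\right) = \frac{1}{2^n n!}\,2n(2n-1)(2n-2)\cdots(n+1)\,x^n.$$

After some algebra (see Problem 8.15), we get

$$k_n^{(n)} = \frac{2^n\,\Gamma(n+\frac12)}{n!\,\Gamma(\frac12)}.$$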
Historical Notes 

Adrien-Marie Legendre (1752-1833) came from a well-to-do Parisian family and re- 
ceived an excellent education in science and mathematics. His university work was ad- 
vanced enough that his mentor used many of Legendre’s essays in a treatise on mechan- 
ics. A man of modest fortune until the revolution, Legendre was able to devote himself 
to study and research without recourse to an academic position. In 1782 he won the prize 
of the Berlin Academy for calculating the trajectories of cannonballs taking air resistance 
into account. This essay brought him to the attention of Lagrange and helped pave the 
way to acceptance in French scientific circles, notably the Academy of Sciences, to which 
Legendre submitted numerous papers. In July 1784 he submitted a paper on planetary or- 
bits that contained the now-famous Legendre polynomials, mentioning that Lagrange had 
been able to “present a more complete theory” in a recent paper by using Legendre’s 
results. In the years that followed, Legendre concentrated his efforts in number theory, 
celestial mechanics, and the theory of elliptic functions. In addition, he was a prolific cal- 
culator, producing large tables of the values of special functions, and he also authored an 
elementary textbook that remained in use for many decades. In 1824 Legendre refused 
to vote for the government’s candidate for Institut National. Because of this, his pension 
was stopped and he died in poverty and in pain at the age of 80 after several years of 
failing health. 

Legendre produced a large number of useful ideas but did not always develop them in the 
most rigorous manner, claiming to hold the priority for an idea if he had presented merely 
a reasonable argument for it. Gauss, with whom he had several quarrels over priority, 
considered rigorous proof the standard of ownership. To Legendre’s credit, however, he 
was an enthusiastic supporter of his young rivals Abel and Jacobi and gave their work 
considerable attention in his writings. Especially in the theory of elliptic functions, the 
area of competition with Abel and Jacobi, Legendre is considered more of a trailblazer 
than a great builder. Hermite wrote that Legendre “is considered the founder of the theory 
of elliptic functions” and “greatly smoothed the way for his successors”, but notes that 
the recognition of the double periodicity of the inverse function, which allowed the great 
progress of others, was missing from Legendre’s work. 

Legendre also contributed to practical efforts in science and mathematics. He and two of 
his contemporaries were assigned in 1787 to a panel conducting geodetic work in cooper- 
ation with the observatories at Paris and Greenwich. Four years later the same panel mem- 
bers were appointed as the Academy’s commissioners to undertake the measurements and 
calculations necessary to determine the length of the standard meter. Legendre’s seem- 
ingly tireless skill at calculating produced large tables of the values of trigonometric and 
elliptic functions, logarithms, and solutions to various special equations. 

In his famous textbook Eléments de géométrie (1794) he gave a simple proof that π is 
irrational and conjectured that it is not the root of any algebraic equation of finite degree 
with rational coefficients. The textbook was somewhat dogmatic in its presentation of 
ordinary Euclidean thought and includes none of the non-Euclidean ideas beginning to be 
formed around that time. It was Legendre who first gave a rigorous proof of the theorem 
(assuming all of Euclid’s postulates, of course) that the sum of the angles of a triangle 
is “equal to two right angles”. Very little of his research in this area was of memorable 
quality. The same could possibly be argued for the balance of his writing, but one must 
acknowledge the very fruitful ideas he left behind in number theory and elliptic functions 
and, of course, the introduction of Legendre polynomials and the important Legendre 
transformation used both in thermodynamics and Hamiltonian mechanics. 


To find $k_n^{(n-1)}$, we look at the evenness or oddness of the polynomials. By an investigation of the Rodriguez formula, as in our study of Hermite polynomials, we note that $P_n(-x) = (-1)^n P_n(x)$, which tells us that $P_n(x)$ is either even or odd. In either case, $P_n(x)$ will not have an $(n-1)$st power of $x$. Therefore, $k_n^{(n-1)} = 0$.

We now calculate $h_n$ as given by (8.12):

$$h_n = \frac{(-1)^n k_n^{(n)}\,n!}{K_n}\int_{-1}^{1}(1-x^2)^n\,dx = \frac{2^n\,\Gamma(n+\frac12)/\Gamma(\frac12)}{2^n n!}\int_{-1}^{1}(1-x^2)^n\,dx.$$

The integral can be evaluated by repeated integration by parts (see Problem 8.16). Substituting the result in the expression above yields $h_n = 2/(2n+1)$.

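We need $\alpha_n$, $\beta_n$, and $\gamma_n$ for the recurrence relation:

$$\alpha_n = \frac{k_{n+1}^{(n+1)}}{k_n^{(n)}} = \frac{2^{n+1}\Gamma(n+1+\frac12)}{(n+1)!\,\Gamma(\frac12)}\cdot\frac{n!\,\Gamma(\frac12)}{2^n\,\Gamma(n+\frac12)} = \frac{2n+1}{n+1},$$

where we used the relation $\Gamma(z+1) = z\Gamma(z)$, an important property of the gamma function. We also have $\beta_n = 0$ (because $k_n^{(n-1)} = 0 = k_{n+1}^{(n)}$) and $\gamma_n = -n/(n+1)$. Therefore, the recurrence relation is

$$(n+1)P_{n+1}(x) = (2n+1)xP_n(x) - nP_{n-1}(x). \tag{8.20}$$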


Now we use $K_1 = -2$, $P_1(x) = x \Rightarrow k_1^{(1)} = 1$, and $\sigma_2 = -1$ to obtain $\lambda_n = -n(n+1)$, which yields the following differential equation:

$$\frac{d}{dx}\left[(1-x^2)\frac{dP_n}{dx}\right] = -n(n+1)P_n. \tag{8.21}$$

This can also be expressed as

$$(1-x^2)\frac{d^2P_n}{dx^2} - 2x\frac{dP_n}{dx} + n(n+1)P_n = 0. \tag{8.22}$$
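
The Legendre results lend themselves to an exact check. The sketch below is an illustration, not the author's procedure: it generates $P_n$ from the recurrence (8.20) with $P_0 = 1$, $P_1 = x$, and verifies $h_n = 2/(2n+1)$ together with the derivative relations (8.9), (8.10), and (8.11). All helper names are hypothetical.

```python
from fractions import Fraction

def legendre(n_max):
    """[P_0, ..., P_{n_max}] as coefficient lists (lowest power first), via (8.20)."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        nxt = [Fraction(0)] * (n + 2)
        for k, c in enumerate(polys[n]):
            nxt[k + 1] += (2 * n + 1) * c          # (2n+1) x P_n
        for k, c in enumerate(polys[n - 1]):
            nxt[k] -= n * c                        # -n P_{n-1}
        polys.append([c / (n + 1) for c in nxt])
    return polys[: n_max + 1]

def d(p):                       # derivative
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]

def x_times(p):                 # multiply by x
    return [Fraction(0)] + p

def scale(c, p):
    return [c * a for a in p]

def add(*polys):                # sum of polynomials of possibly different length
    size = max(len(p) for p in polys)
    return [sum(p[k] if k < len(p) else Fraction(0) for p in polys)
            for k in range(size)]

def is_zero(p):
    return all(c == 0 for c in p)

def norm_sq(p):
    """Exact integral of p(x)^2 over [-1, 1] (weight w = 1)."""
    return sum(a * b * Fraction(2, i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(p)
               if (i + j) % 2 == 0)

P = legendre(7)
for n in range(1, 7):
    dPn = d(P[n])
    assert norm_sq(P[n]) == Fraction(2, 2 * n + 1)                        # h_n
    assert is_zero(add(dPn, scale(-1, x_times(x_times(dPn))),
                       scale(n, x_times(P[n])), scale(-n, P[n - 1])))     # (8.9)
    assert is_zero(add(d(P[n + 1]), scale(-1, x_times(dPn)),
                       scale(-(n + 1), P[n])))                            # (8.10)
    assert is_zero(add(d(P[n + 1]), scale(-1, d(P[n - 1])),
                       scale(-(2 * n + 1), P[n])))                        # (8.11)
```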


8.4.4 Other Classical Orthogonal Polynomials 


The rest of the classical orthogonal polynomials can be constructed simi- 
larly. For the sake of completeness, we merely quote the results. 


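Jacobi Polynomials, $P_n^{\mu,\nu}(x)$

Standardization:

$$K_n = (-2)^n n!$$

Constants:

$$k_n^{(n)} = 2^{-n}\,\frac{\Gamma(2n+\mu+\nu+1)}{n!\,\Gamma(n+\mu+\nu+1)}, \qquad k_n^{(n-1)} = \frac{n(\nu-\mu)}{2n+\mu+\nu}\,k_n^{(n)},$$

$$h_n = \frac{2^{\mu+\nu+1}\,\Gamma(n+\mu+1)\,\Gamma(n+\nu+1)}{n!\,(2n+\mu+\nu+1)\,\Gamma(n+\mu+\nu+1)}$$

Rodriguez formula:

$$P_n^{\mu,\nu}(x) = \frac{(-1)^n}{2^n n!}\,(1+x)^{-\mu}(1-x)^{-\nu}\frac{d^n}{dx^n}\left[(1+x)^{\mu+n}(1-x)^{\nu+n}\right]$$

Differential Equation:

$$(1-x^2)\frac{d^2P_n^{\mu,\nu}}{dx^2} + \left[\mu - \nu - (\mu+\nu+2)x\right]\frac{dP_n^{\mu,\nu}}{dx} + n(n+\mu+\nu+1)P_n^{\mu,\nu} = 0$$

A Recurrence Relation:

$$\begin{aligned}
2(n+1)(n+\mu+\nu+1)(2n+\mu+\nu)\,P_{n+1}^{\mu,\nu}
&= (2n+\mu+\nu+1)\left[(2n+\mu+\nu)(2n+\mu+\nu+2)x + \nu^2 - \mu^2\right]P_n^{\mu,\nu} \\
&\quad - 2(n+\mu)(n+\nu)(2n+\mu+\nu+2)\,P_{n-1}^{\mu,\nu}
\end{aligned}$$
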
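Gegenbauer Polynomials, $C_n^\lambda(x)$

Standardization:

$$K_n = (-2)^n n!\,\frac{\Gamma(n+\lambda+\frac12)\,\Gamma(2\lambda)}{\Gamma(n+2\lambda)\,\Gamma(\lambda+\frac12)}$$

Constants:

$$k_n^{(n)} = \frac{2^n}{n!}\,\frac{\Gamma(n+\lambda)}{\Gamma(\lambda)}, \qquad k_n^{(n-1)} = 0, \qquad h_n = \frac{\sqrt{\pi}\,\Gamma(n+2\lambda)\,\Gamma(\lambda+\frac12)}{n!\,(n+\lambda)\,\Gamma(2\lambda)\,\Gamma(\lambda)}$$

Rodriguez Formula:

$$C_n^\lambda(x) = \frac{(-1)^n\,\Gamma(n+2\lambda)\,\Gamma(\lambda+\frac12)}{2^n n!\,\Gamma(n+\lambda+\frac12)\,\Gamma(2\lambda)}\,(1-x^2)^{-\lambda+1/2}\frac{d^n}{dx^n}\left[(1-x^2)^{n+\lambda-1/2}\right]$$

Differential Equation:

$$(1-x^2)\frac{d^2C_n^\lambda}{dx^2} - (2\lambda+1)x\frac{dC_n^\lambda}{dx} + n(n+2\lambda)C_n^\lambda = 0$$

A Recurrence Relation:

$$(n+1)C_{n+1}^\lambda = 2(n+\lambda)xC_n^\lambda - (n+2\lambda-1)C_{n-1}^\lambda$$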


Chebyshev Polynomials of the First Kind, $T_n(x)$

Standardization:

$$K_n = (-1)^n\,\frac{(2n)!}{2^n n!}$$

Constants:

$$k_n^{(n)} = 2^{n-1}, \qquad k_n^{(n-1)} = 0, \qquad h_n = \frac{\pi}{2}$$

Rodriguez Formula:

$$T_n(x) = \frac{(-1)^n 2^n n!}{(2n)!}\,(1-x^2)^{1/2}\frac{d^n}{dx^n}\left[(1-x^2)^{n-1/2}\right]$$

Differential Equation:

$$(1-x^2)\frac{d^2T_n}{dx^2} - x\frac{dT_n}{dx} + n^2T_n = 0$$

A Recurrence Relation:

$$T_{n+1} = 2xT_n - T_{n-1}$$
Chebyshev Polynomials of the Second Kind, $U_n(x)$

Standardization:

$$K_n = (-1)^n\,\frac{(2n+1)!}{2^n(n+1)!}$$

Constants:

$$k_n^{(n)} = 2^n, \qquad k_n^{(n-1)} = 0, \qquad h_n = \frac{\pi}{2}$$

Rodriguez Formula:

$$U_n(x) = \frac{(-1)^n 2^n(n+1)!}{(2n+1)!}\,(1-x^2)^{-1/2}\frac{d^n}{dx^n}\left[(1-x^2)^{n+1/2}\right]$$

Differential Equation:

$$(1-x^2)\frac{d^2U_n}{dx^2} - 3x\frac{dU_n}{dx} + n(n+2)U_n = 0$$

A Recurrence Relation:

$$U_{n+1} = 2xU_n - U_{n-1}$$


8.5 _ Expansion in Terms of Orthogonal Polynomials 


Having studied the different classical orthogonal polynomials, we can now use them to write an arbitrary function $f \in \mathcal{L}^2_w(a,b)$ as a series of these polynomials. If we denote a complete set of orthogonal (not necessarily classical) polynomials by $|C_k\rangle$ and the given function by $|f\rangle$, we may write

$$|f\rangle = \sum_{k=0}^\infty a_k|C_k\rangle, \tag{8.23}$$

where $a_k$ is found by multiplying both sides of the equation by $\langle C_i|$ and using the orthogonality of the $|C_k\rangle$'s:

$$\langle C_i|f\rangle = \sum_{k=0}^\infty a_k\langle C_i|C_k\rangle = a_i\langle C_i|C_i\rangle \quad\Rightarrow\quad a_i = \frac{\langle C_i|f\rangle}{\langle C_i|C_i\rangle}. \tag{8.24}$$

This is written in function form as

$$a_i = \frac{\int_a^b C_i^*(x)f(x)w(x)\,dx}{\int_a^b |C_i(x)|^2 w(x)\,dx}. \tag{8.25}$$

We can also "derive" the functional form of Eq. (8.23) by multiplying both of its sides by $\langle x|$ and using the fact that $\langle x|f\rangle = f(x)$ and $\langle x|C_k\rangle = C_k(x)$. The result will be

$$f(x) = \sum_{k=0}^\infty a_kC_k(x). \tag{8.26}$$




Fig. 8.1 The voltage is $+V_0$ for the upper hemisphere, where $0 \le \theta < \pi/2$, or where $0 < \cos\theta \le 1$. It is $-V_0$ for the lower hemisphere, where $\pi/2 < \theta \le \pi$, or where $-1 \le \cos\theta < 0$


Example 8.5.1 The solution of Laplace’s equation in spherically symmetric electrostatic problems that are independent of the azimuthal angle is given by

$$\Phi(r,\theta) = \sum_{k=0}^\infty\left(\frac{b_k}{r^{k+1}} + c_kr^k\right)P_k(\cos\theta). \tag{8.27}$$

Consider two conducting hemispheres of radius $a$ separated by a small insulating gap at the equator. The upper hemisphere is held at potential $V_0$ and the lower one at $-V_0$, as shown in Fig. 8.1. We want to find the potential at points outside the resulting sphere. Since the potential must vanish at infinity, we expect the second term in Eq. (8.27) to be absent, i.e., $c_k = 0\ \forall k$. To find $b_k$, substitute $a$ for $r$ in (8.27) and let $\cos\theta \equiv x$. Then,

$$\Phi(a,x) = \sum_{k=0}^\infty\frac{b_k}{a^{k+1}}P_k(x),$$

where

$$\Phi(a,x) = \begin{cases} -V_0 & \text{if } -1 < x < 0, \\ +V_0 & \text{if } 0 < x < 1. \end{cases}$$

From Eq. (8.25), we have

$$\frac{b_k}{a^{k+1}} = \frac{\int_{-1}^{1}P_k(x)\Phi(a,x)\,dx}{\underbrace{\int_{-1}^{1}|P_k(x)|^2\,dx}_{=h_k}} = \frac{2k+1}{2}\int_{-1}^{1}P_k(x)\Phi(a,x)\,dx = \frac{2k+1}{2}V_0\left[-\int_{-1}^{0}P_k(x)\,dx + \int_0^1P_k(x)\,dx\right].$$

To proceed, we rewrite the first integral:

$$\int_{-1}^{0}P_k(x)\,dx = -\int_{+1}^{0}P_k(-y)\,dy = \int_0^1P_k(-y)\,dy = (-1)^k\int_0^1P_k(x)\,dx,$$
—1 +1 0 0 
where we made use of the parity property of $P_k(x)$. Therefore,

$$\frac{b_k}{a^{k+1}} = \frac{2k+1}{2}V_0\left[1 - (-1)^k\right]\int_0^1P_k(x)\,dx.$$

It is now clear that only odd polynomials contribute to the expansion. Using the result of Problem 8.27, we get

$$\frac{b_k}{a^{k+1}} = (-1)^{(k-1)/2}\,\frac{(2k+1)(k-1)!}{2^k\left(\frac{k+1}{2}\right)!\left(\frac{k-1}{2}\right)!}\,V_0, \qquad k \text{ odd},$$

or

$$b_{2m+1} = (4m+3)\,a^{2m+2}\,V_0\,(-1)^m\frac{(2m)!}{2^{2m+1}\,m!\,(m+1)!}.$$

Note that $\Phi(a,x)$ is an odd function; that is, $\Phi(a,-x) = -\Phi(a,x)$, as is evident from its definition. Thus, only odd polynomials appear in the expansion of $\Phi(a,x)$ to preserve this property. Having found the coefficients, we can write the potential:

$$\Phi(r,\theta) = V_0\sum_{m=0}^\infty(-1)^m\frac{(4m+3)(2m)!}{2^{2m+1}\,m!\,(m+1)!}\left(\frac{a}{r}\right)^{2m+2}P_{2m+1}(\cos\theta).$$

The place where Legendre polynomials appear most naturally is, as mentioned above, in the solution of Laplace’s equation in spherical coordinates. After the partial differential equation is transformed into three ordinary differential equations using the method of the separation of variables, the differential equation corresponding to the polar angle $\theta$ gives rise to solutions of which Legendre polynomials are special cases. This differential equation simplifies to the Legendre differential equation if the substitution $x = \cos\theta$ is made; in that case, the solutions will be Legendre polynomials in $x$, or in $\cos\theta$. That is why the argument of $P_k(x)$ is restricted to the interval $[-1,+1]$.


Example 8.5.2 We can expand the Dirac delta function in terms of Legendre polynomials. We write

$$\delta(x) = \sum_{n=0}^\infty a_nP_n(x), \tag{8.28}$$

where

$$a_n = \frac{2n+1}{2}\int_{-1}^{1}P_n(x)\delta(x)\,dx = \frac{2n+1}{2}P_n(0). \tag{8.29}$$

For odd $n$ this will give zero, because $P_n(x)$ is an odd polynomial. This is to be expected because $\delta(x)$ is an even function of $x$ [$\delta(x) = \delta(-x) = 0$ for $x \neq 0$]. To evaluate $P_n(0)$ for even $n$, we use the recurrence relation (8.20) for $x = 0$:

$$(n+1)P_{n+1}(0) = -nP_{n-1}(0),$$

or $nP_n(0) = -(n-1)P_{n-2}(0)$, or $P_n(0) = -\frac{n-1}{n}P_{n-2}(0)$. Iterating this $m$ times, we obtain

$$P_n(0) = (-1)^m\frac{(n-1)(n-3)\cdots(n-2m+1)}{n(n-2)(n-4)\cdots(n-2m+2)}P_{n-2m}(0).$$

For $n = 2m$, this yields

$$P_{2m}(0) = (-1)^m\frac{(2m-1)(2m-3)\cdots 3\cdot 1}{2m(2m-2)\cdots 4\cdot 2}P_0(0).$$

Now we "fill the gaps" in the numerator by multiplying it (and the denominator, of course) by the denominator. This yields

$$P_{2m}(0) = (-1)^m\frac{2m(2m-1)(2m-2)\cdots 3\cdot 2\cdot 1}{[2m(2m-2)\cdots 4\cdot 2]^2} = (-1)^m\frac{(2m)!}{[2^m m!]^2} = (-1)^m\frac{(2m)!}{2^{2m}(m!)^2},$$

because $P_0(x) = 1$. Thus, we can write

$$\delta(x) = \sum_{m=0}^\infty\frac{4m+1}{2}\,(-1)^m\frac{(2m)!}{2^{2m}(m!)^2}\,P_{2m}(x).$$

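We can also derive this expansion as follows. For any complete set of orthonormal vectors $\{|f_k\rangle\}_{k=1}^\infty$, we have

$$\delta(x-x') = w(x)\langle x|x'\rangle = w(x)\langle x|\mathbf{1}|x'\rangle = w(x)\langle x|\left(\sum_k|f_k\rangle\langle f_k|\right)|x'\rangle = w(x)\sum_kf_k^*(x')f_k(x).$$

Legendre polynomials are not orthonormal; but we can make them so by dividing $P_k(x)$ by $h_k^{1/2} = \sqrt{2/(2k+1)}$. Then, noting that $w(x) = 1$, we obtain

$$\delta(x-x') = \sum_{k=0}^\infty\frac{P_k(x')}{\sqrt{2/(2k+1)}}\,\frac{P_k(x)}{\sqrt{2/(2k+1)}} = \sum_{k=0}^\infty\frac{2k+1}{2}P_k(x')P_k(x).$$

For $x' = 0$ we get

$$\delta(x) = \sum_{k=0}^\infty\frac{2k+1}{2}P_k(0)P_k(x),$$

which agrees with Eqs. (8.28) and (8.29).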
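
The delta-function expansion can be tested on polynomials, for which everything is exact. The sketch below (an illustration with hypothetical helper names, not part of the original text) compares the closed form for $P_{2m}(0)$ with values from the recurrence (8.20), and checks that the series truncated at degree $N$, integrated against a polynomial $f$ of degree at most $N$, returns $f(0)$.

```python
from fractions import Fraction
from math import factorial

def legendre(n_max):
    """[P_0, ..., P_{n_max}] as coefficient lists (lowest power first), via (8.20)."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        nxt = [Fraction(0)] * (n + 2)
        for k, c in enumerate(polys[n]):
            nxt[k + 1] += (2 * n + 1) * c
        for k, c in enumerate(polys[n - 1]):
            nxt[k] -= n * c
        polys.append([c / (n + 1) for c in nxt])
    return polys[: n_max + 1]

def p_at_zero(m):
    """Closed form for P_{2m}(0) derived in Example 8.5.2."""
    return Fraction((-1) ** m * factorial(2 * m), 2 ** (2 * m) * factorial(m) ** 2)

def integrate_product(p, q):
    """Exact integral of p(x) q(x) over [-1, 1]."""
    return sum(a * b * Fraction(2, i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q)
               if (i + j) % 2 == 0)

def delta_applied(f, n_max):
    """Sum over n <= n_max of a_n * integral(P_n f), with a_n = (2n+1)/2 * P_n(0)."""
    P = legendre(n_max)
    return sum(Fraction(2 * n + 1, 2) * P[n][0] * integrate_product(P[n], f)
               for n in range(n_max + 1))

P = legendre(10)
for m in range(6):
    assert P[2 * m][0] == p_at_zero(m)       # constant term of P_{2m} is P_{2m}(0)

f = [Fraction(c) for c in (7, 1, -3, 0, 2)]  # f(x) = 7 + x - 3x^2 + 2x^4
assert delta_applied(f, 4) == f[0]           # the truncated series picks out f(0)
```

For a polynomial of degree at most $N$ the truncated sum already reproduces $f(0)$ exactly, because the partial sum acts as the identity on polynomials of degree up to $N$.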


8.6 Generating Functions 


It is possible to generate all orthogonal polynomials of a certain kind from a single function of two variables $g(x,t)$ by repeated differentiation of that function. Such a function is called a generating function. This generating function is assumed to be expandable in the form

$$g(x,t) = \sum_{n=0}^\infty a_nt^nF_n(x), \tag{8.30}$$

so that the $n$th derivative of $g(x,t)$ with respect to $t$ evaluated at $t = 0$ gives $F_n(x)$ to within a multiplicative constant. The constant $a_n$ is introduced for convenience. Clearly, for $g(x,t)$ to be useful, it must be in closed form. The derivation of such a function for general $F_n(x)$ is nontrivial, and we shall not attempt to derive such a general generating function. Instead, we simply quote these functions in Table 8.2, and leave the derivation of the generating functions of Hermite and Legendre polynomials as Problems 8.12 and 8.21. For Laguerre polynomials see [Hass 08, pp. 679-680].

Table 8.2 Generating functions for Hermite, Laguerre, Legendre, and both Chebyshev polynomials

Polynomial       Generating function                          $a_n$
$H_n(x)$         $\exp(-t^2 + 2xt)$                           $1/n!$
$L_n^\nu(x)$     $\exp[-xt/(1-t)]/(1-t)^{\nu+1}$              $1$
$P_n(x)$         $(t^2 - 2xt + 1)^{-1/2}$                     $1$
$T_n(x)$         $(1-t^2)(t^2 - 2xt + 1)^{-1}$                $2$ if $n \neq 0$, $a_0 = 1$
$U_n(x)$         $(t^2 - 2xt + 1)^{-1}$                       $1$
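
The expansion (8.30) can be spot-checked numerically. The sketch below is only an illustration: the sample point $x = 0.3$, $t = 0.2$, the truncation at $N = 40$, and all helper names are arbitrary assumptions. It sums the series for Legendre, Hermite, and first-kind Chebyshev polynomials, generated from their three-term recurrences, and compares each partial sum with the corresponding closed form from Table 8.2.

```python
from math import exp, factorial

def by_recurrence(f0, f1, step, n_max):
    """Values [F_0, ..., F_{n_max}] at a fixed x from a three-term recurrence."""
    vals = [f0, f1]
    for n in range(1, n_max):
        vals.append(step(n, vals[n], vals[n - 1]))
    return vals

x, t, N = 0.3, 0.2, 40        # arbitrary sample point with |t| small

legendre = by_recurrence(1.0, x, lambda n, fn, fp: ((2 * n + 1) * x * fn - n * fp) / (n + 1), N)
hermite = by_recurrence(1.0, 2 * x, lambda n, fn, fp: 2 * x * fn - 2 * n * fp, N)
chebyshev = by_recurrence(1.0, x, lambda n, fn, fp: 2 * x * fn - fp, N)

def partial_sum(values, a):
    """Truncated right-hand side of (8.30) for the coefficient rule a(n)."""
    return sum(a(n) * t ** n * f for n, f in enumerate(values))

leg = partial_sum(legendre, lambda n: 1.0)
her = partial_sum(hermite, lambda n: 1.0 / factorial(n))
che = partial_sum(chebyshev, lambda n: 1.0 if n == 0 else 2.0)

print(leg, (t * t - 2 * x * t + 1) ** -0.5)
print(her, exp(-t * t + 2 * x * t))
print(che, (1 - t * t) / (t * t - 2 * x * t + 1))
```

Each pair of printed numbers should agree to within rounding error, since the neglected terms are of order $t^{41}$.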


8.7 Problems 


8.1 Let $n = 1$ in Eq. (8.1) and solve for $s\,dw/dx$. Now substitute this in the derivative of $ws^np_{\le k}$ and show that the derivative is equal to $ws^{n-1}p_{\le k+1}$. Repeat this process $m$ times to prove Lemma 8.1.2.


8.2 Find w(x), a, and b for the case of the classical orthogonal polynomials 
in which s(x) is of second degree. 


8.3 Integrate by parts twice and use Lemma 8.1.2 to show that

$$\int_a^bF_m\,(wsF_n')'\,dx = 0 \quad\text{for } m < n.$$


8.4 Using Lemma 8.1.2 conclude that

(a) $(wsF_n')'/w$ is a polynomial of degree less than or equal to $n$.

(b) Write $(wsF_n')'/w$ as a linear combination of $F_i(x)$, and use their orthogonality and Problem 8.3 to show that the linear combination collapses to a single term.

(c) Multiply both sides of the differential equation so obtained by $F_n$ and integrate. The RHS becomes $h_n\lambda_n$. For the LHS, carry out the differentiation and note that $(ws)'/w = K_1F_1$.

Now show that $K_1F_1F_n' + sF_n''$ is a polynomial of degree $n$, and that the LHS of the differential equation yields $\{K_1k_1^{(1)}n + \sigma_2n(n-1)\}h_n$. Now find $\lambda_n$.


8.5 Derive the recurrence relation of Eq. (8.8). Hint: Differentiate Eq. (8.5) and substitute for $F_n''$ from the differential equation. Now multiply the resulting equation by $\alpha_nx + \beta_n$ and substitute for $(\alpha_nx + \beta_n)F_n'$ from one of the earlier recurrence relations.


8.6 Using only the orthogonality of Hermite polynomials

$$\int_{-\infty}^{\infty}e^{-x^2}H_m(x)H_n(x)\,dx = \sqrt{\pi}\,2^nn!\,\delta_{mn}$$

generate the first three of them.


8.7 Use the generalized Rodriguez formula for Hermite polynomials and integration by parts to expand $x^{2k}$ and $x^{2k+1}$ in terms of Hermite polynomials.


8.8 Use the recurrence relation for Hermite polynomials to show that

$$\int_{-\infty}^{\infty}xe^{-x^2}H_m(x)H_n(x)\,dx = \sqrt{\pi}\,2^{n-1}n!\left[\delta_{m,n-1} + 2(n+1)\delta_{m,n+1}\right].$$

What happens when $m = n$?


8.9 Apply the general formalism of the recurrence relations given in the book to Hermite polynomials to find the following:

$$H_n + H_{n-1}' - 2xH_{n-1} = 0.$$


8.10 Show that

$$\int_{-\infty}^{\infty}x^2e^{-x^2}H_n^2(x)\,dx = \sqrt{\pi}\,2^nn!\left(n + \tfrac12\right).$$


8.11 Use a recurrence relation for Hermite polynomials to show that

$$H_n(0) = \begin{cases} 0 & \text{if } n \text{ is odd}, \\ (-1)^m\dfrac{(2m)!}{m!} & \text{if } n = 2m. \end{cases}$$


8.12 Differentiate the expansion of $g(x,t)$ for Hermite polynomials with respect to $x$ (treating $t$ as a constant) and choose $a_n$ such that $na_n = a_{n-1}$ to obtain a differential equation for $g$. Solve this differential equation. To determine the "constant" of integration use the result of Problem 8.11 to show that $g(0,t) = e^{-t^2}$.
8.13 Use the expansion of the generating function for Hermite polynomials to obtain

$$\sum_{m,n=0}^\infty e^{-x^2}H_m(x)H_n(x)\frac{s^mt^n}{m!\,n!} = e^{-x^2+2x(s+t)-(s^2+t^2)}.$$

Then integrate both sides over $x$ and use the orthogonality of the Hermite polynomials to get

$$\sum_{n=0}^\infty\frac{(st)^n}{(n!)^2}\int_{-\infty}^{\infty}e^{-x^2}H_n^2(x)\,dx = \sqrt{\pi}\,e^{2st}.$$

Deduce from this the normalization constant $h_n$ of $H_n(x)$.


8.14 Using the recurrence relation of Eq. (8.14) repeatedly, show that

$$\int_{-\infty}^{\infty}x^ke^{-x^2}H_m(x)H_{m+n}(x)\,dx = \begin{cases} 0 & \text{if } n > k, \\ \sqrt{\pi}\,2^m(m+k)! & \text{if } n = k. \end{cases}$$


8.15 Show that for Legendre polynomials, $k_n^{(n)} = 2^n\Gamma(n+\frac12)/[n!\,\Gamma(\frac12)]$. Hint: Multiply and divide the expression given in the book by $n!$; take a factor of 2 out of all terms in the numerator; the even terms yield a factor of $n!$, and the odd terms give a gamma function.


8.16 Using integration by parts several times, show that

$$\int_{-1}^{1}(1-x^2)^n\,dx = \frac{2^mn(n-1)\cdots(n-m+1)}{3\cdot5\cdot7\cdots(2m-1)}\int_{-1}^{1}x^{2m}(1-x^2)^{n-m}\,dx.$$

Now show that

$$\int_{-1}^{1}(1-x^2)^n\,dx = \frac{2\,\Gamma(\frac12)\,n!}{(2n+1)\,\Gamma(n+\frac12)}.$$


8.17 Given that $P_0(x) = 1$ and $P_1(x) = x$, find $P_2(x)$, $P_3(x)$, and $P_4(x)$ using an appropriate recurrence relation.


8.18 Use the generalized Rodriguez formula to show that $P_0(1) = 1$ and $P_1(1) = 1$. Now use a recurrence relation to show that $P_n(1) = 1$ for all $n$. To be rigorous, you need to use mathematical induction.


8.19 Apply the general formalism of the recurrence relations given in the book to find the following two relations for Legendre polynomials:

(a) $nP_n - xP_n' + P_{n-1}' = 0$.
(b) $(1-x^2)P_n' - nP_{n-1} + nxP_n = 0$.


8.20 Show that

$$\int_{-1}^{1}x^nP_n(x)\,dx = \frac{2^{n+1}(n!)^2}{(2n+1)!}.$$

Hint: Use the definition of $h_n$ and $k_n^{(n)}$ and the fact that $P_n$ is orthogonal to any polynomial of degree lower than $n$.


8.21 Differentiate the expansion of $g(x,t)$ for Legendre polynomials, and choose $a_n = 1$. For $P_n'$, you will substitute two different expressions to get two equations. First use Eq. (8.11) with $n+1$ replaced by $n$, to obtain

$$(1-t^2)\frac{dg}{dx} + tg = 2\sum_{n=2}^\infty nt^nP_{n-1} + 2t.$$

As an alternative, use Eq. (8.10) to substitute for $P_n'$ and get

$$(1-xt)\frac{dg}{dx} = \sum_{n=2}^\infty nt^nP_{n-1} + t.$$

Combine the last two equations to get $(t^2 - 2xt + 1)g' = tg$. Solve this differential equation and determine the "constant" of integration by using $P_n(1) = 1$ to show that $g(1,t) = 1/(1-t)$.


8.22 Use the generating function for Legendre polynomials to show that $P_n(1) = 1$, $P_n(-1) = (-1)^n$, $P_n(0) = 0$ for odd $n$, and $P_n'(1) = n(n+1)/2$.


8.23 Both electrostatic and gravitational potential energies depend on the quantity $1/|\mathbf{r} - \mathbf{r}'|$, where $\mathbf{r}'$ is the position vector of a point inside a charge or mass distribution and $\mathbf{r}$ is the position vector of the observation point.

(a) Let $\mathbf{r}$ lie along the $z$-axis, and use spherical coordinates and the definition of generating functions to show that

$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{r_>}\sum_{n=0}^\infty\left(\frac{r_<}{r_>}\right)^nP_n(\cos\theta),$$

where $r_<$ ($r_>$) is the smaller (larger) of $r$ and $r'$, and $\theta$ is the polar angle.

(b) The electrostatic or gravitational potential energy $\Phi(\mathbf{r})$ is given by

$$\Phi(\mathbf{r}) = k\int\frac{\rho(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|}\,d^3r',$$

where $k$ is a constant and $\rho(\mathbf{r}')$ is the (charge or mass) density function. Use the result of part (a) to show that if the density depends only on $r'$, and not on any angle (i.e., $\rho$ is spherically symmetric), then $\Phi(\mathbf{r})$ reduces to the potential energy of a point charge at the origin for $r > r'$.

(c) What is $\Phi(\mathbf{r})$ for a spherically symmetric density which extends from the origin to $a$, with $a \gg r$ for any $r$ of interest?

(d) Show that the electric field $\mathbf{E}$ or gravitational field $\mathbf{g}$ (i.e., the negative gradient of $\Phi$) at any radial distance $r$ from the origin is given by $\frac{kQ(r)}{r^2}\hat{\mathbf{e}}_r$, where $Q(r)$ is the charge or mass enclosed in a sphere of radius $r$.
8.24 Use the generating function for Legendre polynomials and their orthogonality to derive the relation

$$\int_{-1}^{1}\frac{dx}{t^2 - 2xt + 1} = \sum_{n=0}^\infty t^{2n}\int_{-1}^{1}P_n^2(x)\,dx.$$

Integrate the LHS, expand the result in powers of $t$, and compare these powers on both sides to obtain the normalization constant $h_n$.


8.25 Evaluate the following integrals using the expansion of the generating function for Legendre polynomials.

(a) $\displaystyle\int_0^\pi\frac{(a\cos\theta + b)\sin\theta\,d\theta}{\sqrt{a^2 + 2ab\cos\theta + b^2}}$

(b) $\displaystyle\int_0^\pi\frac{(a\cos^2\theta + b\sin^2\theta)\sin\theta\,d\theta}{\sqrt{a^2 + 2ab\cos\theta + b^2}}$


8.26 Differentiate the expansion of the Legendre polynomial generating function with respect to $x$ and manipulate the resulting expression to obtain

$$(1 - 2xt + t^2)\sum_{n=0}^\infty t^nP_n'(x) = t\sum_{n=0}^\infty t^nP_n(x).$$

Equate equal powers of $t$ on both sides to derive the recurrence relation

$$P_{n+1}' + P_{n-1}' - 2xP_n' - P_n = 0.$$


8.27 Show that

$$\int_0^1P_k(x)\,dx = \begin{cases} \delta_{k0} & \text{if } k \text{ is even}, \\[4pt] \dfrac{(-1)^{(k-1)/2}(k-1)!}{2^k\left(\frac{k-1}{2}\right)!\left(\frac{k+1}{2}\right)!} & \text{if } k \text{ is odd}. \end{cases}$$

Hint: For even $k$, extend the region of integration to $(-1,1)$ and use the orthogonality property. For odd $k$, note that

$$\frac{d^{k-1}}{dx^{k-1}}(1-x^2)^k\Big|_0^1$$

gives zero for the upper limit (by Lemma 8.1.3). For the lower limit, expand the expression using the binomial theorem, and carry out the differentiation, keeping in mind that only one term of the expansion contributes.


8.28 Show that $g(x,t) = g(-x,-t)$ for both Hermite and Legendre polynomials. Now expand $g(x,t)$ and $g(-x,-t)$ and compare the coefficients of $t^n$ to obtain the parity relations for these polynomials:

$$H_n(-x) = (-1)^nH_n(x) \quad\text{and}\quad P_n(-x) = (-1)^nP_n(x).$$


8.29 Derive the orthogonality of Legendre polynomials directly from the 
differential equation they satisfy. 




8.30 Expand |x| in the interval (—1, +1) in terms of Legendre polynomials. 
Hint: Use the result of Problem 8.27. 


8.31 Apply the general formalism of the recurrence relations given in the book to find the following two relations for Laguerre polynomials:

(a) $nL_n^\nu - (n+\nu)L_{n-1}^\nu - x\dfrac{dL_n^\nu}{dx} = 0$.
(b) $(n+1)L_{n+1}^\nu - (2n+\nu+1-x)L_n^\nu + (n+\nu)L_{n-1}^\nu = 0$.


8.32 From the generating function for Laguerre polynomials given in Table 8.2 deduce that $L_n^\nu(0) = \Gamma(n+\nu+1)/[n!\,\Gamma(\nu+1)]$.


8.33 Let $L_n \equiv L_n^0$. Now differentiate both sides of

$$g(x,t) = \frac{e^{-xt/(1-t)}}{1-t} = \sum_{n=0}^\infty t^nL_n(x)$$

with respect to $x$ and compare powers of $t$ to obtain $L_n'(0) = -n$ and $L_n''(0) = \frac12n(n-1)$. Hint: Differentiate $1/(1-t) = \sum_{n=0}^\infty t^n$ to get an expression for $(1-t)^{-2}$.


8.34 Expand e~* as a series of Laguerre polynomials L} (x). Find the co- 
efficients by using (a) the orthogonality of L(x) and (b) the generating 
function. 


8.35 Derive the recurrence relations given in the book for Jacobi, Gegen- 
bauer, and Chebyshev polynomials. 


8.36 Show that $T_n(-x) = (-1)^nT_n(x)$ and $U_n(-x) = (-1)^nU_n(x)$. Hint: Use $g(x,t) = g(-x,-t)$.


8.37 Show that $T_n(1) = 1$, $U_n(1) = n+1$, $T_n(-1) = (-1)^n$, $U_n(-1) = (-1)^n(n+1)$, $T_{2m}(0) = (-1)^m = U_{2m}(0)$, and $T_{2m+1}(0) = 0 = U_{2m+1}(0)$.
Fourier Analysis 


The single most recurring theme of mathematical physics is Fourier analy- 
sis. It shows up, for example, in classical mechanics and the analysis of nor- 
mal modes, in electromagnetic theory and the frequency analysis of waves, 
in noise considerations and thermal physics, in quantum theory and the 
transformation between momentum and coordinate representations, and in 
relativistic quantum field theory and creation and annihilation operation for- 
malism. 


9.1 Fourier Series 


One way to begin the study of Fourier series and transforms is to invoke 
a generalization of the Stone-Weierstrass Approximation Theorem (Theorem 7.2.3), 
which established the completeness of monomials, $x^k$. The 
generalization of Theorem 7.2.3 permits us to find another set of orthog- 
onal functions in terms of which we can expand an arbitrary function. This 
generalization involves polynomials in more than one variable ([Simm 83, 
pp. 160-161]): 


Theorem 9.1.1 (Generalized Stone-Weierstrass Theorem) If the function $f(x_1, x_2, \ldots, x_n)$ is continuous in the domain $\{a_j \le x_j \le b_j\}_{j=1}^n$, then it can be expanded in terms of the monomials $x_1^{k_1}x_2^{k_2}\cdots x_n^{k_n}$, where the $k_j$ are nonnegative integers.


Now let us consider functions that are periodic and investigate their ex- 
pansion in terms of elementary periodic functions. We use the generalized 
Stone-Weierstrass theorem with two variables, $x$ and $y$. A function $g(x, y)$ can be written as $g(x, y) = \sum_{k,m=0}^{\infty} a_{km} x^k y^m$. In this equation, $x$ and $y$ can be considered as coordinates in the $xy$-plane, which in turn can be written in terms of polar coordinates $r$ and $\theta$. In that case, we obtain

$$f(r, \theta) \equiv g(r\cos\theta, r\sin\theta) = \sum_{k,m=0}^{\infty} a_{km}\, r^{k+m} \cos^k\theta\, \sin^m\theta.$$

In particular, if we let $r = 1$, we obtain a function of $\theta$ alone, which upon substitution of complex exponentials for $\sin\theta$ and $\cos\theta$ becomes

$$f(\theta) = \sum_{k,m=0}^{\infty} a_{km} \frac{1}{2^k}\left(e^{i\theta} + e^{-i\theta}\right)^k \frac{1}{(2i)^m}\left(e^{i\theta} - e^{-i\theta}\right)^m = \sum_{n=-\infty}^{\infty} b_n e^{in\theta}, \tag{9.1}$$

where $b_n$ is a constant that depends on $a_{km}$. The RHS of (9.1) is periodic with period $2\pi$; thus, it is especially suitable for periodic functions $f(\theta)$ that satisfy the periodicity condition $f(\theta - \pi) = f(\theta + \pi)$.

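We can also write Eq. (9.1) as

$$f(\theta) = b_0 + \sum_{n=1}^{\infty}\left(b_n e^{in\theta} + b_{-n} e^{-in\theta}\right) = b_0 + \sum_{n=1}^{\infty}\Big[\underbrace{(b_n + b_{-n})}_{\equiv A_n}\cos n\theta + \underbrace{i(b_n - b_{-n})}_{\equiv B_n}\sin n\theta\Big]$$

or

$$f(\theta) = b_0 + \sum_{n=1}^{\infty}\left(A_n \cos n\theta + B_n \sin n\theta\right). \tag{9.2}$$

If $f(\theta)$ is real, then $b_0$, $A_n$, and $B_n$ are also real. Equation (9.1) or (9.2) is called the Fourier series expansion of $f(\theta)$.
Let us now concentrate on the elementary periodic functions $e^{in\theta}$. We define the $\{|e_n\rangle\}_{n=-\infty}^{\infty}$ such that their “$\theta$th components” are given by

$$\langle\theta|e_n\rangle = \frac{1}{\sqrt{2\pi}}\, e^{in\theta}, \quad\text{where } \theta \in (-\pi, \pi).$$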

These functions (or ket vectors), which belong to $\mathcal{L}^2(-\pi, \pi)$, are orthonormal, as can be easily verified. It can also be shown that they are complete. In fact, for functions that are continuous on $(-\pi, \pi)$, this is a result of the generalized Stone-Weierstrass theorem. It turns out, however, that $\{|e_n\rangle\}_{n=-\infty}^{\infty}$ is also a complete orthonormal sequence for piecewise continuous functions on $(-\pi, \pi)$.¹ Therefore, any periodic piecewise continuous function of $\theta$ can be expressed as a linear combination of these orthonormal vectors. Thus if $|f\rangle \in \mathcal{L}^2(-\pi, \pi)$, then

$$|f\rangle = \sum_{n=-\infty}^{\infty} f_n |e_n\rangle, \quad\text{where } f_n = \langle e_n|f\rangle. \tag{9.3}$$

We can write this as a functional relation if we take the $\theta$th component of both sides: $\langle\theta|f\rangle = \sum_{n=-\infty}^{\infty} f_n \langle\theta|e_n\rangle$, or

$$f(\theta) = \frac{1}{\sqrt{2\pi}}\sum_{n=-\infty}^{\infty} f_n e^{in\theta} \tag{9.4}$$


¹A piecewise continuous function on a finite interval is one that has a finite number of discontinuities in its interval of definition.




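with $f_n$ given by

$$f_n = \langle e_n|\mathbf{1}|f\rangle = \langle e_n|\left(\int_{-\pi}^{\pi} |\theta\rangle\langle\theta|\,d\theta\right)|f\rangle = \int_{-\pi}^{\pi} \langle e_n|\theta\rangle\langle\theta|f\rangle\,d\theta = \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi} e^{-in\theta} f(\theta)\,d\theta. \tag{9.5}$$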

It is important to note that even though $f(\theta)$ may be defined only for $-\pi \le \theta \le \pi$, Eq. (9.4) extends the domain of definition of $f(\theta)$ to all the intervals $(2k - 1)\pi \le \theta \le (2k + 1)\pi$ for all $k \in \mathbb{Z}$. Thus, if a function is to be represented by Eq. (9.4) without any specification of the interval of definition, it must be periodic in $\theta$. For such functions, the interval of their definition can be translated by a multiple of $2\pi$. Thus, $f(\theta)$ with $-\pi \le \theta \le \pi$ is equivalent to $f(\theta - 2m\pi)$ with $2m\pi - \pi \le \theta \le 2m\pi + \pi$; both will give the same Fourier series expansion. We shall define periodic functions in their fundamental cell such as $(-\pi, \pi)$.
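As a numerical illustration of Eqs. (9.4) and (9.5), the following sketch estimates the coefficients $f_n$ of a sample function with the midpoint rule and rebuilds the function from a truncated sum. The test function $\theta^2$, the truncation at $|n| \le 50$, and the function names are illustrative assumptions, not part of the text.

```python
import cmath
import math


def fourier_coefficient(f, n, samples=2000):
    """Midpoint-rule estimate of Eq. (9.5):
    f_n = (1/sqrt(2*pi)) * integral_{-pi}^{pi} exp(-i*n*theta) f(theta) dtheta."""
    h = 2 * math.pi / samples
    total = 0j
    for j in range(samples):
        theta = -math.pi + (j + 0.5) * h
        total += cmath.exp(-1j * n * theta) * f(theta)
    return total * h / math.sqrt(2 * math.pi)


def partial_sum(coeffs, theta):
    """Truncated version of Eq. (9.4) for a dict {n: f_n}."""
    series = sum(c * cmath.exp(1j * n * theta) for n, c in coeffs.items())
    return series / math.sqrt(2 * math.pi)


def sample_function(theta):
    # Hypothetical test function, continuous on (-pi, pi).
    return theta ** 2


N = 50  # assumed truncation order
coeffs = {n: fourier_coefficient(sample_function, n) for n in range(-N, N + 1)}
approx = partial_sum(coeffs, 1.0)  # should be close to sample_function(1.0)
```

Because the sample function is real and even, the reconstructed value has a negligible imaginary part, and the truncated sum at $\theta = 1$ stays close to $1$.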


Historical Notes 

Joseph Fourier (1768-1830) did very well as a young student of mathematics but had 
set his heart on becoming an army officer. Denied a commission because he was the son 
of a tailor, he went to a Benedictine school with the hope that he could continue studying 
mathematics at its seminary in Paris. The French Revolution changed those plans and set 
the stage for many of the personal circumstances of Fourier’s later years, due in part to 
his courageous defense of some of its victims, an action that led to his arrest in 1794. 
He was released later that year, and he enrolled as a student in the Ecole Normale, which 
opened and closed within a year. His performance there, however, was enough to earn him 
a position as assistant lecturer (under Lagrange and Monge) in the Ecole Polytechnique. 
He was an excellent mathematical physicist, was a friend of Napoleon (so far as such 
people have friends), and accompanied him in 1798 to Egypt, where Fourier held various 
diplomatic and administrative posts while also conducting research. Napoleon took note 
of his accomplishments and, on Fourier’s return to France in 1801, appointed him prefect
of the district of Isère, in southeastern France; in this capacity Fourier built the first real road
from Grenoble to Turin. He also befriended the boy Champollion, who later deciphered
the Rosetta stone as the first long step toward understanding the hieroglyphic writing of 
the ancient Egyptians. 

Like other scientists of his time, Fourier took up the flow of heat. The flow was of interest 
as a practical problem in the handling of metals in industry and as a scientific problem 
in attempts to determine the temperature in the interior of the earth, the variation of that 
temperature with time, and other such questions. He submitted a basic paper on heat con- 
duction to the Academy of Sciences of Paris in 1807. The paper was judged by Lagrange, 
Laplace, and Legendre, and was not published, mainly due to the objections of Lagrange, 
who had earlier rejected the use of trigonometric series. But the Academy did wish to en- 
courage Fourier to develop his ideas, and so made the problem of the propagation of heat 
the subject of a grand prize to be awarded in 1812. Fourier submitted a revised paper in 
1811, which was judged by the men already mentioned and others. It won the prize but 
was criticized for its lack of rigor and so was not published at that time in the Mémoires 
of the Academy. 

Fourier developed a mastery of clear notation, some of which is still in use today. (The 
modern integral sign and the placement of the limits of integration near its top and bot- 
tom were introduced by Fourier.) It was also his habit to maintain close association be- 
tween mathematical relations and physically measurable quantities, especially in limiting 
or asymptotic cases, even performing some of the experiments himself. He was one of 
the first to begin full incorporation of physical constants into his equations, and made 
considerable strides toward the modern ideas of units and dimensional analysis. 

Fourier continued to work on the subject of heat and, in 1822, published one of the classics 
of mathematics, Théorie Analytique de la Chaleur, in which he made extensive use of the 
series that now bear his name and incorporated the first part of his 1811 paper practically
without change. Two years later he became secretary of the Academy and was able to
have his 1811 paper published in its original form in the Mémoires. 

Fourier series were of profound significance in connection with the evolution of the 
concept of a function, the rigorous theory of definite integrals, and the development of 
Hilbert spaces. Fourier claimed that “arbitrary” graphs can be represented by trigono- 
metric series and should therefore be treated as legitimate functions, and it came as a 
shock to many that he turned out to be right. The classical definition of the definite inte- 
gral due to Riemann was first given in his fundamental paper of 1854 on the subject of 
Fourier series. Hilbert thought of a function as represented by an infinite sequence, the 
Fourier coefficients of the function. 

Fourier himself is one of the fortunate few: his name has become rooted in all civilized 
languages as an adjective that is well-known to physical scientists and mathematicians in 
every part of the world. 


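Functions are not always defined on $(-\pi, \pi)$. Let us consider a function $F(x)$ that is defined on $(a, b)$ and is periodic with period $L = b - a$. We define a new variable,

$$\theta \equiv \frac{2\pi}{L}\left(x - a - \frac{L}{2}\right) \quad\Rightarrow\quad x = \frac{L}{2\pi}\theta + a + \frac{L}{2},$$

and note that $f(\theta) \equiv F((L/2\pi)\theta + a + L/2)$ is periodic, with $(-\pi, \pi)$ as its fundamental interval, because

$$f(\theta \pm \pi) = F\left(\frac{L}{2\pi}(\theta \pm \pi) + a + \frac{L}{2}\right) = F\left(x \pm \frac{L}{2}\right)$$

and $F(x + L/2) = F(x - L/2)$. It follows that we can expand the latter as in Eq. (9.4). Using that equation, but writing $\theta$ in terms of $x$, we obtain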


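$$F(x) = f\left(\frac{2\pi}{L}\left(x - a - \frac{L}{2}\right)\right) = \frac{1}{\sqrt{2\pi}}\sum_{n=-\infty}^{\infty} f_n \exp\left[in\frac{2\pi}{L}\left(x - a - \frac{L}{2}\right)\right] = \frac{1}{\sqrt{L}}\sum_{n=-\infty}^{\infty} F_n e^{2\pi i n x/L}, \tag{9.6}$$

where we have introduced² $F_n \equiv \sqrt{L/2\pi}\, f_n e^{-i(2\pi n/L)(a + L/2)}$. Using Eq. (9.5), we can write

$$\begin{aligned}
F_n &= \sqrt{\frac{L}{2\pi}}\, e^{-i(2\pi n/L)(a + L/2)}\, \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi} e^{-in\theta} f(\theta)\,d\theta \\
&= \frac{\sqrt{L}}{2\pi}\, e^{-i(2\pi n/L)(a + L/2)} \int_a^{a+L} e^{-i(2\pi n/L)(x - a - L/2)} F(x)\,\frac{2\pi}{L}\,dx \\
&= \frac{1}{\sqrt{L}}\int_a^b e^{-i(2\pi n/L)x} F(x)\,dx.
\end{aligned} \tag{9.7}$$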


The functions $\exp(2\pi i n x/L)/\sqrt{L}$ are easily seen to be orthonormal as members of $\mathcal{L}^2(a, b)$. We can introduce $\{|e_n\rangle\}_{n=-\infty}^{\infty}$ with the “$x$th component” given by $\langle x|e_n\rangle = (1/\sqrt{L})\,e^{2\pi i n x/L}$. Then the reader may check


²The $F_n$ are defined such that what they multiply in the expansion are orthonormal in the interval $(a, b)$.




Fig. 9.1 Top: The periodic square wave potential with height taken to be 1. Bottom: 
Various approximations to the Fourier series of the square-wave potential. The dashed 
plot is that of the first term of the series, the thick grey plot keeps 3 terms, and the solid 
plot 15 terms 


that Eqs. (9.6) and (9.7) can be written as $|F\rangle = \sum_{n=-\infty}^{\infty} F_n |e_n\rangle$ with $F_n = \langle n|F\rangle$.


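Example 9.1.2 In the study of electrical circuits, periodic voltage signals of different shapes are encountered. An example is a square wave voltage of height $U_0$, “duration” $T$, and “rest duration” $T$ [see Fig. 9.1(a)]. The potential as a function of time $V(t)$ can be expanded as a Fourier series. The interval is $(0, 2T)$ because that is one whole cycle of the potential variation. We therefore use Eq. (9.6) and write

$$V(t) = \frac{1}{\sqrt{2T}}\sum_{n=-\infty}^{\infty} V_n e^{in\pi t/T}, \quad\text{where}\quad V_n = \frac{1}{\sqrt{2T}}\int_0^{2T} e^{-in\pi t/T}\, V(t)\,dt.$$

The problem is to find $V_n$. This is easily done by substituting

$$V(t) = \begin{cases} U_0 & \text{if } 0 \le t \le T, \\ 0 & \text{if } T < t \le 2T \end{cases}$$

in the last integral:

$$V_n = \frac{U_0}{\sqrt{2T}}\int_0^{T} e^{-in\pi t/T}\,dt = \frac{U_0}{\sqrt{2T}}\left(\frac{T}{-in\pi}\right)\left[(-1)^n - 1\right] = \begin{cases} 0 & \text{if } n \text{ is even and } n \ne 0, \\[4pt] \dfrac{\sqrt{2T}\,U_0}{in\pi} & \text{if } n \text{ is odd,} \end{cases}$$

where $n \ne 0$. For $n = 0$, we obtain $V_0 = \frac{1}{\sqrt{2T}}\int_0^{2T} V(t)\,dt = \frac{1}{\sqrt{2T}}\int_0^{T} U_0\,dt = U_0\sqrt{T/2}$.

Therefore, we can write

$$\begin{aligned}
V(t) &= \frac{1}{\sqrt{2T}}\left\{U_0\sqrt{\frac{T}{2}} + \frac{\sqrt{2T}\,U_0}{i\pi}\left[\sum_{\substack{n=-\infty \\ n\ \text{odd}}}^{-1}\frac{1}{n}e^{in\pi t/T} + \sum_{\substack{n=1 \\ n\ \text{odd}}}^{\infty}\frac{1}{n}e^{in\pi t/T}\right]\right\} \\
&= U_0\left\{\frac{1}{2} + \frac{1}{i\pi}\left[\sum_{\substack{n=1 \\ n\ \text{odd}}}^{\infty}\frac{1}{-n}e^{-in\pi t/T} + \sum_{\substack{n=1 \\ n\ \text{odd}}}^{\infty}\frac{1}{n}e^{in\pi t/T}\right]\right\} \\
&= U_0\left\{\frac{1}{2} + \frac{2}{\pi}\sum_{k=0}^{\infty}\frac{1}{2k+1}\sin\left[\frac{(2k+1)\pi t}{T}\right]\right\}.
\end{aligned}$$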

Figure 9.1(b) shows the graphical representation of the above infinite sum 
when only a finite number of terms are present. 
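The truncated sums plotted in Fig. 9.1(b) can be reproduced with a few lines of code. The sketch below evaluates the final sine series of this example with a finite number of terms; the function name and the default values $T = 1$, $U_0 = 1$ are assumptions made for illustration.

```python
import math


def square_wave_partial_sum(t, T=1.0, U0=1.0, terms=15):
    """Partial sum U0 * {1/2 + (2/pi) * sum_k sin[(2k+1) pi t / T] / (2k+1)}
    keeping `terms` odd harmonics."""
    total = 0.5
    for k in range(terms):
        n = 2 * k + 1
        total += (2 / math.pi) * math.sin(n * math.pi * t / T) / n
    return U0 * total


# Middle of the "on" interval, the discontinuity, and the "rest" interval.
on_value = square_wave_partial_sum(0.5, terms=2000)
jump_value = square_wave_partial_sum(1.0, terms=2000)
off_value = square_wave_partial_sum(1.5, terms=2000)
```

With many terms, the sum approaches $U_0$ on $(0, T)$ and $0$ on $(T, 2T)$, while at $t = T$ every sine term vanishes and the sum equals $U_0/2$.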


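Example 9.1.3 Another frequently used voltage is the sawtooth voltage [see Fig. 9.2(a)]. The equation for $V(t)$ with period $T$ is $V(t) = U_0 t/T$ for $0 \le t < T$, and its Fourier representation is

$$V(t) = \frac{1}{\sqrt{T}}\sum_{n=-\infty}^{\infty} V_n e^{2\pi i n t/T}, \quad\text{where}\quad V_n = \frac{1}{\sqrt{T}}\int_0^{T} e^{-2\pi i n t/T}\, V(t)\,dt.$$

Substituting for $V(t)$ in the integral above yields

$$\begin{aligned}
V_n &= \frac{1}{\sqrt{T}}\int_0^{T} e^{-2\pi i n t/T}\, U_0\frac{t}{T}\,dt = U_0 T^{-3/2}\int_0^{T} e^{-2\pi i n t/T}\, t\,dt \\
&= U_0 T^{-3/2}\left(\frac{T}{-i2n\pi}\, t\, e^{-2\pi i n t/T}\Big|_0^{T} + \frac{T}{i2n\pi}\underbrace{\int_0^{T} e^{-2\pi i n t/T}\,dt}_{=0}\right) \\
&= U_0 T^{-3/2}\left(\frac{T^2}{-i2n\pi}\right) = -\frac{U_0\sqrt{T}}{i2n\pi}, \quad\text{where } n \ne 0,
\end{aligned}$$

$$V_0 = \frac{1}{\sqrt{T}}\int_0^{T} V(t)\,dt = \frac{1}{\sqrt{T}}\int_0^{T} U_0\frac{t}{T}\,dt = \frac{1}{2}U_0\sqrt{T}.$$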



Fig. 9.2 Top: The periodic sawtooth potential with height taken to be 1. Bottom: Various
approximations to the Fourier series of the sawtooth potential. The dashed plot is that of
the first term of the series, the thick grey plot keeps 3 terms, and the solid plot 15 terms


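Thus,

$$V(t) = \frac{1}{\sqrt{T}}\left\{\frac{1}{2}U_0\sqrt{T} - \frac{U_0\sqrt{T}}{i2\pi}\left(\sum_{n=-\infty}^{-1}\frac{1}{n}e^{i2\pi n t/T} + \sum_{n=1}^{\infty}\frac{1}{n}e^{i2\pi n t/T}\right)\right\} = U_0\left\{\frac{1}{2} - \frac{1}{\pi}\sum_{n=1}^{\infty}\frac{1}{n}\sin\frac{2n\pi t}{T}\right\}.$$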


Figure 9.2(b) shows the graphical representation of the above series keeping 
the first few terms. 
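The closed-form coefficients of the sawtooth voltage can be cross-checked by direct numerical integration. The following sketch compares a midpoint-rule estimate of $V_n$ with the expressions $-U_0\sqrt{T}/(i2n\pi)$ for $n \ne 0$ and $\frac{1}{2}U_0\sqrt{T}$ for $n = 0$; the function names, sample count, and unit values of $T$ and $U_0$ are assumptions for illustration only.

```python
import cmath
import math


def sawtooth_coefficient_numeric(n, T=1.0, U0=1.0, samples=4000):
    """Midpoint-rule estimate of
    V_n = (1/sqrt(T)) * integral_0^T exp(-2*pi*i*n*t/T) * (U0*t/T) dt."""
    h = T / samples
    total = 0j
    for j in range(samples):
        t = (j + 0.5) * h
        total += cmath.exp(-2j * math.pi * n * t / T) * U0 * t / T
    return total * h / math.sqrt(T)


def sawtooth_coefficient_exact(n, T=1.0, U0=1.0):
    """Closed forms quoted in the example."""
    if n == 0:
        return 0.5 * U0 * math.sqrt(T)
    return -U0 * math.sqrt(T) / (2j * math.pi * n)


# Largest discrepancy over a few low-order coefficients.
max_error = max(
    abs(sawtooth_coefficient_numeric(n) - sawtooth_coefficient_exact(n))
    for n in range(-3, 4)
)
```

The coefficients for $n \ne 0$ are purely imaginary, which is why only sine terms survive in the real form of the series.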


The foregoing examples indicate an important fact about Fourier series. At points of discontinuity (for example, $t = T$ in the preceding two examples), the value of the function is not defined, but the Fourier series expansion assigns it a value: the average of the two values on the right and left of the discontinuity. For instance, when we substitute $t = T$ in the series of Example 9.1.3, all the sine terms vanish and we obtain $V(T) = U_0/2$, the average of $U_0$ (on the left) and 0 (on the right). We express this as

$$V(T) = \frac{1}{2}\left[V(T - 0) + V(T + 0)\right] \equiv \frac{1}{2}\lim_{\epsilon\to 0}\left[V(T - \epsilon) + V(T + \epsilon)\right].$$

This is a general property of Fourier series. In fact, the main theorem of Fourier series, which follows, incorporates this property. (For a proof of this theorem, see [Cour 62].)


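Theorem 9.1.4 The Fourier series of a function $f(\theta)$ that is piecewise continuous in the interval $(-\pi, \pi)$ converges to

$$\frac{1}{2}\left[f(\theta + 0) + f(\theta - 0)\right] \quad\text{for } -\pi < \theta < \pi,$$

$$\frac{1}{2}\left[f(\pi) + f(-\pi)\right] \quad\text{for } \theta = \pm\pi.$$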


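Although we used exponential functions to find the Fourier expansion of the two examples above, it is more convenient to start with the trigonometric series when the expansion of a real function is sought. Equation (9.2) already gives such an expansion. All we need to do now is find expressions for $A_n$ and $B_n$. From the definitions of $A_n$ and the relation between $b_n$ and $f_n$ we get

$$\begin{aligned}
A_n &= b_n + b_{-n} = \frac{1}{\sqrt{2\pi}}\left(f_n + f_{-n}\right) \\
&= \frac{1}{\sqrt{2\pi}}\left(\frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi} e^{-in\theta} f(\theta)\,d\theta + \frac{1}{\sqrt{2\pi}}\int_{-\pi}^{\pi} e^{in\theta} f(\theta)\,d\theta\right) \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[e^{-in\theta} + e^{in\theta}\right] f(\theta)\,d\theta = \frac{1}{\pi}\int_{-\pi}^{\pi}\cos n\theta\, f(\theta)\,d\theta.
\end{aligned} \tag{9.8}$$

Similarly,

$$B_n = \frac{1}{\pi}\int_{-\pi}^{\pi}\sin n\theta\, f(\theta)\,d\theta, \qquad b_0 = \frac{1}{\sqrt{2\pi}}f_0 = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\theta)\,d\theta \equiv \frac{1}{2}A_0. \tag{9.9}$$

So, for a function $f(\theta)$ defined in $(-\pi, \pi)$, the Fourier trigonometric series is as in Eq. (9.2) with the coefficients given by Eqs. (9.8) and (9.9). For a function $F(x)$, defined on $(a, b)$, the trigonometric series becomes

$$F(x) = \frac{1}{2}A_0 + \sum_{n=1}^{\infty}\left(A_n\cos\frac{2n\pi x}{L} + B_n\sin\frac{2n\pi x}{L}\right), \tag{9.10}$$

where

$$A_n = \frac{2}{L}\int_a^b \cos\left(\frac{2n\pi x}{L}\right) F(x)\,dx, \qquad B_n = \frac{2}{L}\int_a^b \sin\left(\frac{2n\pi x}{L}\right) F(x)\,dx. \tag{9.11}$$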




A convenient rule to remember is that for even (odd) functions—which 
are necessarily defined on a symmetric interval around the origin—only co- 
sine (sine) terms appear in the Fourier expansion. 
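The parity rule can be checked numerically. The sketch below assumes the trigonometric coefficients on $(a, b)$ take the form $A_n = (2/L)\int_a^b \cos(2n\pi x/L)\,F(x)\,dx$ and $B_n = (2/L)\int_a^b \sin(2n\pi x/L)\,F(x)\,dx$, which is what Eq. (9.7) implies for the series (9.10); the function name and the two test functions $|x|$ and $x$ on $(-1, 1)$ are hypothetical choices.

```python
import math


def trig_coefficients(F, a, b, n, samples=4000):
    """Midpoint-rule estimates of (A_n, B_n) for F on (a, b) with L = b - a."""
    L = b - a
    h = L / samples
    cos_sum = 0.0
    sin_sum = 0.0
    for j in range(samples):
        x = a + (j + 0.5) * h
        cos_sum += math.cos(2 * n * math.pi * x / L) * F(x)
        sin_sum += math.sin(2 * n * math.pi * x / L) * F(x)
    return 2 * cos_sum * h / L, 2 * sin_sum * h / L


# Even function |x|: only cosine terms should survive.
A1_even, B1_even = trig_coefficients(abs, -1.0, 1.0, 1)

# Odd function x: only sine terms should survive.
A1_odd, B1_odd = trig_coefficients(lambda x: x, -1.0, 1.0, 1)
```

For these two test functions the nonvanishing coefficients can be worked out by hand: $A_1 = -4/\pi^2$ for $|x|$ and $B_1 = 2/\pi$ for $x$.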

It is useful to have a representation of the Dirac delta function in terms of the present orthonormal basis of Fourier expansion. First we note that we can represent the delta function in terms of a series in any set of orthonormal functions (see Problem 9.23):

$$\delta(x - x') = \sum_n f_n(x) f_n^*(x')\, w(x). \tag{9.12}$$

Next we use the basis of the Fourier expansion for which $w(x) = 1$. We then obtain

$$\delta(x - x') = \sum_{n=-\infty}^{\infty}\frac{e^{2\pi i n x/L}}{\sqrt{L}}\,\frac{e^{-2\pi i n x'/L}}{\sqrt{L}} = \frac{1}{L}\sum_{n=-\infty}^{\infty} e^{2\pi i n(x - x')/L}.$$

n=—OoO n=—OoO 


9.1.1 The Gibbs Phenomenon 


The plots of the Fourier series expansions in Figs. 9.1(b) and 9.2(b) exhibit a feature that is common to all such expansions: At the discontinuity of the periodic function, the truncated Fourier series overestimates the actual function. This is called the Gibbs phenomenon, and is the subject of this subsection.
Let us approximate the infinite series with a finite sum. Then

$$f_N(\theta) = \frac{1}{\sqrt{2\pi}}\sum_{n=-N}^{N} f_n e^{in\theta} = \frac{1}{2\pi}\sum_{n=-N}^{N} e^{in\theta}\int_0^{2\pi} e^{-in\theta'} f(\theta')\,d\theta' = \frac{1}{2\pi}\int_0^{2\pi} d\theta'\, f(\theta')\sum_{n=-N}^{N} e^{in(\theta - \theta')},$$

where we substituted Eq. (9.5) in the sum and, without loss of generality, changed the interval of integration from $(-\pi, \pi)$ to $(0, 2\pi)$. Problem 9.2 shows that

$$\sum_{n=-N}^{N} e^{in(\theta - \theta')} = \frac{\sin[(N + \frac{1}{2})(\theta - \theta')]}{\sin[\frac{1}{2}(\theta - \theta')]}.$$
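The closed form of the finite exponential sum is easy to confirm numerically before it is used. The sketch below compares the direct sum with the sine ratio for a few arbitrary values of $N$ and of the angle difference $\phi = \theta - \theta'$; the function names and the sample values are assumptions for illustration.

```python
import cmath
import math


def kernel_sum(N, phi):
    """Direct evaluation of sum_{n=-N}^{N} exp(i*n*phi)."""
    return sum(cmath.exp(1j * n * phi) for n in range(-N, N + 1))


def kernel_closed_form(N, phi):
    """sin[(N + 1/2) phi] / sin(phi / 2), valid when phi is not a multiple of 2*pi."""
    return math.sin((N + 0.5) * phi) / math.sin(0.5 * phi)


# Compare the two expressions at a few sample points.
differences = [
    abs(kernel_sum(N, phi) - kernel_closed_form(N, phi))
    for N, phi in [(5, 0.7), (12, -2.3), (40, 3.0)]
]
```

The direct sum is real up to rounding error, as the closed form requires, because the terms with $n$ and $-n$ are complex conjugates of each other.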


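It follows that

$$\begin{aligned}
f_N(\theta) &= \frac{1}{2\pi}\int_0^{2\pi} d\theta'\, f(\theta')\,\frac{\sin[(N + \frac{1}{2})(\theta - \theta')]}{\sin[\frac{1}{2}(\theta - \theta')]} \\
&= \frac{1}{2\pi}\int_{-\theta}^{2\pi-\theta} d\phi\, f(\phi + \theta)\underbrace{\frac{\sin[(N + \frac{1}{2})\phi]}{\sin(\frac{1}{2}\phi)}}_{\equiv S(\phi)} \\
&= \frac{1}{2\pi}\int_{-\theta}^{2\pi-\theta} d\phi\, f(\phi + \theta)\, S(\phi).
\end{aligned} \tag{9.13}$$

We want to investigate the behavior of $f_N$ at a discontinuity of $f$. By translating the limits of integration if necessary, we can assume that the discontinuity of $f$ occurs at a point $\alpha$ such that $0 \ne \alpha \ne 2\pi$. Let us denote the jump at this discontinuity for the function itself by $\Delta f$, and for its finite Fourier sum by $\Delta f_N$:

$$\Delta f \equiv f(\alpha + \epsilon) - f(\alpha - \epsilon), \qquad \Delta f_N \equiv f_N(\alpha + \epsilon) - f_N(\alpha - \epsilon).$$

Then, we have

$$\begin{aligned}
\Delta f_N &= \frac{1}{2\pi}\int_{-\alpha-\epsilon}^{2\pi-\alpha-\epsilon} d\phi\, f(\phi + \alpha + \epsilon)\, S(\phi) - \frac{1}{2\pi}\int_{-\alpha+\epsilon}^{2\pi-\alpha+\epsilon} d\phi\, f(\phi + \alpha - \epsilon)\, S(\phi) \\
&= \frac{1}{2\pi}\left\{\int_{-\alpha-\epsilon}^{-\alpha+\epsilon} d\phi\, f(\phi + \alpha + \epsilon)\, S(\phi) + \int_{-\alpha+\epsilon}^{2\pi-\alpha-\epsilon} d\phi\, f(\phi + \alpha + \epsilon)\, S(\phi)\right\} \\
&\quad - \frac{1}{2\pi}\left\{\int_{-\alpha+\epsilon}^{2\pi-\alpha-\epsilon} d\phi\, f(\phi + \alpha - \epsilon)\, S(\phi) + \int_{2\pi-\alpha-\epsilon}^{2\pi-\alpha+\epsilon} d\phi\, f(\phi + \alpha - \epsilon)\, S(\phi)\right\} \\
&= \frac{1}{2\pi}\left\{\int_{-\alpha-\epsilon}^{-\alpha+\epsilon} d\phi\, f(\phi + \alpha + \epsilon)\, S(\phi) - \int_{2\pi-\alpha-\epsilon}^{2\pi-\alpha+\epsilon} d\phi\, f(\phi + \alpha - \epsilon)\, S(\phi)\right\} \\
&\quad + \frac{1}{2\pi}\int_{-\alpha+\epsilon}^{2\pi-\alpha-\epsilon} d\phi\,\left[f(\phi + \alpha + \epsilon) - f(\phi + \alpha - \epsilon)\right] S(\phi).
\end{aligned}$$

The first two integrals give zero because of the small ranges of integration and the continuity of the integrands in those intervals. The integrand of the third integral is almost zero for all values of the range of integration except when $\phi \approx 0$. Hence, we can confine the integration to the small interval $(-\delta, +\delta)$ for which the difference in the square brackets is simply $\Delta f$. It now follows that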


$$\Delta f_N(\delta) \approx \frac{\Delta f}{2\pi}\int_{-\delta}^{\delta}\frac{\sin[(N + \frac{1}{2})\phi]}{\sin(\frac{1}{2}\phi)}\,d\phi \approx \frac{\Delta f}{\pi}\int_0^{\delta}\frac{\sin[(N + \frac{1}{2})\phi]}{\frac{1}{2}\phi}\,d\phi,$$

where we have emphasized the dependence of $\Delta f_N$ on $\delta$ and approximated the sine in the denominator by its argument, a good approximation due to the smallness of $\phi$. The reader may find the plot of the integrand in Fig. 7.3, where it is shown that the major contribution to the integral comes from the interval $[0, \pi/(N + \frac{1}{2})]$, where $\pi/(N + \frac{1}{2})$ is the first zero of the integrand. Furthermore, it is clear that if the upper limit is larger than $\pi/(N + \frac{1}{2})$ the result of the integral will decrease, because in each interval of length $2\pi$, the area below the horizontal axis is larger than that above. Therefore, if we are interested in the maximum overshoot of the finite sum, we must set the upper limit equal to $\pi/(N + \frac{1}{2})$. It follows firstly that the maximum overshoot of the finite sum occurs at $\pi/(N + \frac{1}{2}) \approx \pi/N$ to the right of the discontinuity. Secondly, the amount of the maximum overshoot is

$$(\Delta f_N)_{\max} \approx \frac{\Delta f}{\pi}\int_0^{\pi/(N+\frac{1}{2})}\frac{\sin[(N + \frac{1}{2})\phi]}{\frac{1}{2}\phi}\,d\phi = \frac{2}{\pi}\,\Delta f\int_0^{\pi}\frac{\sin x}{x}\,dx \approx 1.179\,\Delta f. \tag{9.14}$$


Thus 


Box 9.1.5 (The Gibbs Phenomenon) The finite (large-N) sum ap- 
proximation of the discontinuous function overshoots the function it- 
self at a discontinuity by about 18 percent. 
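The factor 1.179 in Eq. (9.14) can be reproduced numerically, and the square wave of Example 9.1.2 offers a concrete check. The sketch below uses a composite Simpson rule for $(2/\pi)\int_0^{\pi}(\sin x/x)\,dx$ and then evaluates the truncated square-wave series (with $U_0 = 1$) at its first maximum to the right of the jump. The helper names are assumptions, and the location $\pi/(2K)$ of that maximum, in the variable $\pi t/T$, follows from setting the derivative of the $K$-term sum to zero.

```python
import math


def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for j in range(1, n):
        total += (4 if j % 2 else 2) * g(a + j * h)
    return total * h / 3


def sinc(x):
    return 1.0 if x == 0.0 else math.sin(x) / x


# Ratio (Delta f_N)_max / Delta f from Eq. (9.14).
gibbs_ratio = (2 / math.pi) * simpson(sinc, 0.0, math.pi)


def square_wave_peak(K):
    """First maximum of the K-term square-wave sum, reached at pi*t/T = pi/(2K)."""
    x = math.pi / (2 * K)
    return 0.5 + (2 / math.pi) * sum(
        math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(K)
    )


peak = square_wave_peak(200)
```

For a unit jump, the truncated sum swings by about 1.179 between its extremes on the two sides of the discontinuity, so the peak on the upper side sits near $0.5 + 1.179/2 \approx 1.09$ rather than at 1.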


9.1.2 Fourier Series in Higher Dimensions 


It is instructive to generalize the Fourier series to more than one dimension. 
This generalization is especially useful in crystallography and solid-state 
physics, which deal with three-dimensional periodic structures. To gener- 
alize to N dimensions, we first consider a special case in which an N- 
dimensional periodic function is a product of N one-dimensional periodic 
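functions. That is, we take the $N$ functions

$$f^{(j)}(x_j) = \frac{1}{\sqrt{L_j}}\sum_{k=-\infty}^{\infty} f_k^{(j)} e^{2i\pi k x_j/L_j}, \quad j = 1, 2, \dots, N,$$

and multiply them on both sides to obtain

$$F(\mathbf{r}) = f^{(1)}(x_1)\, f^{(2)}(x_2)\cdots f^{(N)}(x_N) = \frac{1}{\sqrt{V}}\sum_{\mathbf{k}} F_{\mathbf{k}}\, e^{i\mathbf{g}_{\mathbf{k}}\cdot\mathbf{r}}, \tag{9.15}$$

where we have used the following new notations:

$$\begin{aligned}
F(\mathbf{r}) &\equiv f^{(1)}(x_1)\, f^{(2)}(x_2)\cdots f^{(N)}(x_N), & V &= L_1 L_2\cdots L_N, \\
\mathbf{k} &\equiv (k_1, k_2, \dots, k_N), & F_{\mathbf{k}} &\equiv f_{k_1}^{(1)}\cdots f_{k_N}^{(N)}, \\
\mathbf{g}_{\mathbf{k}} &= 2\pi(k_1/L_1, \dots, k_N/L_N), & \mathbf{r} &= (x_1, x_2, \dots, x_N).
\end{aligned}$$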


We take Eq. (9.15) as the definition of the Fourier series for any periodic function of $N$ variables (not just the product of $N$ functions of a single variable). However, application of (9.15) requires some clarification. In one dimension, the shape of the smallest region of periodicity is unique. It is simply a line segment of length $L$, for example. In two and more dimensions, however, such regions may have a variety of shapes. For instance, in two dimensions, they can be rectangles, pentagons, hexagons, and so forth. Thus, we let $V$ in Eq. (9.15) stand for a primitive cell of the $N$-dimensional lattice. This cell is important in solid-state physics, and (in three dimensions) is called the Wigner-Seitz cell.
It is customary to absorb the factor $1/\sqrt{V}$ into $F_{\mathbf{k}}$, and write

$$F(\mathbf{r}) = \sum_{\mathbf{k}} F_{\mathbf{k}}\, e^{i\mathbf{g}_{\mathbf{k}}\cdot\mathbf{r}} \quad\Leftrightarrow\quad F_{\mathbf{k}} = \frac{1}{V}\int_V F(\mathbf{r})\, e^{-i\mathbf{g}_{\mathbf{k}}\cdot\mathbf{r}}\,d^N x, \tag{9.16}$$

where the sum is a multiple sum over $(k_1, \dots, k_N)$ and the integral is a multiple integral over a single Wigner-Seitz cell.

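Recall that $F(\mathbf{r})$ is a periodic function of $\mathbf{r}$. This means that when $\mathbf{r}$ is changed by $\mathbf{R}$, where $\mathbf{R}$ is a vector describing the boundaries of a cell, then we should get the same function: $F(\mathbf{r} + \mathbf{R}) = F(\mathbf{r})$. When substituted in (9.16), this yields

$$F(\mathbf{r} + \mathbf{R}) = \sum_{\mathbf{k}} F_{\mathbf{k}}\, e^{i\mathbf{g}_{\mathbf{k}}\cdot(\mathbf{r} + \mathbf{R})} = \sum_{\mathbf{k}} e^{i\mathbf{g}_{\mathbf{k}}\cdot\mathbf{R}}\, F_{\mathbf{k}}\, e^{i\mathbf{g}_{\mathbf{k}}\cdot\mathbf{r}},$$

which is equal to $F(\mathbf{r})$ if

$$e^{i\mathbf{g}_{\mathbf{k}}\cdot\mathbf{R}} = 1, \tag{9.17}$$

i.e., if $\mathbf{g}_{\mathbf{k}}\cdot\mathbf{R}$ is an integral multiple of $2\pi$.
In three dimensions $\mathbf{R} = m_1\mathbf{a}_1 + m_2\mathbf{a}_2 + m_3\mathbf{a}_3$, where $m_1$, $m_2$, and $m_3$ are integers and $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$ are crystal axes, which are not generally orthogonal. On the other hand, $\mathbf{g}_{\mathbf{k}} = n_1\mathbf{b}_1 + n_2\mathbf{b}_2 + n_3\mathbf{b}_3$, where $n_1$, $n_2$, and $n_3$ are integers, and $\mathbf{b}_1$, $\mathbf{b}_2$, and $\mathbf{b}_3$ are the reciprocal lattice vectors defined by

$$\mathbf{b}_1 = \frac{2\pi(\mathbf{a}_2\times\mathbf{a}_3)}{\mathbf{a}_1\cdot(\mathbf{a}_2\times\mathbf{a}_3)}, \qquad \mathbf{b}_2 = \frac{2\pi(\mathbf{a}_3\times\mathbf{a}_1)}{\mathbf{a}_1\cdot(\mathbf{a}_2\times\mathbf{a}_3)}, \qquad \mathbf{b}_3 = \frac{2\pi(\mathbf{a}_1\times\mathbf{a}_2)}{\mathbf{a}_1\cdot(\mathbf{a}_2\times\mathbf{a}_3)}.$$

The reader may verify that $\mathbf{b}_i\cdot\mathbf{a}_j = 2\pi\delta_{ij}$. Thus

$$\mathbf{g}_{\mathbf{k}}\cdot\mathbf{R} = \left(\sum_{i=1}^{3} n_i\mathbf{b}_i\right)\cdot\left(\sum_{j=1}^{3} m_j\mathbf{a}_j\right) = \sum_{i,j} n_i m_j\,\mathbf{b}_i\cdot\mathbf{a}_j = 2\pi\sum_{j=1}^{3} m_j n_j = 2\pi\,(\text{integer}),$$

and Eq. (9.17) is satisfied.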
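The relation $\mathbf{b}_i\cdot\mathbf{a}_j = 2\pi\delta_{ij}$ is simple to verify numerically for non-orthogonal axes. The sketch below builds the reciprocal vectors from the cross-product formulas above; the helper names and the particular choice of axes (the primitive vectors of a face-centered cubic lattice with unit cube edge) are assumptions for illustration.

```python
import math


def cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def reciprocal_vectors(a1, a2, a3):
    """Return (b1, b2, b3) with b1 = 2*pi*(a2 x a3) / [a1 . (a2 x a3)], and cyclic."""
    scale = 2 * math.pi / dot(a1, cross(a2, a3))
    b1 = tuple(scale * c for c in cross(a2, a3))
    b2 = tuple(scale * c for c in cross(a3, a1))
    b3 = tuple(scale * c for c in cross(a1, a2))
    return b1, b2, b3


# Hypothetical non-orthogonal crystal axes.
axes = ((0.0, 0.5, 0.5), (0.5, 0.0, 0.5), (0.5, 0.5, 0.0))
recip = reciprocal_vectors(*axes)

# Matrix of b_i . a_j, expected to equal 2*pi times the identity.
products = [[dot(b, a) for a in axes] for b in recip]
```

Any lattice vector $\mathbf{R}$ with integer $m_j$ and any $\mathbf{g}_{\mathbf{k}}$ with integer $n_i$ then give $\mathbf{g}_{\mathbf{k}}\cdot\mathbf{R}$ equal to $2\pi$ times an integer, as Eq. (9.17) requires.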


9.2 Fourier Transform 


The Fourier series representation of F(x) is valid for the entire real line as 
long as F(x) is periodic. However, most functions encountered in physical 
applications are defined in some interval (a, b) without repetition beyond 




Fig. 9.3 (a) The function we want to represent. (b) The Fourier series representation of 
the function 


that interval. It would be useful if we could also expand such functions in 
some form of Fourier “series”. 

One way to do this is to start with the periodic series and then let the 
period go to infinity while extending the domain of the definition of the 
function. As a specific case, suppose we are interested in representing a 
function f(x) that is defined only for the interval (a, b) and is assigned the 
value zero everywhere else [see Fig. 9.3(a)]. To begin with, we might try 
the Fourier series representation, but this will produce a repetition of our 
function. This situation is depicted in Fig. 9.3(b). 

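Next we may try a function $g_\Lambda(x)$ defined in the interval $(a - \Lambda/2, b + \Lambda/2)$, where $\Lambda$ is an arbitrary positive number:

$$g_\Lambda(x) = \begin{cases} 0 & \text{if } a - \Lambda/2 < x < a, \\ f(x) & \text{if } a < x < b, \\ 0 & \text{if } b < x < b + \Lambda/2. \end{cases}$$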


This function, which is depicted in Fig. 9.4, has the Fourier series representation

$$g_\Lambda(x) = \frac{1}{\sqrt{L + \Lambda}}\sum_{n=-\infty}^{\infty} g_{\Lambda,n}\, e^{2i\pi n x/(L+\Lambda)}, \tag{9.18}$$

where

$$g_{\Lambda,n} = \frac{1}{\sqrt{L + \Lambda}}\int_{a-\Lambda/2}^{b+\Lambda/2} e^{-2i\pi n x/(L+\Lambda)}\, g_\Lambda(x)\,dx. \tag{9.19}$$

We have managed to separate various copies of the original periodic function by $\Lambda$. It should be clear that if $\Lambda \to \infty$, we can completely isolate the function and stop the repetition. Let us investigate the behavior of Eqs. (9.18) and (9.19) as $\Lambda$ grows without bound. First, we notice that the




Fig. 9.4 By introducing the parameter $\Lambda$, we have managed to separate the copies of the
function


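quantity $k_n$ defined by $k_n \equiv 2n\pi/(L + \Lambda)$ and appearing in the exponent becomes almost continuous. In other words, as $n$ changes by one unit, $k_n$ changes only slightly. This suggests that the terms in the sum in Eq. (9.18) can be lumped together in $j$ intervals of width $\Delta n_j$, giving

$$g_\Lambda(x) \approx \sum_{j=-\infty}^{\infty}\frac{g_\Lambda(k_j)}{\sqrt{L + \Lambda}}\, e^{ik_j x}\,\Delta n_j,$$

where $k_j \equiv 2n_j\pi/(L + \Lambda)$, and $g_\Lambda(k_j) \equiv g_{\Lambda,n_j}$. Substituting $\Delta n_j = [(L + \Lambda)/2\pi]\,\Delta k_j$ in the above sum, we obtain

$$g_\Lambda(x) \approx \sum_{j=-\infty}^{\infty}\frac{g_\Lambda(k_j)}{\sqrt{L + \Lambda}}\, e^{ik_j x}\,\frac{L + \Lambda}{2\pi}\,\Delta k_j = \frac{1}{\sqrt{2\pi}}\sum_{j=-\infty}^{\infty}\tilde{g}_\Lambda(k_j)\, e^{ik_j x}\,\Delta k_j,$$

where we introduced $\tilde{g}_\Lambda(k_j)$ defined by $\tilde{g}_\Lambda(k_j) \equiv \sqrt{(L + \Lambda)/2\pi}\; g_\Lambda(k_j)$. It is now clear that the preceding sum approaches an integral in the limit that $\Lambda \to \infty$. In the same limit, $g_\Lambda(x) \to f(x)$, and we have

$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde{f}(k)\, e^{ikx}\,dk, \tag{9.20}$$

where

$$\begin{aligned}
\tilde{f}(k) &\equiv \lim_{\Lambda\to\infty}\tilde{g}_\Lambda(k_j) = \lim_{\Lambda\to\infty}\sqrt{\frac{L + \Lambda}{2\pi}}\; g_\Lambda(k_j) \\
&= \lim_{\Lambda\to\infty}\sqrt{\frac{L + \Lambda}{2\pi}}\,\frac{1}{\sqrt{L + \Lambda}}\int_{a-\Lambda/2}^{b+\Lambda/2} e^{-ik_j x}\, g_\Lambda(x)\,dx \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\, e^{-ikx}\,dx.
\end{aligned} \tag{9.21}$$

Equations (9.20) and (9.21) are called the Fourier integral transforms of $\tilde{f}(k)$ and $f(x)$, respectively.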


Example 9.2.1 Let us evaluate the Fourier transform of the function defined by

$$f(x) = \begin{cases} b & \text{if } |x| < a, \\ 0 & \text{if } |x| > a \end{cases}$$

Fig. 9.5 The square “bump” function

(see Fig. 9.5). From (9.21) we have

$$\tilde{f}(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\, e^{-ikx}\,dx = \frac{b}{\sqrt{2\pi}}\int_{-a}^{a} e^{-ikx}\,dx = \frac{2ab}{\sqrt{2\pi}}\left(\frac{\sin ka}{ka}\right),$$


which is the function encountered (and depicted) in Example 7.3.2. 
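A direct numerical evaluation of Eq. (9.21) for the bump function gives a quick check of this closed form. In the sketch below the integral is restricted to $(-a, a)$, where $f$ is nonzero, and evaluated with the midpoint rule; the function names and the sample values $a = 1.5$, $b = 2$ are assumptions for illustration.

```python
import cmath
import math


def bump_transform_numeric(k, a, b, samples=4000):
    """Midpoint-rule estimate of (1/sqrt(2*pi)) * integral_{-a}^{a} b*exp(-i*k*x) dx."""
    h = 2 * a / samples
    total = 0j
    for j in range(samples):
        x = -a + (j + 0.5) * h
        total += b * cmath.exp(-1j * k * x)
    return total * h / math.sqrt(2 * math.pi)


def bump_transform_exact(k, a, b):
    """Closed form (2ab/sqrt(2*pi)) * sin(ka)/(ka), with the k -> 0 limit handled."""
    amplitude = 2 * a * b / math.sqrt(2 * math.pi)
    return amplitude if k == 0 else amplitude * math.sin(k * a) / (k * a)


a_width, b_height = 1.5, 2.0  # hypothetical half-width and height
errors = [
    abs(bump_transform_numeric(k, a_width, b_height)
        - bump_transform_exact(k, a_width, b_height))
    for k in (0.0, 0.7, 3.0)
]
```

The transform is real because the bump is even in $x$, and its first zeros sit at $k = \pm\pi/a$.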
Let us discuss this result in detail. First, note that if $a \to \infty$, the function $f(x)$ becomes a constant function over the entire real line, and we get

$$\tilde{f}(k) = \frac{2b}{\sqrt{2\pi}}\lim_{a\to\infty}\frac{\sin ka}{k} = \frac{2b}{\sqrt{2\pi}}\,\pi\delta(k)$$

by the result of Example 7.3.2. This is the Fourier transform of an everywhere-constant function (see Problem 9.12). Next, let $b \to \infty$ and $a \to 0$ in such a way that $2ab$, which is the area under $f(x)$, is 1. Then $f(x)$ will approach the delta function, and $\tilde{f}(k)$ becomes

$$\tilde{f}(k) = \lim_{\substack{b\to\infty \\ a\to 0}}\frac{2ab}{\sqrt{2\pi}}\,\frac{\sin ka}{ka} = \frac{1}{\sqrt{2\pi}}\lim_{a\to 0}\frac{\sin ka}{ka} = \frac{1}{\sqrt{2\pi}}.$$

So the Fourier transform of the delta function is the constant $1/\sqrt{2\pi}$.

Finally, we note that the width of $f(x)$ is $\Delta x = 2a$, and the width of $\tilde{f}(k)$ is roughly the distance, on the $k$-axis, between its first two roots, $k_+$ and $k_-$, on either side of $k = 0$: $\Delta k = k_+ - k_- = 2\pi/a$. Thus increasing the width of $f(x)$ results in a decrease in the width of $\tilde{f}(k)$. In other words, when the function is wide, its Fourier transform is narrow. In the limit of infinite width (a constant function), we get infinite sharpness (the delta function). The last two statements are very general. In fact, it can be shown that $\Delta x\,\Delta k \ge 1$ for any function $f(x)$. When both sides of this inequality are multiplied by the (reduced) Planck constant $\hbar \equiv h/(2\pi)$, the result is the celebrated Heisenberg uncertainty relation:³

$$\Delta x\,\Delta p \ge \hbar,$$

where $p = \hbar k$ is the momentum of the particle.


³In the context of the uncertainty relation, the width of the function (the so-called wave packet) measures the uncertainty in the position $x$ of a quantum mechanical particle. Similarly, the width of the Fourier transform measures the uncertainty in $k$, which is related to momentum $p$ via $p = \hbar k$.

Having obtained the transform of $f(x)$, we can write

$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{2b}{\sqrt{2\pi}}\,\frac{\sin ka}{k}\, e^{ikx}\,dk = \frac{b}{\pi}\int_{-\infty}^{\infty}\frac{\sin ka}{k}\, e^{ikx}\,dk.$$


Example 9.2.2 Let us evaluate the Fourier transform of a Gaussian $g(x) = a e^{-bx^2}$ with $a, b > 0$:

$$\tilde{g}(k) = \frac{a}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-b(x^2 + ikx/b)}\,dx = \frac{a\, e^{-k^2/4b}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-b(x + ik/2b)^2}\,dx.$$

To evaluate this integral rigorously, we would have to use techniques developed in complex analysis, which are not introduced until Chap. 11 (see Example 11.3.8). However, we can ignore the fact that the exponent is complex, substitute $y = x + ik/(2b)$, and write

$$\int_{-\infty}^{\infty} e^{-b[x + ik/(2b)]^2}\,dx = \int_{-\infty}^{\infty} e^{-by^2}\,dy = \sqrt{\frac{\pi}{b}}.$$

Thus, we have $\tilde{g}(k) = \frac{a}{\sqrt{2b}}\, e^{-k^2/(4b)}$, which is also a Gaussian.

We note again that the width of $g(x)$, which is proportional to $1/\sqrt{b}$, is in inverse relation to the width of $\tilde{g}(k)$, which is proportional to $\sqrt{b}$. We thus have $\Delta x\,\Delta k \sim 1$.
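Since the substitution $y = x + ik/(2b)$ was made without rigorous justification, a numerical check of the result $\tilde{g}(k) = (a/\sqrt{2b})\,e^{-k^2/(4b)}$ is reassuring. The sketch below integrates Eq. (9.21) over a finite window wide enough that the Gaussian tail is negligible; the cutoff of $10/\sqrt{b}$, the sample count, and the parameter values are assumptions for illustration.

```python
import cmath
import math


def gaussian_transform_numeric(k, a, b, samples=4000):
    """Midpoint-rule estimate of (1/sqrt(2*pi)) * integral a*exp(-b*x^2)*exp(-i*k*x) dx,
    truncated to |x| < 10/sqrt(b), where the integrand is negligible."""
    cutoff = 10 / math.sqrt(b)
    h = 2 * cutoff / samples
    total = 0j
    for j in range(samples):
        x = -cutoff + (j + 0.5) * h
        total += a * math.exp(-b * x * x) * cmath.exp(-1j * k * x)
    return total * h / math.sqrt(2 * math.pi)


def gaussian_transform_exact(k, a, b):
    """Closed form (a/sqrt(2b)) * exp(-k^2/(4b))."""
    return a / math.sqrt(2 * b) * math.exp(-k * k / (4 * b))


numeric = gaussian_transform_numeric(1.3, a=2.0, b=0.5)
exact = gaussian_transform_exact(1.3, a=2.0, b=0.5)
```

Increasing $b$ narrows $g(x)$ and broadens $\tilde{g}(k)$, which can be seen by evaluating the closed form at a fixed $k$ for several values of $b$.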


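Equations (9.20) and (9.21) are reciprocals of one another. However, it is not obvious that they are consistent. In other words, if we substitute (9.20) in the RHS of (9.21), do we get an identity? Let's try this:

$$\tilde{f}(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dx\, e^{-ikx}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde{f}(k')\, e^{ik'x}\,dk'\right] = \frac{1}{2\pi}\int_{-\infty}^{\infty} dx\int_{-\infty}^{\infty}\tilde{f}(k')\, e^{i(k'-k)x}\,dk'.$$

We now change the order of the two integrations:

$$\tilde{f}(k) = \int_{-\infty}^{\infty} dk'\,\tilde{f}(k')\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} dx\, e^{i(k'-k)x}\right].$$

But the expression in the square brackets is the delta function (see Example 7.3.2). Thus, we have $\tilde{f}(k) = \int_{-\infty}^{\infty} dk'\,\tilde{f}(k')\,\delta(k' - k)$, which is an identity.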

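As in the case of Fourier series, Eqs. (9.20) and (9.21) are valid even if $f$ and $\tilde{f}$ are piecewise continuous. In that case the Fourier transforms are written as

$$\begin{aligned}
\frac{1}{2}\left[f(x + 0) + f(x - 0)\right] &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\tilde{f}(k)\, e^{ikx}\,dk, \\
\frac{1}{2}\left[\tilde{f}(k + 0) + \tilde{f}(k - 0)\right] &= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\, e^{-ikx}\,dx,
\end{aligned} \tag{9.22}$$

where each zero on the LHS is an $\epsilon$ that has gone to its limit.
It is useful to generalize Fourier transform equations to more than one dimension. The generalization is straightforward:

$$\begin{aligned}
f(\mathbf{r}) &= \frac{1}{(2\pi)^{n/2}}\int d^n k\, e^{i\mathbf{k}\cdot\mathbf{r}}\,\tilde{f}(\mathbf{k}), \\
\tilde{f}(\mathbf{k}) &= \frac{1}{(2\pi)^{n/2}}\int d^n x\, f(\mathbf{r})\, e^{-i\mathbf{k}\cdot\mathbf{r}}.
\end{aligned} \tag{9.23}$$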

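Let us now use the abstract notation of Sect. 7.3 to get more insight into the preceding results. In the language of Sect. 7.3, Eq. (9.20) can be written as

$$\langle x|f\rangle = \int_{-\infty}^{\infty}\langle k|\tilde{f}\rangle\langle x|k\rangle\,dk = \langle x|\left(\int_{-\infty}^{\infty}|k\rangle\langle k|\,dk\right)|\tilde{f}\rangle, \tag{9.24}$$

where we have defined

$$\langle x|k\rangle = \frac{1}{\sqrt{2\pi}}\, e^{ikx}. \tag{9.25}$$

Equation (9.24) suggests the identification $|\tilde{f}\rangle \equiv |f\rangle$ as well as the identity

$$\mathbf{1} = \int_{-\infty}^{\infty}|k\rangle\langle k|\,dk, \tag{9.26}$$

which is the same as (7.17). Equation (7.19) yields

$$\langle k|k'\rangle = \delta(k - k'), \tag{9.27}$$

which upon the insertion of a unit operator gives an integral representation of the delta function:

$$\delta(k - k') = \langle k|\mathbf{1}|k'\rangle = \langle k|\left(\int_{-\infty}^{\infty}|x\rangle\langle x|\,dx\right)|k'\rangle = \int_{-\infty}^{\infty}\langle k|x\rangle\langle x|k'\rangle\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty} dx\, e^{i(k'-k)x}.$$

Obviously, we can also write

$$\delta(x - x') = \frac{1}{2\pi}\int_{-\infty}^{\infty} dk\, e^{i(x-x')k}.$$

If more than one dimension is involved, we use

$$\begin{aligned}
\delta(\mathbf{k} - \mathbf{k}') &= \frac{1}{(2\pi)^n}\int d^n x\, e^{i(\mathbf{k}-\mathbf{k}')\cdot\mathbf{r}}, \\
\delta(\mathbf{r} - \mathbf{r}') &= \frac{1}{(2\pi)^n}\int d^n k\, e^{i(\mathbf{r}-\mathbf{r}')\cdot\mathbf{k}},
\end{aligned} \tag{9.28}$$

with the inner product relations

$$\langle\mathbf{r}|\mathbf{k}\rangle = \frac{1}{(2\pi)^{n/2}}\, e^{i\mathbf{k}\cdot\mathbf{r}}, \qquad \langle\mathbf{k}|\mathbf{r}\rangle = \frac{1}{(2\pi)^{n/2}}\, e^{-i\mathbf{k}\cdot\mathbf{r}}. \tag{9.29}$$

Equations (9.28) and (9.29) and the identification $|\tilde{f}\rangle \equiv |f\rangle$ exhibit a striking resemblance between $|\mathbf{r}\rangle$ and $|\mathbf{k}\rangle$. In fact, any given abstract vector $|f\rangle$ can be expressed either in terms of its $\mathbf{r}$ representation, $\langle\mathbf{r}|f\rangle = f(\mathbf{r})$, or in terms of its $\mathbf{k}$ representation, $\langle\mathbf{k}|f\rangle = \tilde{f}(\mathbf{k})$. These two representations are completely equivalent, and there is a one-to-one correspondence between the two, given by Eq. (9.23). The representation that is used in practice is dictated by the physical application. In quantum mechanics, for instance, most of the time the $\mathbf{r}$ representation, corresponding to the position, is used, because then the operator equations turn into differential equations that are (in many cases) linear and easier to solve than the corresponding equations in the $\mathbf{k}$ representation, which is related to the momentum.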


Example 9.2.3 In this example we evaluate the Fourier transform of the Coulomb potential $V(r)$ of a point charge $q$: $V(r) = q/r$. The Fourier transform is important in scattering experiments with atoms, molecules, and solids. As we shall see in the following, the Fourier transform of $V(r)$ is not defined. However, if we work with the Yukawa potential,

$$V_\alpha(r) = \frac{q\, e^{-\alpha r}}{r}, \quad \alpha > 0,$$

the Fourier transform will be well-defined, and we can take the limit $\alpha \to 0$ to recover the Coulomb potential. Thus, we seek the Fourier transform of $V_\alpha(r)$.

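We are working in three dimensions and therefore may write

$$\tilde{V}_\alpha(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}}\iiint d^3x\, e^{-i\mathbf{k}\cdot\mathbf{r}}\,\frac{q\, e^{-\alpha r}}{r}.$$

It is clear from the presence of $r$ that spherical coordinates are appropriate. We are free to pick any direction as the $z$-axis. A simplifying choice in this case is the direction of $\mathbf{k}$. So, we let $\mathbf{k} = |\mathbf{k}|\hat{\mathbf{e}}_z = k\hat{\mathbf{e}}_z$, or $\mathbf{k}\cdot\mathbf{r} = kr\cos\theta$, where $\theta$ is the polar angle in spherical coordinates. Now we have

$$\tilde{V}_\alpha(\mathbf{k}) = \frac{q}{(2\pi)^{3/2}}\int_0^{\infty} r^2\,dr\int_0^{\pi}\sin\theta\,d\theta\int_0^{2\pi} d\varphi\,\frac{e^{-\alpha r}}{r}\, e^{-ikr\cos\theta}.$$

The $\varphi$ integration is trivial and gives $2\pi$. The $\theta$ integration is done next:

$$\int_0^{\pi}\sin\theta\, e^{-ikr\cos\theta}\,d\theta = \int_{-1}^{1} e^{-ikru}\,du = \frac{1}{ikr}\left(e^{ikr} - e^{-ikr}\right).$$

We thus have

$$\begin{aligned}
\tilde{V}_\alpha(\mathbf{k}) &= \frac{q(2\pi)}{(2\pi)^{3/2}}\int_0^{\infty} dr\, r^2\,\frac{e^{-\alpha r}}{r}\,\frac{1}{ikr}\left(e^{ikr} - e^{-ikr}\right) \\
&= \frac{q}{(2\pi)^{1/2}}\,\frac{1}{ik}\int_0^{\infty} dr\left[e^{(-\alpha+ik)r} - e^{-(\alpha+ik)r}\right] \\
&= \frac{q}{(2\pi)^{1/2}}\,\frac{1}{ik}\left(\frac{e^{(-\alpha+ik)r}}{-\alpha + ik}\bigg|_0^{\infty} + \frac{e^{-(\alpha+ik)r}}{\alpha + ik}\bigg|_0^{\infty}\right).
\end{aligned}$$

Fig. 9.6 The Fourier transform of the potential of a continuous charge distribution at P
is calculated using this geometry

Note how the factor $e^{-\alpha r}$ has tamed the divergent behavior of the exponential at $r \to \infty$. This was the reason for introducing it in the first place. Simplifying the last expression yields

$$\tilde{V}_\alpha(\mathbf{k}) = \frac{2q}{\sqrt{2\pi}}\,\frac{1}{k^2 + \alpha^2}.$$

The parameter $\alpha$ is a measure of the range of the potential. It is clear that the larger $\alpha$ is, the smaller the range. In fact, it was in response to the short range of nuclear forces that Yukawa introduced $\alpha$. For electromagnetism, where the range is infinite, $\alpha$ becomes zero and $V_\alpha(r)$ reduces to $V(r)$. Thus, the Fourier transform of the Coulomb potential is

$$\tilde{V}_{\text{Coul}}(\mathbf{k}) = \frac{2q}{\sqrt{2\pi}}\,\frac{1}{k^2}.$$

If a charge distribution is involved, the Fourier transform will be different.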
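After the angular integrations, the Yukawa transform reduces to a single radial integral, $\tilde{V}_\alpha(\mathbf{k}) = \frac{2q}{\sqrt{2\pi}\,k}\int_0^{\infty} e^{-\alpha r}\sin(kr)\,dr$, which can be evaluated numerically and compared with $\frac{2q}{\sqrt{2\pi}}\frac{1}{k^2+\alpha^2}$. The sketch below does this with the midpoint rule; the cutoff at $40/\alpha$, the sample count, and the parameter values are assumptions for illustration.

```python
import math


def yukawa_transform_numeric(k, q, alpha, samples=20000):
    """Estimate (2q / (sqrt(2*pi) * k)) * integral_0^inf exp(-alpha*r) * sin(k*r) dr,
    truncating the integral at r = 40/alpha where the integrand is negligible."""
    cutoff = 40 / alpha
    h = cutoff / samples
    radial = 0.0
    for j in range(samples):
        r = (j + 0.5) * h
        radial += math.exp(-alpha * r) * math.sin(k * r)
    radial *= h
    return 2 * q / (math.sqrt(2 * math.pi) * k) * radial


def yukawa_transform_exact(k, q, alpha):
    """Closed form (2q / sqrt(2*pi)) / (k^2 + alpha^2)."""
    return 2 * q / math.sqrt(2 * math.pi) / (k * k + alpha * alpha)


numeric = yukawa_transform_numeric(2.0, q=1.0, alpha=0.5)
exact = yukawa_transform_exact(2.0, q=1.0, alpha=0.5)

# As alpha -> 0 the closed form approaches the Coulomb result (2q/sqrt(2*pi)) / k^2.
coulomb_limit = yukawa_transform_exact(2.0, q=1.0, alpha=1e-8)
```

The numerical integral only converges because of the damping factor $e^{-\alpha r}$; with $\alpha = 0$ the truncated radial integral oscillates with the cutoff instead of settling down, which is the sense in which the Coulomb transform is not defined without the limiting procedure.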


Example 9.2.4 The example above deals with the electrostatic potential of 
a point charge. Let us now consider the case where the charge is distributed 
over a finite volume. Then the potential is 


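\[
V(\mathbf{r}) = \int \frac{q\rho(\mathbf{r}')\, d^3x'}{|\mathbf{r}' - \mathbf{r}|} = q \int \frac{\rho(\mathbf{r}')}{|\mathbf{r}' - \mathbf{r}|}\, d^3x',
\]

where $q\rho(\mathbf{r}')$ is the charge density at $\mathbf{r}'$, and we have used a single integral
because $d^3x'$ already indicates the number of integrations to be performed.
Note that we have normalized $\rho(\mathbf{r}')$ so that its integral over the volume is 1,
which is equivalent to assuming that the total charge is $q$. Figure 9.6 shows
the geometry of the situation.

Making a change of variables, $\mathbf{R} \equiv \mathbf{r}' - \mathbf{r}$, or $\mathbf{r}' = \mathbf{R} + \mathbf{r}$, and $d^3x' = d^3X$,
with $\mathbf{R} \equiv (X, Y, Z)$, we get

\[
\tilde V(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}} \int d^3x\, e^{-i\mathbf{k}\cdot\mathbf{r}} \int \frac{q\rho(\mathbf{R}+\mathbf{r})}{R}\, d^3X. \tag{9.30}
\]

To evaluate Eq. (9.30), we substitute for $\rho(\mathbf{R} + \mathbf{r})$ in terms of its Fourier
transform,

\[
\rho(\mathbf{R}+\mathbf{r}) = \frac{1}{(2\pi)^{3/2}} \int d^3k'\, \tilde\rho(\mathbf{k}')\, e^{i\mathbf{k}'\cdot(\mathbf{R}+\mathbf{r})}. \tag{9.31}
\]

Combining (9.30) and (9.31), we obtain

\[
\begin{aligned}
\tilde V(\mathbf{k}) &= \frac{q}{(2\pi)^3} \int d^3x\, d^3X\, d^3k'\, \frac{e^{i\mathbf{k}'\cdot\mathbf{R}}}{R}\, \tilde\rho(\mathbf{k}')\, e^{i\mathbf{r}\cdot(\mathbf{k}'-\mathbf{k})} \\
&= q \int d^3X\, d^3k'\, \frac{e^{i\mathbf{k}'\cdot\mathbf{R}}}{R}\, \tilde\rho(\mathbf{k}') \underbrace{\left[\frac{1}{(2\pi)^3} \int d^3x\, e^{i\mathbf{r}\cdot(\mathbf{k}'-\mathbf{k})}\right]}_{\delta(\mathbf{k}'-\mathbf{k})} \\
&= q\,\tilde\rho(\mathbf{k}) \int d^3X\, \frac{e^{i\mathbf{k}\cdot\mathbf{R}}}{R}. 
\end{aligned}
\tag{9.32}
\]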


What is nice about this result is that the contribution of the charge distribution,
$\tilde\rho(\mathbf{k})$, has been completely factored out. The integral, aside from
a constant and a change in the sign of $\mathbf{k}$, is simply the Fourier transform of
the Coulomb potential of a point charge obtained in the previous example.
We can therefore write Eq. (9.32) as

\[
\tilde V(\mathbf{k}) = (2\pi)^{3/2}\, \tilde\rho(\mathbf{k})\, \tilde V_{\text{Coul}}(-\mathbf{k}) = \frac{4\pi q\, \tilde\rho(\mathbf{k})}{k^2}.
\]

This equation is important in analyzing the structure of atomic particles.
The Fourier transform $\tilde V(\mathbf{k})$ is directly measurable in scattering experiments.
In a typical experiment a (charged) target is probed with a charged
point particle (electron). If the analysis of the scattering data shows a deviation
from $1/k^2$ in the behavior of $\tilde V(\mathbf{k})$, then it can be concluded that the
target particle has a charge distribution. More specifically, a plot of $k^2\tilde V(\mathbf{k})$
versus $k$ gives the variation of $\tilde\rho(\mathbf{k})$, the form factor, with $k$. If the resulting
graph is a constant, then $\tilde\rho(\mathbf{k})$ is a constant, and the target is a point particle
[$\tilde\rho(\mathbf{k})$ is a constant for point particles, where $\rho(\mathbf{r}') \propto \delta(\mathbf{r} - \mathbf{r}')$]. If there is
any deviation from a constant function, $\tilde\rho(\mathbf{k})$ must have a dependence on $k$,
and correspondingly, the target particle must have a charge distribution.
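The form-factor argument can be illustrated with a short sketch that compares $k^2\tilde V(k)$ for a point charge with that of an extended charge. The Gaussian density used here is an assumed example distribution (it does not appear in the text), and all function names are hypothetical; the only relation taken from the discussion above is $\tilde V(\mathbf{k}) = 4\pi q\,\tilde\rho(\mathbf{k})/k^2$.

```python
import math

NORM = (2.0 * math.pi) ** -1.5  # (2*pi)^(-3/2) of the symmetric convention

def rho_tilde_point(k):
    """Transform of a unit point-charge density delta^3(r') at the origin."""
    return NORM

def rho_tilde_gaussian(k, sigma=1.0):
    """Transform of a unit-normalized Gaussian density of width sigma
    (an illustrative extended distribution, assumed for this sketch)."""
    return NORM * math.exp(-0.5 * (k * sigma) ** 2)

def v_tilde(k, rho_tilde, q=1.0):
    """V~(k) = 4*pi*q*rho~(k) / k^2."""
    return 4.0 * math.pi * q * rho_tilde(k) / (k * k)

if __name__ == "__main__":
    # k^2 V~(k) stays constant for the point charge, falls off for the Gaussian
    for k in (0.5, 1.0, 2.0):
        print(k, k * k * v_tilde(k, rho_tilde_point), k * k * v_tilde(k, rho_tilde_gaussian))
```

For the point charge the product $k^2\tilde V(k)$ equals $2q/\sqrt{2\pi}$ at every $k$, reproducing the Coulomb result, while for the Gaussian it decreases with $k$, which is the kind of deviation the text associates with a charge distribution.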

The above discussion, when generalized to four-dimensional relativistic 
space-time, was the basis for a strong argument in favor of the existence 
of point-like particles—quarks—inside a proton in 1968, when the results 
of the scattering of high-energy electrons off protons at the Stanford Linear 
Accelerator Center revealed deviation from a constant for the proton form 
factor. 


9.2.1 Fourier Transforms and Derivatives 
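The Fourier transform is very useful for solving differential equations. This
is because the derivative operator in $\mathbf{r}$ space turns into ordinary multiplication
in $\mathbf{k}$ space. For example, if we differentiate $f(\mathbf{r})$ in Eq. (9.23) with
respect to $x_j$, we obtain

\[
\begin{aligned}
\frac{\partial}{\partial x_j} f(\mathbf{r}) &= \frac{1}{(2\pi)^{n/2}} \int d^nk\, \frac{\partial}{\partial x_j}\, e^{i(k_1x_1 + \cdots + k_jx_j + \cdots + k_nx_n)}\, \tilde f(\mathbf{k}) \\
&= \frac{1}{(2\pi)^{n/2}} \int d^nk\, (ik_j)\, e^{i\mathbf{k}\cdot\mathbf{r}}\, \tilde f(\mathbf{k}).
\end{aligned}
\]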


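That is, every time we differentiate with respect to any component of $\mathbf{r}$,
the corresponding component of $\mathbf{k}$ “comes down”. Thus, the n-dimensional
gradient is

\[
\nabla f(\mathbf{r}) = \frac{1}{(2\pi)^{n/2}} \int d^nk\, (i\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{r}}\, \tilde f(\mathbf{k}),
\]

and the n-dimensional Laplacian is

\[
\nabla^2 f(\mathbf{r}) = \frac{1}{(2\pi)^{n/2}} \int d^nk\, (-k^2)\, e^{i\mathbf{k}\cdot\mathbf{r}}\, \tilde f(\mathbf{k}).
\]

We shall use Fourier transforms extensively in solving differential equations
later in the book. Here, we can illustrate the above points with a simple
example. Consider the ordinary second-order differential equation

\[
C_2 \frac{d^2y}{dx^2} + C_1 \frac{dy}{dx} + C_0\, y = f(x),
\]

where $C_0$, $C_1$, and $C_2$ are constants. We can “solve” this equation by simply
substituting the following in it:

\[
\begin{aligned}
y(x) &= \frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\, e^{ikx}, &
\frac{dy}{dx} &= \frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\,(ik)\, e^{ikx}, \\
\frac{d^2y}{dx^2} &= -\frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\, k^2\, e^{ikx}, &
f(x) &= \frac{1}{\sqrt{2\pi}} \int dk\, \tilde f(k)\, e^{ikx}.
\end{aligned}
\]

This gives

\[
\frac{1}{\sqrt{2\pi}} \int dk\, \tilde y(k)\left(-C_2k^2 + iC_1k + C_0\right) e^{ikx} = \frac{1}{\sqrt{2\pi}} \int dk\, \tilde f(k)\, e^{ikx}.
\]

Equating the coefficients of $e^{ikx}$ on both sides, we obtain⁴

\[
\tilde y(k) = \frac{\tilde f(k)}{-C_2k^2 + iC_1k + C_0}.
\]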


If we know $\tilde f(k)$ [which can be obtained from $f(x)$], we can calculate
$y(x)$ by Fourier-transforming $\tilde y(k)$. The resulting integrals are not generally
easy to evaluate. In some cases the methods of complex analysis may be
helpful; in others numerical integration may be the last resort. However, the
real power of the Fourier transform lies in the formal analysis of differential
equations.
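The algebraic relation $\tilde y(k) = \tilde f(k)/(-C_2k^2 + iC_1k + C_0)$ can be tested one Fourier mode at a time. The sketch below assumes a single-mode driving term $f(x) = e^{ikx}$ and checks, with central finite differences, that $y(x) = e^{ikx}/(-C_2k^2 + iC_1k + C_0)$ satisfies the differential equation; the function names and the step size `h` are illustrative assumptions.

```python
import cmath

def mode_response(k, c2, c1, c0):
    """Factor multiplying f~(k) in y~(k) = f~(k) / (-C2 k^2 + i C1 k + C0)."""
    return 1.0 / (-c2 * k * k + 1j * c1 * k + c0)

def ode_residual(x, k, c2, c1, c0, h=1e-3):
    """Residual C2 y'' + C1 y' + C0 y - exp(ikx) for the single-mode solution
    y(x) = mode_response * exp(ikx), with derivatives taken by central differences."""
    amp = mode_response(k, c2, c1, c0)

    def y(s):
        return amp * cmath.exp(1j * k * s)

    d1 = (y(x + h) - y(x - h)) / (2.0 * h)
    d2 = (y(x + h) - 2.0 * y(x) + y(x - h)) / (h * h)
    return c2 * d2 + c1 * d1 + c0 * y(x) - cmath.exp(1j * k * x)

if __name__ == "__main__":
    # Residual is limited only by the finite-difference truncation error
    print(abs(ode_residual(0.7, 1.5, 1.0, 0.4, 2.0)))
```

Because the equation is linear, a general $f(x)$ is handled by superposing such modes with weights $\tilde f(k)$, which is exactly the content of the inverse transform.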


⁴ Alternatively, we can multiply both sides by $e^{-ik'x}$ and integrate over $x$. The result of
this integration yields $\delta(k - k')$, which collapses the k-integrations and yields the equality
of the integrands.

9.2.2 The Discrete Fourier Transform


The preceding remarks alluded to the power of the Fourier transform in 
solving certain differential equations. If such a solution is combined with 
numerical techniques, the integrals must be replaced by sums. This is par- 
ticularly true if our function is given by a table rather than a mathemati- 
cal relation, a common feature of numerical analysis. So suppose that we 
are given a set of measurements performed in equal time intervals of At. 
Suppose that the overall period in which these measurements are done 
is T. We are seeking a Fourier transform of this finite set of data. First we 
write 


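\[
\tilde f(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt \approx \frac{1}{\sqrt{2\pi}} \sum_{n=0}^{N-1} f(t_n)\, e^{-i\omega t_n}\, \Delta t,
\]

or, discretizing the frequency as well and writing $\omega_m = m\Delta\omega$, with $\Delta\omega$ to
be determined later, we have

\[
\tilde f(m\Delta\omega) = \frac{1}{\sqrt{2\pi}}\, \frac{T}{N} \sum_{n=0}^{N-1} f(n\Delta t)\, e^{-i(m\Delta\omega)(n\Delta t)}. \tag{9.33}
\]

Since the Fourier transform is given in terms of a finite sum, let us explore
the idea of writing the inverse transform also as a sum. So, multiply both
sides of the above equation by $[e^{i(m\Delta\omega)k\Delta t}/\sqrt{2\pi}\,]\Delta\omega$ and sum over $m$:

\[
\begin{aligned}
\frac{1}{\sqrt{2\pi}} \sum_{m=0}^{N-1} \tilde f(m\Delta\omega)\, e^{i(m\Delta\omega)k\Delta t}\, \Delta\omega
&= \frac{T\Delta\omega}{2\pi N} \sum_{n=0}^{N-1} \sum_{m=0}^{N-1} f(n\Delta t)\, e^{im\Delta\omega\Delta t(k-n)} \\
&= \frac{T\Delta\omega}{2\pi N} \sum_{n=0}^{N-1} f(n\Delta t) \sum_{m=0}^{N-1} e^{im\Delta\omega\Delta t(k-n)}.
\end{aligned}
\]

Problem 9.2 shows that

\[
\sum_{m=0}^{N-1} e^{im\Delta\omega\Delta t(k-n)} =
\begin{cases}
N & \text{if } k = n, \\[4pt]
\dfrac{e^{iN\Delta\omega\Delta t(k-n)} - 1}{e^{i\Delta\omega\Delta t(k-n)} - 1} & \text{if } k \neq n.
\end{cases}
\]

We want the sum to vanish when $k \neq n$. This suggests demanding that
$N\Delta\omega\Delta t(k - n)$ be an integer multiple of $2\pi$. Since $\Delta\omega$ and $\Delta t$ are to
be independent of this (arbitrary) integer (as well as $k$ and $n$), we must write

\[
N\Delta\omega\Delta t(k-n) = 2\pi(k-n) \;\Rightarrow\; N\Delta\omega\,\frac{T}{N} = 2\pi \;\Rightarrow\; \Delta\omega = \frac{2\pi}{T}.
\]

With this choice, we have the following discrete Fourier transforms:

\[
\begin{aligned}
\tilde f(\omega_j) &= \frac{1}{\sqrt{N}} \sum_{n=0}^{N-1} f(t_n)\, e^{-i\omega_j t_n}, & \omega_j &= \frac{2\pi j}{T}, \\
f(t_n) &= \frac{1}{\sqrt{N}} \sum_{j=0}^{N-1} \tilde f(\omega_j)\, e^{i\omega_j t_n}, & t_n &= n\Delta t,
\end{aligned}
\tag{9.34}
\]

where we have redefined the new $\tilde f$ to be $\sqrt{2\pi N}/T$ times the old $\tilde f$.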

Discrete Fourier transforms are used extensively in numerical calculation 
of problems in which ordinary Fourier transforms are used. For instance, if 
a differential equation lends itself to a solution via the Fourier transform as 
discussed before, then discrete Fourier transforms will give a procedure for 
finding the solution numerically. Similarly, the frequency analysis of signals 
is nicely handled by discrete Fourier transforms. 
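The symmetric pair of Eq. (9.34) can be written down directly as two sums. The following is a minimal sketch in plain Python, assuming $N$ equally spaced samples so that $\omega_j t_n = 2\pi jn/N$; the function names `dft` and `inverse_dft` are illustrative.

```python
import cmath
import math

def dft(samples):
    """f~(omega_j) = N^(-1/2) * sum_n f(t_n) exp(-i omega_j t_n),
    with omega_j * t_n = 2*pi*j*n/N."""
    n_pts = len(samples)
    return [
        sum(samples[n] * cmath.exp(-2j * math.pi * j * n / n_pts) for n in range(n_pts))
        / math.sqrt(n_pts)
        for j in range(n_pts)
    ]

def inverse_dft(spectrum):
    """f(t_n) = N^(-1/2) * sum_j f~(omega_j) exp(+i omega_j t_n)."""
    n_pts = len(spectrum)
    return [
        sum(spectrum[j] * cmath.exp(2j * math.pi * j * n / n_pts) for j in range(n_pts))
        / math.sqrt(n_pts)
        for n in range(n_pts)
    ]

if __name__ == "__main__":
    data = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0]
    recovered = inverse_dft(dft(data))
    print(max(abs(a - b) for a, b in zip(data, recovered)))  # round-off sized
```

With the $1/\sqrt{N}$ factor on both sums the forward and inverse transforms are exact inverses of each other, which is the reason for the redefinition of $\tilde f$ mentioned after Eq. (9.34).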

It turns out that discrete Fourier analysis is computationally very intensive.
Its status as a popular tool in computational physics is due primarily to
a very efficient method of calculation known as the fast Fourier transform.
In a typical discrete Fourier transform, one has to perform a sum of $N$ terms for every
point. Since there are $N$ points to transform, the total computational time
will be of order $N^2$. In the fast Fourier transform, one takes $N$ to be even
and divides the sum into two other sums, one over the even terms and one
over the odd terms. Then the computation time will be of order $2 \times (N/2)^2$,
or half the original calculation. Similarly, if $N/2$ is even, one can further
divide the odd and even sums by two and obtain a computation time of
$4 \times (N/4)^2$, or a quarter of the original calculation. In general, if $N = 2^k$,
then by dividing the sums consecutively, we end up with $N$ transforms to be
performed after $k$ steps. So, the computation time will be $kN = N \log_2 N$.
For $N = 128$, the computation time will be $128 \log_2 128 = 896$ as opposed
to $128^2 = 16{,}384$, a reduction by a factor of about 18. The fast Fourier transform
is indeed fast!
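The even/odd splitting described above can be sketched as a recursive radix-2 routine. This is only an illustration of the idea, assuming the number of samples is a power of two and omitting the $1/\sqrt{N}$ normalization; the names `fft` and `naive_dft` are hypothetical, and production codes use more refined variants.

```python
import cmath
import math

def naive_dft(x):
    """Direct O(N^2) sum, used here only as a reference (unnormalized)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * j * m / n) for m in range(n))
            for j in range(n)]

def fft(x):
    """Recursive radix-2 transform: split into even- and odd-indexed samples,
    transform each half, and recombine. Requires len(x) to be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for j in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * j / n) * odd[j]
        out[j] = even[j] + twiddle
        out[j + n // 2] = even[j] - twiddle
    return out

if __name__ == "__main__":
    signal = [math.sin(0.3 * m) + 0.1 * m for m in range(16)]
    print(max(abs(a - b) for a, b in zip(fft(signal), naive_dft(signal))))
    n = 128
    print(n * int(math.log2(n)), n * n)  # operation-count estimates N log2 N and N^2
```

The recursion depth is $\log_2 N$ and each level does work proportional to $N$, which is where the $N\log_2 N$ estimate comes from.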


9.2.3. Fourier Transform of a Distribution 


Although one can define the Fourier transform of a distribution in exact 
analogy to an ordinary function, sometimes it is convenient to define the 
Fourier transform of the distribution as a linear functional. 

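Let us ignore the distinction between the two variables $x$ and $k$, and simply
define the Fourier transform of a function $f : \mathbb{R} \to \mathbb{R}$ as

\[
\tilde f(u) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-iut}\, dt.
\]

Now we consider two functions, $f$ and $g$, and note that

\[
\begin{aligned}
\langle \tilde g, f \rangle \equiv \int_{-\infty}^{\infty} \tilde g(u) f(u)\, du
&= \int_{-\infty}^{\infty} f(u) \left[\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(t)\, e^{-iut}\, dt\right] du \\
&= \int_{-\infty}^{\infty} g(t) \left[\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(u)\, e^{-iut}\, du\right] dt \\
&= \int_{-\infty}^{\infty} g(t)\, \tilde f(t)\, dt = \langle g, \tilde f \rangle.
\end{aligned}
\]

The following definition is motivated by the last equation.


Definition 9.2.5 Let $\varphi$ be a distribution and let $f$ be a $C_F^\infty$ function whose
Fourier transform $\tilde f$ exists and is also a $C_F^\infty$ function. Then we define the
Fourier transform $\tilde\varphi$ of $\varphi$ to be the distribution given by

\[
\langle \tilde\varphi, f \rangle = \langle \varphi, \tilde f \rangle.
\]


Example 9.2.6 The Fourier transform of $\delta(x)$ is given by

\[
\langle \tilde\delta, f \rangle = \langle \delta, \tilde f \rangle = \tilde f(0) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, dt
= \int_{-\infty}^{\infty} \left(\frac{1}{\sqrt{2\pi}}\right) f(t)\, dt = \left\langle \frac{1}{\sqrt{2\pi}}, f \right\rangle.
\]

Thus, $\tilde\delta = 1/\sqrt{2\pi}$, as expected.
The Fourier transform of $\delta(x - x') \equiv \delta_{x'}(x)$ is given by

\[
\langle \tilde\delta_{x'}, f \rangle = \langle \delta_{x'}, \tilde f \rangle = \tilde f(x') = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t)\, e^{-ix't}\, dt
= \int_{-\infty}^{\infty} \left(\frac{1}{\sqrt{2\pi}}\, e^{-ix't}\right) f(t)\, dt.
\]

Thus, if $\varphi(x) = \delta(x - x')$, then $\tilde\varphi(t) = (1/\sqrt{2\pi})\,e^{-ix't}$.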


9.3 Problems 


9.1 Consider the function $f(\theta) = \sum_{m=-\infty}^{\infty} \delta(\theta - 2m\pi)$.

(a) Show that $f$ is periodic of period $2\pi$.
(b) What is the Fourier series expansion for $f(\theta)$?


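9.2 Break the sum $\sum_{n=-N}^{N} e^{in(\theta-\theta')}$ into $\sum_{n=-N}^{-1} + 1 + \sum_{n=1}^{N}$. Use the
geometric sum formula

\[
\sum_{n=0}^{N} ar^n = a\,\frac{r^{N+1} - 1}{r - 1}
\]

to obtain

\[
\sum_{n=1}^{N} e^{in(\theta-\theta')} = e^{i(\theta-\theta')}\, \frac{e^{iN(\theta-\theta')} - 1}{e^{i(\theta-\theta')} - 1}
= e^{i\frac{1}{2}(N+1)(\theta-\theta')}\, \frac{\sin[\frac{1}{2}N(\theta-\theta')]}{\sin[\frac{1}{2}(\theta-\theta')]}.
\]

By changing $n$ to $-n$ or, equivalently, $(\theta - \theta')$ to $-(\theta - \theta')$, find a similar
sum from $-N$ to $-1$. Now put everything together and use the trigonometric
identity

\[
2\cos\alpha \sin\beta = \sin(\alpha + \beta) - \sin(\alpha - \beta)
\]

to show that

\[
\sum_{n=-N}^{N} e^{in(\theta-\theta')} = \frac{\sin[(N + \frac{1}{2})(\theta-\theta')]}{\sin[\frac{1}{2}(\theta-\theta')]}.
\]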


9.3 Find the Fourier series expansion of the periodic function defined on its 
fundamental cell as 


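\[
f(\theta) =
\begin{cases}
-\tfrac{1}{2}(\pi + \theta) & \text{if } -\pi \le \theta < 0, \\[2pt]
+\tfrac{1}{2}(\pi - \theta) & \text{if } 0 < \theta \le \pi.
\end{cases}
\]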


9.4 Show that $A_n$ and $B_n$ in Eq. (9.2) are real when $f(\theta)$ is real.


9.5 Find the Fourier series expansion of the periodic function $f(\theta)$ defined
as $f(\theta) = \cos\alpha\theta$ on its fundamental cell, $(-\pi, \pi)$,

(a) when $\alpha$ is an integer;
(b) when $\alpha$ is not an integer.


9.6 Find the Fourier series expansion of the periodic function defined on its
fundamental cell, $(-\pi, \pi)$, as $f(\theta) = \theta$.


9.7 Consider the periodic function that is defined on its fundamental cell, 
(-a, a), as f(x) = |x|. 


(a) Find its Fourier series expansion. 

(b) Show that the infinite series gives the same result as the function when 
both are evaluated at x = a. 

(c) Evaluate both sides of the expansion at $x = 0$, and show that

\[
\frac{\pi^2}{8} = \sum_{n=0}^{\infty} \frac{1}{(2n+1)^2}.
\]


9.8 Let f(x) =x be a periodic function defined over the interval (0, 2a). 
Find the Fourier series expansion of f. 


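9.9 Show that the piecewise parabolic “approximation” to $a^2\sin(\pi x/a)$ in
the interval $(-a, a)$ given by the function

\[
f(x) =
\begin{cases}
4x(a + x) & \text{if } -a \le x \le 0, \\
4x(a - x) & \text{if } 0 \le x \le a
\end{cases}
\]

has the Fourier series expansion

\[
f(x) = \frac{32a^2}{\pi^3} \sum_{n=0}^{\infty} \frac{1}{(2n+1)^3}\, \sin\frac{(2n+1)\pi x}{a}.
\]

Plot $f(x)$, $a^2\sin(\pi x/a)$, and the series expansion (up to 20 terms) for $a = 1$
between $-1$ and $+1$ on the same graph.


9.10 Find the Fourier series expansion of $f(\theta) = \theta^2$ for $|\theta| < \pi$. Then show
that

\[
\frac{\pi^2}{6} = \sum_{n=1}^{\infty} \frac{1}{n^2} \qquad \text{and} \qquad \frac{\pi^2}{12} = -\sum_{n=1}^{\infty} \frac{(-1)^n}{n^2}.
\]


9.11 Find the Fourier series expansion of

\[
f(t) =
\begin{cases}
\sin\omega t & \text{if } 0 \le t \le \pi/\omega, \\
0 & \text{if } -\pi/\omega \le t \le 0.
\end{cases}
\]
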
9.12 What is the Fourier transform of 
(a) the constant function f(x) = C, and 
(b) the Dirac delta function 6(x)? 
9.13 Show that 
(a) if $g(x)$ is real, then $\tilde g^*(k) = \tilde g(-k)$, and
(b) if $g(x)$ is even (odd), then $\tilde g(k)$ is also even (odd).

9.14 Let $g_c(x)$ stand for the single function that is nonzero only on a subinterval
of the fundamental cell $(a, a + L)$. Define the function $g(x)$ as

\[
g(x) = \sum_{j=-\infty}^{\infty} g_c(x - jL).
\]

(a) Show that $g(x)$ is periodic with period $L$.
(b) Find its Fourier transform $\tilde g(k)$, and verify that

\[
\tilde g(k) = L\,\tilde g_c(k) \sum_{m=-\infty}^{\infty} \delta(kL - 2m\pi).
\]

(c) Find the (inverse) transform of $\tilde g(k)$, and show that it is the Fourier
series of $g_c(x)$.


9.15 Evaluate the Fourier transform of

\[
g(x) =
\begin{cases}
b - b|x|/a & \text{if } |x| < a, \\
0 & \text{if } |x| > a.
\end{cases}
\]


9.16 Let $f(\theta)$ be a periodic function given by $f(\theta) = \sum_{n=-\infty}^{\infty} a_n e^{in\theta}$. Find
its Fourier transform $\tilde f(t)$.


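9.17 Let

\[
f(t) =
\begin{cases}
\sin\omega_0 t & \text{if } |t| < T, \\
0 & \text{if } |t| > T.
\end{cases}
\]

Show that

\[
\tilde f(\omega) = \frac{1}{i\sqrt{2\pi}} \left\{ \frac{\sin[(\omega - \omega_0)T]}{\omega - \omega_0} - \frac{\sin[(\omega + \omega_0)T]}{\omega + \omega_0} \right\}.
\]

Verify the uncertainty relation $\Delta\omega\,\Delta t \approx 4\pi$.

9.18 If $f(x) = g(x + a)$, show that $\tilde f(k) = e^{iak}\,\tilde g(k)$.


9.19 For $a > 0$ find the Fourier transform of $f(x) = e^{-a|x|}$. Is $\tilde f(k)$ symmetric?
Is it real? Verify the uncertainty relations.


9.20 The displacement of a damped harmonic oscillator is given by

\[
f(t) =
\begin{cases}
A e^{-\alpha t} e^{i\omega_0 t} & \text{if } t > 0, \\
0 & \text{if } t < 0.
\end{cases}
\]

Find $\tilde f(\omega)$ and show that the frequency distribution $|\tilde f(\omega)|^2$ is given by

\[
|\tilde f(\omega)|^2 = \frac{A^2}{2\pi}\, \frac{1}{(\omega - \omega_0)^2 + \alpha^2}.
\]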


9.21 Prove the convolution theorem:

\[
\int_{-\infty}^{\infty} f(x)\, g(y - x)\, dx = \int_{-\infty}^{\infty} \tilde f(k)\, \tilde g(k)\, e^{iky}\, dk.
\]

What will this give when $y = 0$?

9.22 Prove Parseval’s relation for Fourier transforms:

\[
\int_{-\infty}^{\infty} f(x)\, g^*(x)\, dx = \int_{-\infty}^{\infty} \tilde f(k)\, \tilde g^*(k)\, dk.
\]

In particular, the norm of a function (with weight function equal to 1) is
invariant under Fourier transform.


9.23 Use the completeness relation $1 = \sum_n |n\rangle\langle n|$ and sandwich it between
$|x\rangle$ and $\langle x'|$ to find an expression for the Dirac delta function in terms of an
infinite series of orthonormal functions.


9.24 Use a Fourier transform in three dimensions to find a solution of the
Poisson equation: $\nabla^2\Phi(\mathbf{r}) = -4\pi\rho(\mathbf{r})$.


9.25 For $\varphi(x) = \delta(x - x')$, find $\tilde\varphi(y)$.


9.26 Show that $\tilde{\tilde f}(t) = f(-t)$.


9.27 The Fourier transform of a distribution $\varphi$ is given by


= 1 
6) =D) 8-2). 


n=0 
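What is $\varphi(x)$? Hint: Use $\tilde{\tilde\varphi}(x) = \varphi(-x)$.

9.28 For $f(x) = \sum_{k=0}^{n} a_k x^k$, show that

\[
\tilde f(u) = \sqrt{2\pi} \sum_{k=0}^{n} i^k a_k\, \delta^{(k)}(u), \qquad \text{where } \delta^{(k)}(u) \equiv \frac{d^k}{du^k}\, \delta(u).
\]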

Part III
Complex Analysis


10 Complex Calculus


Complex analysis, just like real analysis, deals with questions of continuity, 
convergence of series, differentiation, integration, and so forth. The reader 
is assumed to have been exposed to the algebra of complex numbers. 


10.1 Complex Functions 


A complex function is a map $f : \mathbb{C} \to \mathbb{C}$, and we write $f(z) = w$, where
both $z$ and $w$ are complex numbers.¹ The map $f$ can be geometrically
thought of as a correspondence between two complex planes, the z-plane 
and the w-plane. The w-plane has a real axis and an imaginary axis, which 
we can call u and v, respectively. Both u and v are real functions of the 
coordinates of z, i.e., x and y. Therefore, we may write 


f(z) =u(x, y) +iv(x, y). (10.1) 


This equation gives a unique point (u, v) in the w-plane for each point 
(x, y) in the z-plane (see Fig. 10.1). Under f, regions of the z-plane are 
mapped onto regions of the w-plane. For instance, a curve in the z-plane 
may be mapped into a curve in the w-plane. The following example illus- 
trates this point. 


¹ Strictly speaking, we should write $f : S \to \mathbb{C}$ where $S$ is a subset of the complex plane.
The reason is that most functions are not defined for the entire set of complex numbers,
so that the domain of such functions is not necessarily $\mathbb{C}$. We shall specify the domain
only when it is absolutely necessary. Otherwise, we use the generic notation $f : \mathbb{C} \to \mathbb{C}$,
even though $f$ is defined only on a subset of $\mathbb{C}$.

Fig. 10.1 A map from the z-plane to the w-plane

Fig. 10.2 (a) The map $z^2$ takes a line with slope angle $\alpha$ and maps it to a line with twice
the angle in the w-plane. (b) The map $e^z$ takes the same line and maps it to a spiral in the
w-plane


Example 10.1.1 Let us investigate the behavior of a couple of elementary
complex functions. In particular, we shall look at the way a line $y = mx$ in
the z-plane is mapped into curves in the w-plane.

(a) For $w = f(z) = z^2$, we have

\[
w = (x + iy)^2 = x^2 - y^2 + 2ixy,
\]

with $u(x, y) = x^2 - y^2$ and $v(x, y) = 2xy$. How does the region of
$\mathbb{C}$ consisting of all points of a line get mapped into $\mathbb{C}$? For $y = mx$,
i.e., for a line in the z-plane with slope $m$, these equations yield $u = (1 - m^2)x^2$
and $v = 2mx^2$. Eliminating $x$ in these equations, we find
$v = [2m/(1 - m^2)]u$. This is a line passing through the origin of the
w-plane [see Fig. 10.2(a)]. Note that the angle the image line makes
with the real axis of the w-plane is twice the angle the original line
makes with the x-axis. (Show this!)

(b) The function $w = f(z) = e^z = e^{x+iy}$ gives $u(x, y) = e^x\cos y$ and
$v(x, y) = e^x\sin y$. What is the image of the line $y = mx$ under
this map? Substituting $y = mx$, we obtain $u = e^x\cos mx$ and
$v = e^x\sin mx$. Unlike part (a), we cannot eliminate $x$ to find $v$ as an
explicit function of $u$. Nevertheless, the last pair of equations are parametric
equations of a curve, which we can plot in a uv-plane as shown
in Fig. 10.2(b).
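Both mappings of the line $y = mx$ can be explored with Python's built-in complex numbers. The sketch below is illustrative only (the function names are not from the text): it checks that the image under $z^2$ lies on the line $v = [2m/(1-m^2)]u$ with twice the slope angle, and that the image under $e^z$ has modulus $e^x$ and phase $mx$, which is the spiral of Fig. 10.2(b).

```python
import cmath
import math

def square_map_image(x, m):
    """Image w = z^2 of the point z = x + i m x on the line y = m x."""
    z = complex(x, m * x)
    return z * z

def exp_map_image(x, m):
    """Image w = e^z of the point z = x + i m x on the line y = m x."""
    return cmath.exp(complex(x, m * x))

if __name__ == "__main__":
    m = 0.5
    for x in (0.5, 1.0, 2.0):
        w = square_map_image(x, m)
        # slope of the image line and its angle compared with twice the original angle
        print(w.imag / w.real, 2 * m / (1 - m * m),
              math.atan2(w.imag, w.real), 2 * math.atan(m))
    for x in (0.0, 0.5, 1.0):
        w = exp_map_image(x, m)
        print(abs(w), cmath.phase(w))  # modulus e^x, phase m*x: a spiral
```

Points with $x < 0$ on the original line are sent by $z^2$ to the same image ray as points with $x > 0$, since $(-z)^2 = z^2$; strictly, only this ray of the line $v = [2m/(1-m^2)]u$ is covered.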


Limits of complex functions are defined in terms of absolute values.
Thus, $\lim_{z\to a} f(z) = w_0$ means that given any real number $\epsilon > 0$, we can
find a corresponding real number $\delta > 0$ such that $|f(z) - w_0| < \epsilon$ whenever
$|z - a| < \delta$. Similarly, we say that a function $f$ is continuous at $z = a$ if

\[
\lim_{z\to a} f(z) = f(a).
\]


10.2 Analytic Functions
The derivative of a complex function is defined as usual: 


Definition 10.2.1 Let $f : \mathbb{C} \to \mathbb{C}$ be a complex function. The derivative of
$f$ at $z_0$ is

\[
\left.\frac{df}{dz}\right|_{z_0} = \lim_{\Delta z \to 0} \frac{f(z_0 + \Delta z) - f(z_0)}{\Delta z},
\]

provided that the limit exists and is independent of $\Delta z$.


In this definition “independent of $\Delta z$” means independent of $\Delta x$ and
$\Delta y$ (the components of $\Delta z$) and, therefore, independent of the direction of
approach to $z_0$. The restrictions of this definition apply to the real case as
well. For instance, the derivative of $f(x) = |x|$ at $x = 0$ does not exist²
because it approaches $+1$ from the right and $-1$ from the left.

It can easily be shown that all the formal rules of differentiation that apply 
to the real case also apply to the complex case. For example, if f and g are 
differentiable, then f + g, fg, and—as long as g is not zero at the point of 
interest— f/g are also differentiable, and their derivatives are given by the 
usual rules of differentiation. 


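Example 10.2.2 Let us examine the derivative of $f(z) = x^2 + 2iy^2$ at $z = 1 + i$:

\[
\begin{aligned}
\left.\frac{df}{dz}\right|_{z=1+i} &= \lim_{\Delta z \to 0} \frac{f(1 + i + \Delta z) - f(1 + i)}{\Delta z} \\
&= \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{(1 + \Delta x)^2 + 2i(1 + \Delta y)^2 - 1 - 2i}{\Delta x + i\Delta y} \\
&= \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \frac{2\Delta x + 4i\Delta y + (\Delta x)^2 + 2i(\Delta y)^2}{\Delta x + i\Delta y}.
\end{aligned}
\]

Let us approach $z = 1 + i$ along the line $y - 1 = m(x - 1)$. Then $\Delta y = m\Delta x$,
and the limit yields

\[
\left.\frac{df}{dz}\right|_{z=1+i} = \lim_{\Delta x \to 0} \frac{2\Delta x + 4im\Delta x + (\Delta x)^2 + 2im^2(\Delta x)^2}{\Delta x + im\Delta x} = \frac{2 + 4im}{1 + im}.
\]

It follows that we get infinitely many values for the derivative depending on
the value we assign to $m$, i.e., depending on the direction along which we
approach $1 + i$. Thus, the derivative does not exist at $z = 1 + i$.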
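The direction dependence found in this example is easy to see numerically. The following sketch evaluates the difference quotient of $f(z) = x^2 + 2iy^2$ at $z = 1 + i$ for a small step along lines of slope $m$ and compares it with $(2 + 4im)/(1 + im)$; the function names and the step size `dx` are illustrative assumptions.

```python
def f(z):
    """f(z) = x^2 + 2 i y^2, written in terms of the real and imaginary parts of z."""
    return z.real ** 2 + 2j * z.imag ** 2

def difference_quotient(m, dx=1e-6, z0=1 + 1j):
    """[f(z0 + dz) - f(z0)] / dz for a small step dz = dx + i m dx along slope m."""
    dz = complex(dx, m * dx)
    return (f(z0 + dz) - f(z0)) / dz

def predicted_limit(m):
    """Limit along the line y - 1 = m (x - 1): (2 + 4 i m) / (1 + i m)."""
    return (2 + 4j * m) / (1 + 1j * m)

if __name__ == "__main__":
    for m in (0.0, 1.0, -2.0):
        print(m, difference_quotient(m), predicted_limit(m))
```

Different slopes give visibly different values (for instance $2$ for $m = 0$ and $3 + i$ for $m = 1$), so no single number can serve as the derivative at $1 + i$.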


It is clear from the definition that differentiability puts a severe restriction
on $f(z)$ because it requires the limit to be the same for all paths going
through $z_0$. Furthermore, differentiability is a local property: To test
whether or not a function $f(z)$ is differentiable at $z_0$, we move away from
$z_0$ only by a small amount $\Delta z$ and check the existence of the limit in Definition 10.2.1.

² One can rephrase this and say that the derivative exists, but not in terms of ordinary functions,
rather, in terms of generalized functions (in this case $\theta(x)$) discussed in Sect. 7.3.

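What are the conditions under which a complex function is differentiable?
For $f(z) = u(x, y) + iv(x, y)$, Definition 10.2.1 yields

\[
\left.\frac{df}{dz}\right|_{z_0} = \lim_{\substack{\Delta x \to 0 \\ \Delta y \to 0}} \left\{ \frac{u(x_0 + \Delta x, y_0 + \Delta y) - u(x_0, y_0)}{\Delta x + i\Delta y}
+ i\, \frac{v(x_0 + \Delta x, y_0 + \Delta y) - v(x_0, y_0)}{\Delta x + i\Delta y} \right\}.
\]

If this limit is to exist for all paths, it must exist for the two particular paths
on which $\Delta y = 0$ (parallel to the x-axis) and $\Delta x = 0$ (parallel to the y-axis).
For the first path we get

\[
\begin{aligned}
\left.\frac{df}{dz}\right|_{z_0} &= \lim_{\Delta x \to 0} \frac{u(x_0 + \Delta x, y_0) - u(x_0, y_0)}{\Delta x}
+ i \lim_{\Delta x \to 0} \frac{v(x_0 + \Delta x, y_0) - v(x_0, y_0)}{\Delta x} \\
&= \left.\frac{\partial u}{\partial x}\right|_{(x_0, y_0)} + i \left.\frac{\partial v}{\partial x}\right|_{(x_0, y_0)}.
\end{aligned}
\]

For the second path ($\Delta x = 0$), we obtain

\[
\begin{aligned}
\left.\frac{df}{dz}\right|_{z_0} &= \lim_{\Delta y \to 0} \frac{u(x_0, y_0 + \Delta y) - u(x_0, y_0)}{i\Delta y}
+ i \lim_{\Delta y \to 0} \frac{v(x_0, y_0 + \Delta y) - v(x_0, y_0)}{i\Delta y} \\
&= -i \left.\frac{\partial u}{\partial y}\right|_{(x_0, y_0)} + \left.\frac{\partial v}{\partial y}\right|_{(x_0, y_0)}.
\end{aligned}
\]

If $f$ is to be differentiable at $z_0$, the derivatives along the two paths must be
equal. Equating the real and imaginary parts of both sides of this equation
and ignoring the subscript $z_0$ ($x_0$, $y_0$, or $z_0$ is arbitrary), we obtain

\[
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \qquad \text{and} \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \tag{10.2}
\]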


These two conditions, which are necessary for the differentiability of f, are 
called the Cauchy-Riemann conditions. 

An alternative way of writing the Cauchy-Riemann (C-R) conditions is
obtained by making the substitution³ $x = \frac{1}{2}(z + z^*)$ and $y = \frac{1}{2i}(z - z^*)$
in $u(x, y)$ and $v(x, y)$, using the chain rule to write Eq. (10.2) in terms
of $z$ and $z^*$, substituting the results in $\frac{\partial f}{\partial z^*} = \frac{\partial u}{\partial z^*} + i\frac{\partial v}{\partial z^*}$, and showing that
Eq. (10.2) is equivalent to the single equation $\partial f/\partial z^* = 0$. This equation
says that


Box 10.2.3 If $f$ is to be differentiable, it must be independent of $z^*$.


³ We use $z^*$ to indicate the complex conjugate of $z$. Occasionally we may use $\bar z$.

If the derivative of $f$ exists, the arguments leading to Eq. (10.2) imply
that the derivative can be expressed as

\[
\frac{df}{dz} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}. \tag{10.3}
\]

The C-R conditions assure us that these two equations are equivalent.


The following example illustrates the differentiability of complex func- 
tions. 


Example 10.2.4 Let us determine whether or not the following functions
are differentiable:

(a) We have already established that $f(z) = x^2 + 2iy^2$ is not differentiable
at $z = 1 + i$. We can now show that it has no derivative at any point
off the line $x = 2y$. This is easily seen by noting
that $u = x^2$ and $v = 2y^2$, and that $\partial u/\partial x = 2x \neq \partial v/\partial y = 4y$ unless $x = 2y$, so
the first Cauchy-Riemann condition is not satisfied. The second C-R
condition is satisfied, but that is not enough.

We can also write $f(z)$ in terms of $z$ and $z^*$:

\[
f(z) = \left[\frac{1}{2}(z + z^*)\right]^2 + 2i\left[\frac{1}{2i}(z - z^*)\right]^2
= \frac{1}{4}(1 - 2i)\left(z^2 + z^{*2}\right) + \frac{1}{2}(1 + 2i)\, z z^*.
\]

$f(z)$ has an explicit dependence on $z^*$. Therefore, it is not differentiable
in any region of the complex plane.

(b) Now consider $f(z) = x^2 - y^2 + 2ixy$, for which $u = x^2 - y^2$ and $v = 2xy$.
The C-R conditions become $\partial u/\partial x = 2x = \partial v/\partial y$ and $\partial u/\partial y = -2y = -\partial v/\partial x$.
Thus, $f(z)$ may be differentiable. Recall that the C-R
conditions are only necessary conditions; we have not shown (but we
will, shortly) that they are also sufficient.

To check the dependence of $f$ on $z^*$, substitute $x = (z + z^*)/2$ and
$y = (z - z^*)/(2i)$ in $u$ and $v$ to show that $f(z) = z^2$, and thus there is
no $z^*$ dependence.

(c) Let $u(x, y) = e^x\cos y$ and $v(x, y) = e^x\sin y$. Then $\partial u/\partial x = e^x\cos y = \partial v/\partial y$
and $\partial u/\partial y = -e^x\sin y = -\partial v/\partial x$, and the C-R conditions are
satisfied. Also,

\[
f(z) = e^x\cos y + ie^x\sin y = e^x(\cos y + i\sin y) = e^x e^{iy} = e^{x+iy} = e^z,
\]

and there is no $z^*$ dependence.
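The Cauchy-Riemann conditions for the three functions of this example can also be tested numerically. The sketch below approximates the partial derivatives by central differences and returns the two residuals $\partial u/\partial x - \partial v/\partial y$ and $\partial u/\partial y + \partial v/\partial x$, both of which vanish where the conditions hold. The helper name `cr_residuals` and the step `h` are assumptions made for illustration.

```python
import cmath

def cr_residuals(f, x, y, h=1e-5):
    """Return (u_x - v_y, u_y + v_x) at (x, y), using central differences.
    Both entries are zero (up to round-off) where the C-R conditions hold."""
    def u(a, b):
        return f(complex(a, b)).real

    def v(a, b):
        return f(complex(a, b)).imag

    u_x = (u(x + h, y) - u(x - h, y)) / (2 * h)
    u_y = (u(x, y + h) - u(x, y - h)) / (2 * h)
    v_x = (v(x + h, y) - v(x - h, y)) / (2 * h)
    v_y = (v(x, y + h) - v(x, y - h)) / (2 * h)
    return u_x - v_y, u_y + v_x

def f_a(z):
    return z.real ** 2 + 2j * z.imag ** 2   # part (a): x^2 + 2 i y^2

def f_b(z):
    return z * z                            # part (b): x^2 - y^2 + 2 i x y

f_c = cmath.exp                             # part (c): e^x (cos y + i sin y)

if __name__ == "__main__":
    for name, func in (("a", f_a), ("b", f_b), ("c", f_c)):
        print(name, cr_residuals(func, 1.0, 1.0))
```

At $(x, y) = (1, 1)$ the first residual for part (a) is $2x - 4y = -2$, while for parts (b) and (c) both residuals are zero to within the finite-difference error.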


The requirement of differentiability is very restrictive: The derivative
must exist along infinitely many paths. On the other hand, the C-R conditions
seem deceptively mild: They are derived for only two paths. Nevertheless,
the two paths are, in fact, true representatives of all paths; that is,
the C-R conditions are not only necessary, but also sufficient:

Theorem 10.2.5 The function $f(z) = u(x, y) + iv(x, y)$ is differentiable in
a region of the complex plane if and only if the Cauchy-Riemann conditions,

\[
\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \qquad \text{and} \qquad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}
\]

(or, equivalently, $\partial f/\partial z^* = 0$), are satisfied and all first partial derivatives
of $u$ and $v$ are continuous in that region. In that case

\[
\frac{df}{dz} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} - i\frac{\partial u}{\partial y}.
\]


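Proof We have already shown the “only if” part. To show the “if” part, note
that if the derivative exists at all, it must equal (10.3). Thus, we have to show
that

\[
\lim_{\Delta z \to 0} \frac{f(z + \Delta z) - f(z)}{\Delta z} = \frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}
\]

or, equivalently, that

\[
\left| \frac{f(z + \Delta z) - f(z)}{\Delta z} - \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right) \right| < \epsilon \qquad \text{whenever } |\Delta z| < \delta.
\]

By definition,

\[
f(z + \Delta z) - f(z) = u(x + \Delta x, y + \Delta y) + iv(x + \Delta x, y + \Delta y) - u(x, y) - iv(x, y).
\]

Since $u$ and $v$ have continuous first partial derivatives, we can write

\[
\begin{aligned}
u(x + \Delta x, y + \Delta y) &= u(x, y) + \frac{\partial u}{\partial x}\Delta x + \frac{\partial u}{\partial y}\Delta y + \epsilon_1\Delta x + \delta_1\Delta y, \\
v(x + \Delta x, y + \Delta y) &= v(x, y) + \frac{\partial v}{\partial x}\Delta x + \frac{\partial v}{\partial y}\Delta y + \epsilon_2\Delta x + \delta_2\Delta y,
\end{aligned}
\]

where $\epsilon_1$, $\epsilon_2$, $\delta_1$, and $\delta_2$ are real numbers that approach zero as $\Delta x$ and $\Delta y$
approach zero. Using these expressions, we can write

\[
\begin{aligned}
f(z + \Delta z) - f(z) &= \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)\Delta x + i\left(-i\frac{\partial u}{\partial y} + \frac{\partial v}{\partial y}\right)\Delta y
+ (\epsilon_1 + i\epsilon_2)\Delta x + (\delta_1 + i\delta_2)\Delta y \\
&= \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)(\Delta x + i\Delta y) + \epsilon\Delta x + \delta\Delta y,
\end{aligned}
\]

where $\epsilon \equiv \epsilon_1 + i\epsilon_2$, $\delta \equiv \delta_1 + i\delta_2$, and we used the C-R conditions in the last
step. Dividing both sides by $\Delta z = \Delta x + i\Delta y$, we get

\[
\frac{f(z + \Delta z) - f(z)}{\Delta z} - \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right) = \epsilon\,\frac{\Delta x}{\Delta z} + \delta\,\frac{\Delta y}{\Delta z}.
\]

By the triangle inequality, $|\text{RHS}| \le |\epsilon_1 + i\epsilon_2| + |\delta_1 + i\delta_2|$. This follows from
the fact that $|\Delta x|/|\Delta z|$ and $|\Delta y|/|\Delta z|$ are both equal to at most 1. The $\epsilon$ and
$\delta$ terms can be made as small as desired by making $\Delta z$ small enough. We
have thus established that when the C-R conditions hold, the function $f$ is
differentiable.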


Historical Notes 

Augustin-Louis Cauchy (1789-1857) was one of the most influential French mathemati- 
cians of the nineteenth century. He began his career as a military engineer, but when his 
health broke down in 1813 he followed his natural inclination and devoted himself wholly 
to mathematics. 

In mathematical productivity Cauchy was surpassed only by Euler, and his collected 
works fill 27 fat volumes. He made substantial contributions to number theory and deter- 
minants; is considered to be the originator of the theory of finite groups; and did extensive 
work in astronomy, mechanics, optics, and the theory of elasticity. 

His greatest achievements, however, lay in the field of analysis. Together with his contem- 
poraries Gauss and Abel, he was a pioneer in the rigorous treatment of limits, continuous 
functions, derivatives, integrals, and infinite series. Several of the basic tests for the con- 
vergence of series are associated with his name. He also provided the first existence proof 
for solutions of differential equations, gave the first proof of the convergence of a Taylor 
series, and was the first to feel the need for a careful study of the convergence behavior 
of Fourier series (see Chap. 9). However, his most important work was in the theory of 
functions of a complex variable, which in essence he created and which has continued 
to be one of the dominant branches of both pure and applied mathematics. In this field, 
Cauchy’s integral theorem and Cauchy’s integral formula are fundamental tools without 
which modern analysis could hardly exist (see Chap. 10). 

Unfortunately, his personality did not harmonize with the fruitful power of his mind. 
He was an arrogant royalist in politics and a self-righteous, preaching, pious believer in 
religion—all this in an age of republican skepticism—and most of his fellow scientists 
disliked him and considered him a smug hypocrite. It might be fairer to put first things 
first and describe him as a great mathematician who happened also to be a sincere but 
narrow-minded bigot. 


Definition 10.2.6 A function $f : \mathbb{C} \to \mathbb{C}$ is called analytic at $z_0$ if it is
differentiable at $z_0$ and at all other points in some neighborhood of $z_0$. A
point at which $f$ is analytic is called a regular point of $f$. A point at which
$f$ is not analytic is called a singular point or a singularity of $f$. A function
for which all points in $\mathbb{C}$ are regular is called an entire function.


Example 10.2.7 (Derivatives of some functions) 


(a) $f(z) = z$. Here $u = x$ and $v = y$; the C-R conditions are easily shown
to hold, and for any $z$, we have $df/dz = \partial u/\partial x + i\,\partial v/\partial x = 1$. Therefore,
the derivative exists at all points of the complex plane.

(b) $f(z) = z^2$. Here $u = x^2 - y^2$ and $v = 2xy$; the C-R conditions
hold, and for all points $z$ of the complex plane, we have $df/dz = \partial u/\partial x + i\,\partial v/\partial x = 2x + i2y = 2z$.
Therefore, $f(z)$ is differentiable
at all points.

(c) $f(z) = z^n$ for $n \ge 1$. We can use mathematical induction and the fact
that the product of two entire functions is an entire function to show
that $\frac{d}{dz}(z^n) = nz^{n-1}$.

(d) $f(z) = a_0 + a_1z + \cdots + a_{n-1}z^{n-1} + a_nz^n$, where $a_i$ are arbitrary constants.
That $f(z)$ is entire follows directly from (c) and the fact that
the sum of two entire functions is entire.

(e) $f(z) = 1/z$. The derivative can be found to be $f'(z) = -1/z^2$, which
does not exist for $z = 0$. Thus, $z = 0$ is a singularity of $f(z)$. However,
any other point of the complex plane is a regular point of $f$.

(f) $f(z) = |z|^2$. Using the definition of the derivative, we obtain

\[
\frac{\Delta f}{\Delta z} = \frac{|z + \Delta z|^2 - |z|^2}{\Delta z} = \frac{(z + \Delta z)(z^* + \Delta z^*) - zz^*}{\Delta z}
= z^* + \Delta z^* + z\,\frac{\Delta z^*}{\Delta z}.
\]

For $z = 0$, $\Delta f/\Delta z = \Delta z^*$, which goes to zero as $\Delta z \to 0$. Therefore,
$df/dz = 0$ at $z = 0$.⁴ However, if $z \neq 0$, the limit of $\Delta f/\Delta z$ will depend
on how $z$ is approached. Thus, $df/dz$ does not exist if $z \neq 0$.
This shows that $|z|^2$ is differentiable only at $z = 0$ and nowhere else in
its neighborhood. It also shows that even if the real (here, $u = x^2 + y^2$)
and imaginary (here, $v = 0$) parts of a complex function have continuous
partial derivatives of all orders at a point, the function may not be
differentiable there.

(g) $f(z) = 1/\sin z$. This gives $df/dz = -\cos z/\sin^2 z$. Thus, $f$ has infinitely
many (isolated) singular points at $z = \pm n\pi$ for $n = 0, 1, 2, \ldots$.


Example 10.2.8 (The complex exponential function) In this example, we
find the (unique) function $f : \mathbb{C} \to \mathbb{C}$ that has the following three properties:

(a) $f$ is single-valued and analytic for all $z$,
(b) $df/dz = f(z)$, and
(c) $f(z_1 + z_2) = f(z_1)f(z_2)$.

Property (b) shows that if $f(z)$ is well behaved, then $df/dz$ is also well
behaved. In particular, if $f(z)$ is defined for all values of $z$, then $f$ must be
entire.

For $z_1 = 0 = z_2$, property (c) yields $f(0) = [f(0)]^2 \Rightarrow f(0) = 1$, or
$f(0) = 0$. On the other hand,

\[
\frac{df}{dz} = \lim_{\Delta z \to 0} \frac{f(z + \Delta z) - f(z)}{\Delta z}
= \lim_{\Delta z \to 0} \frac{f(z)f(\Delta z) - f(z)}{\Delta z} = f(z) \lim_{\Delta z \to 0} \frac{f(\Delta z) - 1}{\Delta z}.
\]

Property (b) now implies that

\[
\lim_{\Delta z \to 0} \frac{f(\Delta z) - 1}{\Delta z} = 1 \;\Rightarrow\; f'(0) = 1 \quad \text{and} \quad f(0) = 1.
\]

The first implication follows from the definition of derivative, and the second
from the fact that the only other choice, namely $f(0) = 0$, would yield
$-\infty$ for the limit.

Now, we write $f(z) = u(x, y) + iv(x, y)$, for which property (b) becomes

\[
\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x} = u + iv \;\Rightarrow\; \frac{\partial u}{\partial x} = u, \quad \frac{\partial v}{\partial x} = v.
\]

These equations have the most general solution $u(x, y) = a(y)e^x$ and
$v(x, y) = b(y)e^x$, where $a(y)$ and $b(y)$ are the “constants” of integration.
The Cauchy-Riemann conditions now yield $a(y) = db/dy$ and $da/dy = -b(y)$,
whose most general solution is $a(y) = A\cos y + B\sin y$, $b(y) = A\sin y - B\cos y$.
On the other hand, $f(0) = 1$ yields $u(0, 0) = 1$ and
$v(0, 0) = 0$, implying that $a(0) = 1$, $b(0) = 0$ or $A = 1$, $B = 0$. We therefore
conclude that

\[
f(z) = a(y)e^x + ib(y)e^x = e^x(\cos y + i\sin y) = e^x e^{iy} = e^z.
\]

Both $e^x$ and $e^{iy}$ are well-defined in the entire complex plane. Hence, $e^z$ is
defined and differentiable over all $\mathbb{C}$; therefore, it is entire.

⁴ Although the derivative of $|z|^2$ exists at $z = 0$, it is not analytic there (or anywhere
else). To be analytic at a point, a function must have derivatives at all points in some
neighborhood of the given point.


Example 10.2.7 shows that any polynomial in $z$ is entire. Example 10.2.8 shows that the exponential function $e^z$ is also entire. Therefore, any product and/or sum of polynomials and $e^z$ will also be entire. We can build other entire functions. For instance, $e^{iz}$ and $e^{-iz}$ are entire functions; therefore, the trigonometric functions, defined as

$$\sin z = \frac{e^{iz} - e^{-iz}}{2i} \quad\text{and}\quad \cos z = \frac{e^{iz} + e^{-iz}}{2}, \tag{10.4}$$

are also entire functions. Problem 10.5 shows that $\sin z$ and $\cos z$ have only real zeros. The hyperbolic functions can be defined similarly:

$$\sinh z = \frac{e^{z} - e^{-z}}{2} \quad\text{and}\quad \cosh z = \frac{e^{z} + e^{-z}}{2}. \tag{10.5}$$

Although the sum and the product of entire functions are entire, the ratio, in general, is not. For instance, if $f(z)$ and $g(z)$ are polynomials of degrees $m$ and $n$, respectively, then for $n > 0$, the ratio $f(z)/g(z)$ is not entire, because at the zeros of $g(z)$ (which always exist, and which we assume are not also zeros of $f(z)$) the derivative is not defined.
The functions u(x, y) and v(x, y) of an analytic function have an inter- 
esting property that the following example investigates. 


Example 10.2.9 The family of curves u(x, y) = constant is perpendicular 
to the family of curves v(x, y) = constant at each point of the complex plane 
where f(z) =u + iv is analytic. 

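This can easily be seen by looking at the normal to the curves. The normal to the curve $u(x, y) = \text{constant}$ is simply $\nabla u = (\partial u/\partial x, \partial u/\partial y)$. Similarly, the normal to the curve $v(x, y) = \text{constant}$ is $\nabla v = (\partial v/\partial x, \partial v/\partial y)$. Taking the dot product of these two normals, we obtain

$$(\nabla u) \cdot (\nabla v) = \frac{\partial u}{\partial x}\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\frac{\partial v}{\partial y} = \frac{\partial u}{\partial x}\left(-\frac{\partial u}{\partial y}\right) + \frac{\partial u}{\partial y}\left(\frac{\partial u}{\partial x}\right) = 0$$

by the C-R conditions.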
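
The orthogonality of the two families of curves can also be observed numerically. The sketch below is an illustration added here, not the author's method: it estimates $\nabla u$ and $\nabla v$ by central differences for the analytic function $e^z$, and, for contrast, for the function $x + i(x + y)$, which violates the C-R conditions. The function names, the sample point, and the step size are assumptions.

```python
import cmath


def level_curve_normals(f, x, y, h=1e-6):
    """Return (grad u, grad v) at (x, y) for f = u + iv, using central differences."""
    df_dx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    df_dy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    grad_u = (df_dx.real, df_dy.real)
    grad_v = (df_dx.imag, df_dy.imag)
    return grad_u, grad_v


def dot(a, b):
    """Dot product of two 2-component tuples."""
    return a[0] * b[0] + a[1] * b[1]


# Analytic case: f(z) = e^z, so the two normals should be perpendicular.
grad_u, grad_v = level_curve_normals(cmath.exp, 0.5, -0.8)
dot_analytic = dot(grad_u, grad_v)


# Non-analytic case: u = x, v = x + y does not satisfy the C-R conditions.
def not_analytic(z):
    return complex(z.real, z.real + z.imag)


grad_u, grad_v = level_curve_normals(not_analytic, 0.5, -0.8)
dot_not_analytic = dot(grad_u, grad_v)

print(dot_analytic, dot_not_analytic)
```

The first dot product vanishes up to numerical noise, while the second equals 1, showing that perpendicularity is tied to analyticity in this comparison.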


10.3 Conformal Maps


The real and imaginary parts of an analytic function separately satisfy the two-dimensional Laplace’s equation:

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \qquad \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} = 0. \tag{10.6}$$

This can easily be verified from the C-R conditions.
Laplace’s equation in three dimensions,

$$\frac{\partial^2 \Phi}{\partial x^2} + \frac{\partial^2 \Phi}{\partial y^2} + \frac{\partial^2 \Phi}{\partial z^2} = 0,$$

describes the electrostatic potential $\Phi$ in a charge-free region of space. In a typical electrostatic problem the potential $\Phi$ is given at certain boundaries (usually conducting surfaces), and its value at every point in space is sought.
There are numerous techniques for solving such problems, and some of 
them will be discussed later in the book. However, some of these problems 
have a certain degree of symmetry that reduces them to two-dimensional 
problems. In such cases, the theory of analytic functions can be extremely 
helpful. 

The symmetry mentioned above is cylindrical symmetry, where the potential is known a priori to be independent of the z-coordinate (the axis of symmetry). This situation occurs when conductors are cylinders and, if there are charge distributions in certain regions of space, the densities are z-independent. In such cases, $\partial\Phi/\partial z = 0$, and the problem reduces to a two-dimensional one.

Functions satisfying Laplace’s equation are called harmonic functions. Thus, the electrostatic potential is a three-dimensional harmonic function, and the potential for a cylindrically symmetric charge distribution and boundary condition is a two-dimensional harmonic function. Since the real and the imaginary parts of a complex analytic function are also harmonic, techniques of complex analysis are sometimes useful in solving electrostatic problems with cylindrical symmetry.⁵

To illustrate the connection between electrostatics and complex analysis, consider a long straight filament with a constant linear charge density $\lambda$. It is shown in introductory electromagnetism that the potential $\Phi$ (disregarding the arbitrary constant that determines the reference potential) is given, in cylindrical coordinates, by

$$\Phi = 2\lambda \ln\rho = 2\lambda \ln[(x^2 + y^2)^{1/2}] = 2\lambda \ln|z|.$$

Since $\Phi$ satisfies Laplace’s equation, we conclude that $\Phi$ could be the real part of an analytic function $w(z)$, which we call the complex potential.


⁵ We use electrostatics because it is more familiar to physics students. Engineering students are familiar with steady state heat transfer as well, which also involves Laplace’s equation, and therefore is amenable to this technique.




Example 10.2.9, together with the fact that the curves $u = \Phi = \text{constant}$ are circles, implies that the constant-$v$ curves are rays, i.e., $v \propto \varphi$. Choosing the constant of proportionality as $2\lambda$, we obtain

$$w(z) = 2\lambda \ln\rho + i 2\lambda\varphi = 2\lambda \ln(\rho e^{i\varphi}) = 2\lambda \ln z.$$


It is useful to know the complex potential of more than one filament of charge. To find such a potential we must first find $w(z)$ for a line charge when it is displaced from the origin. If the line is located at $z_0 = x_0 + i y_0$, then it is easy to show that $w(z) = 2\lambda \ln(z - z_0)$. If there are $n$ line charges located at $z_1, z_2, \dots, z_n$, then

$$w(z) = 2\sum_{k=1}^{n} \lambda_k \ln(z - z_k). \tag{10.7}$$


The function w(z) can be used directly to solve a number of electro- 
static problems involving simple charge distributions and conductor ar- 
rangements. Some of these are illustrated in problems at the end of this 
chapter. Instead of treating $w(z)$ as a complex potential, let us look at it as a map from the z-plane (or xy-plane) to the w-plane (or uv-plane). In particular, the equipotential curves (circles for a single line of charge) are mapped onto lines parallel to the v-axis in the w-plane. This is so because equipotential curves are defined by $u = \text{constant}$. Similarly, the constant-$v$ curves are mapped onto horizontal lines in the w-plane.

This is an enormous simplification of the geometry. Straight lines, especially when they are parallel to axes, are by far simpler geometrical objects than circles,⁶ especially if the circles are not centered at the origin. So let us consider two complex “worlds”. One is represented by the xy-plane and denoted by $z$. The other, the “prime world”, is represented⁷ by $z'$, and its real and imaginary parts by $x'$ and $y'$. We start in $z$, where we need to find a physical quantity such as the electrostatic potential $\Phi(x, y)$. If the problem is too complicated in the z-world, we transfer it to the z'-world, in which it may be easily solvable; we solve the problem there (in terms of $x'$ and $y'$) and then transfer back to the z-world ($x$ and $y$). The mapping that relates $z$ and $z'$ must be cleverly chosen. Otherwise, there is no guarantee that the problem will simplify.

Two conditions are necessary for the above strategy to work. First, the 
differential equation describing the physics must not get more complicated 
with the transfer to z’. Since Laplace’s equation is already of the simplest 
type, the z’-world must also respect Laplace’s equation. Second, and more 
importantly, the mapping must preserve the angles between curves. This is 
necessary because we want the equipotential curves and the field lines to be 
perpendicular in both worlds. A mapping that preserves the angle between 
two curves at a given point is called a conformal mapping. We already have 
such mappings at our disposal, as the following proposition shows. 


⁶ This statement is valid only in Cartesian coordinates. But these are precisely the coordinates we are using in this discussion.


⁷ We are using $z'$ instead of $w$, and $(x', y')$ instead of $(u, v)$.


Proposition 10.3.1 Let $\gamma_1$ and $\gamma_2$ be curves in the complex z-plane that intersect at a point $z_0$ at an angle $\alpha$. Let $f : \mathbb{C} \to \mathbb{C}$ be a mapping given by $f(z) = z' = x' + iy'$ that is analytic at $z_0$. Let $\gamma_1'$ and $\gamma_2'$ be the images of $\gamma_1$ and $\gamma_2$ under this mapping, which intersect at an angle $\alpha'$. Then,


(a) $\alpha' = \alpha$, that is, the mapping $f$ is conformal, if $(dz'/dz)_{z_0} \neq 0$.
(b) If $f$ is harmonic in $(x, y)$, it is also harmonic in $(x', y')$.


Proof See Problem 10.21. 


The following are some examples of conformal mappings. 


(a) $z' = z + a$, where $a$ is an arbitrary complex constant. This is simply a translation of the z-plane.

(b) $z' = bz$, where $b$ is an arbitrary complex constant. This is a dilation whereby distances are dilated by a factor $|b|$ (accompanied by a rotation through $\arg b$ when $b$ is not real and positive). A graph in the z-plane is mapped onto a similar graph in the z'-plane that will be reduced ($|b| < 1$) or enlarged ($|b| > 1$) by a factor of $|b|$.

(c) $z' = 1/z$. This is called an inversion. Example 10.3.2 will show that under such a mapping, circles are mapped onto circles or straight lines.

(d) Combining the preceding three transformations yields the general mapping

$$z' = \frac{az + b}{cz + d}, \tag{10.8}$$

which is conformal if $cz + d \neq 0 \neq dz'/dz$. These conditions are equivalent to $ad - bc \neq 0$.


Example 10.3.2 A circle of radius $r$ whose center is at $a$ in the z-plane is described by the equation $|z - a| = r$. When transforming to the z'-plane under inversion, this equation becomes $|1/z' - a| = r$, or $|1 - az'| = r|z'|$. Squaring both sides and simplifying yields $(r^2 - |a|^2)|z'|^2 + 2\,\mathrm{Re}(az') - 1 = 0$. In terms of Cartesian coordinates, this becomes

$$(r^2 - |a|^2)(x'^2 + y'^2) + 2(a_r x' - a_i y') - 1 = 0, \tag{10.9}$$

where $a \equiv a_r + i a_i$. We now consider two cases:


1. $r \neq |a|$: Divide by $r^2 - |a|^2$ and complete the squares to get

$$\left(x' + \frac{a_r}{r^2 - |a|^2}\right)^2 + \left(y' - \frac{a_i}{r^2 - |a|^2}\right)^2 - \frac{a_r^2 + a_i^2}{(r^2 - |a|^2)^2} - \frac{1}{r^2 - |a|^2} = 0$$

or, defining

$$a_r' \equiv -\frac{a_r}{r^2 - |a|^2}, \qquad a_i' \equiv \frac{a_i}{r^2 - |a|^2}, \qquad r' \equiv \frac{r}{|r^2 - |a|^2|},$$

we have $(x' - a_r')^2 + (y' - a_i')^2 = r'^2$, which can also be written as

$$|z' - a'| = r', \qquad a' = a_r' + i a_i' = \frac{a^*}{|a|^2 - r^2}.$$

This is a circle in the z'-plane with center at $a'$ and radius of $r'$.

2. $r = |a|$: Then Eq. (10.9) reduces to $a_r x' - a_i y' = \frac{1}{2}$, which is the equation of a line.


If we use the transformation $z' = 1/(z - c)$ instead of $z' = 1/z$, then $|z - a| = r$ becomes $|1/z' - (a - c)| = r$, and all the above analysis will go through exactly as before, except that $a$ is replaced by $a - c$.


Fig. 10.3 In the z-plane, we see two equal cylinders whose centers are separated
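
As a quick sanity check of case 1 of Example 10.3.2, the sketch below (an added illustration, with an assumed circle $a = 2 + i$, $r = 0.75$ and a hypothetical helper `inverted_circle`) maps sample points of $|z - a| = r$ through $z' = 1/z$ and measures how far they fall from the predicted circle $|z' - a'| = r'$.

```python
import cmath
import math


def inverted_circle(a: complex, r: float):
    """Center a' and radius r' of the image of |z - a| = r under z' = 1/z.

    Valid only for r != |a|; in the excluded case the image is a straight line.
    """
    denom = abs(a) ** 2 - r ** 2
    return a.conjugate() / denom, r / abs(denom)


# Illustrative values (assumptions, not taken from the text).
a, r = complex(2.0, 1.0), 0.75
a_img, r_img = inverted_circle(a, r)

# Map sample points of the original circle and compare their distance
# from a' with the predicted radius r'.
max_deviation = 0.0
for k in range(360):
    z = a + r * cmath.exp(1j * math.radians(k))
    z_img = 1 / z
    max_deviation = max(max_deviation, abs(abs(z_img - a_img) - r_img))

print(a_img, r_img, max_deviation)
```

The maximum deviation stays at the level of rounding error, consistent with the image being a circle with the stated center and radius.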


Mappings of the form given in Eq. (10.8) are called homographic transformations. A useful property of such transformations is that they can map an infinite region of the z-plane onto a finite region of the z'-plane. In fact, points with very large values of $|z|$ are mapped onto a neighborhood of the point $z' = a/c$. Of course, this argument goes both ways: Eq. (10.8) also maps a neighborhood of $-d/c$ in the z-plane onto large regions of the z'-plane. The usefulness of homographic transformations is illustrated in the following example.


Example 10.3.3 Consider two cylindrical conductors of equal radius $r$, held at potentials $u_1$ and $u_2$, respectively, whose centers are $D$ units of length apart. Choose the x- and the y-axes such that the centers of the cylinders are located on the x-axis at distances $a_1$ and $a_2$ from the origin, as shown in Fig. 10.3. Let us find the electrostatic potential produced by such a configuration in the xy-plane.

We know from elementary electrostatics that the problem becomes very 
simple if the two cylinders are concentric (and, of course, of different radii). 
Thus, we try to map the two circles onto two concentric circles in the z’- 
plane such that the infinite region outside the two circles in the z-plane gets 
mapped onto the finite annular region between the two concentric circles in 
the z’-plane. We then (easily) find the potential in the z’-plane, and transfer 
it back to the z-plane. 

The most general mapping that may be able to do the job is that given by Eq. (10.8). However, it turns out that we do not have to be this general. In fact, the special case $z' = 1/(z - c)$ in which $c$ is a real constant will be sufficient. So, $z = (1/z') + c$, and the circles $|z - a_k| = r$ for $k = 1, 2$ will be mapped onto the circles $|z' - a_k'| = r_k'$, where (by Example 10.3.2) $a_k' = (a_k - c)/[(a_k - c)^2 - r^2]$ and $r_k' = r/|(a_k - c)^2 - r^2|$.

Fig. 10.4 In the z'-plane, we see two concentric unequal cylinders


Can we arrange the parameters so that the circles in the z'-plane are concentric, i.e., that $a_1' = a_2'$? The answer is yes. We set $a_1' = a_2'$ and solve for $a_2$ in terms of $a_1$. The result is either the trivial solution $a_2 = a_1$, or $a_2 = c - r^2/(a_1 - c)$. If we place the origin of the z-plane at the center of the first cylinder, then $a_1 = 0$ and $a_2 = D = c + r^2/c$. We can also find $a_1'$ and $a_2'$: $a_1' = a_2' \equiv a' = -c/(c^2 - r^2)$, and the geometry of the problem is as shown in Fig. 10.4.
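
To see the concentric arrangement concretely, one can solve $D = c + r^2/c$ for $c$ and evaluate the image circles. The sketch below is an added illustration under stated assumptions: the values $r = 1$, $D = 3$ are arbitrary, the helper `image_circle` is hypothetical, and the choice of the smaller root of the quadratic $c^2 - Dc + r^2 = 0$ (which places $c$ inside the first cylinder) is an assumption of this sketch; the text itself does not solve for $c$.

```python
import math


def image_circle(a_k: float, c: float, r: float):
    """Center and radius of the image of |z - a_k| = r under z' = 1/(z - c).

    Uses the formulas quoted from Example 10.3.2 with a replaced by a_k - c.
    """
    d = (a_k - c) ** 2 - r ** 2
    return (a_k - c) / d, r / abs(d)


# Illustrative geometry (assumed values): equal radii r, centers D apart, D > 2r.
r, D = 1.0, 3.0

# Assumption of this sketch: take the smaller root of c^2 - D c + r^2 = 0,
# which is equivalent to D = c + r^2 / c.
c = (D - math.sqrt(D * D - 4 * r * r)) / 2

center1, radius1 = image_circle(0.0, c, r)  # first cylinder, a_1 = 0
center2, radius2 = image_circle(D, c, r)    # second cylinder, a_2 = D

print(c, center1, center2, radius1, radius2)
```

Both image centers coincide with $-c/(c^2 - r^2)$, while the two radii differ, which is the concentric geometry described for Fig. 10.4.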

For such a geometry the potential at a point in the annular region is given by

$$\Phi' = A \ln\rho + B = A \ln|z' - a'| + B,$$

where $A$ and $B$ are real constants determined by the conditions $\Phi'(r_1') = u_1$ and $\Phi'(r_2') = u_2$, which yields

$$A = \frac{u_1 - u_2}{\ln(r_1'/r_2')} \quad\text{and}\quad B = \frac{u_2 \ln r_1' - u_1 \ln r_2'}{\ln(r_1'/r_2')}.$$

The potential $\Phi'$ is the real part of the complex function⁸

$$F(z') = A \ln(z' - a') + B,$$

which is analytic except at $z' = a'$, a point lying outside the region of interest. We can now go back to the z-plane by substituting $z' = 1/(z - c)$ to obtain

$$G(z) = A \ln\left(\frac{1}{z - c} - a'\right) + B,$$

whose real part is the potential in the z-plane:

$$\Phi(x, y) = \mathrm{Re}[G(z)] = A \ln\left|\frac{1 + a'c - a'z}{z - c}\right| + B = A \ln\left|\frac{(1 + a'c - a'x) - i a'y}{(x - c) + iy}\right| + B = \frac{A}{2}\ln\left[\frac{(1 + a'c - a'x)^2 + a'^2 y^2}{(x - c)^2 + y^2}\right] + B.$$

This is the potential we want.


⁸ Writing $z = |z|e^{i\theta}$, we note that $\ln z = \ln|z| + i\theta$, so that the real part of a complex log function is the log of the absolute value.
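
The final formula can be tested against the boundary conditions. The following sketch is an added illustration, not the author's procedure: it assembles $\Phi(x, y) = A \ln|1/(z - c) - a'| + B$ for assumed values $r = 1$, $D = 3$, $u_1 = 5$, $u_2 = -2$ and checks that the potential equals $u_1$ on the first cylinder and $u_2$ on the second. The function name, the numbers, and the choice of the smaller root for $c$ are assumptions.

```python
import cmath
import math


def two_cylinder_potential(r, D, u1, u2):
    """Return phi(x, y) for two equal cylinders centered at 0 and D on the x-axis."""
    # Assumption: smaller root of c^2 - D c + r^2 = 0, i.e. D = c + r^2 / c.
    c = (D - math.sqrt(D * D - 4 * r * r)) / 2
    a_img = -c / (c * c - r * r)               # common center a' in the z'-plane
    r1_img = r / abs(c * c - r * r)            # image radius of the first cylinder
    r2_img = r / abs((D - c) ** 2 - r * r)     # image radius of the second cylinder
    log_ratio = math.log(r1_img / r2_img)
    A = (u1 - u2) / log_ratio
    B = (u2 * math.log(r1_img) - u1 * math.log(r2_img)) / log_ratio

    def phi(x, y):
        z_img = 1 / (complex(x, y) - c)        # z' = 1/(z - c)
        return A * math.log(abs(z_img - a_img)) + B

    return phi


# Illustrative values (assumptions).
r, D, u1, u2 = 1.0, 3.0, 5.0, -2.0
phi = two_cylinder_potential(r, D, u1, u2)

# Sample the potential on both cylinder surfaces.
max_err1 = 0.0
max_err2 = 0.0
for k in range(72):
    theta = 2 * math.pi * k / 72
    dx, dy = r * math.cos(theta), r * math.sin(theta)
    max_err1 = max(max_err1, abs(phi(dx, dy) - u1))
    max_err2 = max(max_err2, abs(phi(D + dx, dy) - u2))

print(max_err1, max_err2)
```

Both deviations come out at rounding level, so the mapped solution reproduces the prescribed potentials on the conductors for this choice of parameters.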


10.4 Integration of Complex Functions 


The derivative of a complex function is an important concept and, as the 
previous section demonstrated, provides a powerful tool in physical appli- 
cations. The concept of integration is even more important. In fact, we will 
see in the next section that derivatives can be written in terms of integrals. 
We will study integrals of complex functions in detail in this section. 

The definite integral of a complex function is defined in analogy to that of a real function:

$$\int_{\alpha_1}^{\alpha_2} f(z)\,dz = \lim_{\substack{N \to \infty \\ \Delta z_i \to 0}} \sum_{i=1}^{N} f(z_i)\,\Delta z_i,$$

where $\Delta z_i$ is a small segment, situated at $z_i$, of the curve that connects the complex number $\alpha_1$ to the complex number $\alpha_2$ in the z-plane. Since there are infinitely many ways of connecting $\alpha_1$ to $\alpha_2$, it is possible to obtain different values for the integral for different paths. Before discussing the integral itself, let us first consider the various kinds of path encountered in complex analysis.


1. A curve is a map $\gamma : [a, b] \to \mathbb{C}$ from the real interval into the complex plane given by $\gamma(t) = \gamma_r(t) + i\gamma_i(t)$, where $a \le t \le b$, and $\gamma_r$ and $\gamma_i$ are the real and imaginary parts of $\gamma$; $\gamma(a)$ is called the initial point of the curve and $\gamma(b)$ its final point.

2. A simple arc, or a Jordan arc, is a curve that does not cross itself, i.e., $\gamma$ is injective (or one to one), so that $\gamma(t_1) \neq \gamma(t_2)$ when $t_1 \neq t_2$.

3. A path is a finite collection $\{\gamma_1, \gamma_2, \dots, \gamma_n\}$ of simple arcs such that the initial point of $\gamma_{k+1}$ coincides with the final point of $\gamma_k$.

4. A smooth arc is a curve for which $d\gamma/dt = d\gamma_r/dt + i\,d\gamma_i/dt$ exists and is nonzero for $t \in [a, b]$.

5. A contour is a path whose arcs are smooth. When the initial point of $\gamma_1$ coincides with the final point of $\gamma_n$, the contour is said to be a simple closed contour.


The path dependence of a complex integral is analogous to the line integral of a vector field encountered in vector analysis. In fact, we can turn the integral of a complex function into a line integral as follows. We substitute $f(z) = u + iv$ and $dz = dx + i\,dy$ in the integral to obtain

$$\int_{\alpha_1}^{\alpha_2} f(z)\,dz = \int_{\alpha_1}^{\alpha_2} (u\,dx - v\,dy) + i\int_{\alpha_1}^{\alpha_2} (v\,dx + u\,dy).$$

Fig. 10.5 The three different paths of integration corresponding to the integrals $I_1$, $I_1'$, $I_2$, and $I_2'$


If we define the two-dimensional vectors $\mathbf{A}_1 \equiv (u, -v)$ and $\mathbf{A}_2 \equiv (v, u)$, we get

$$\int_{\alpha_1}^{\alpha_2} f(z)\,dz = \int_{\alpha_1}^{\alpha_2} \mathbf{A}_1 \cdot d\mathbf{r} + i\int_{\alpha_1}^{\alpha_2} \mathbf{A}_2 \cdot d\mathbf{r}.$$

It follows from Stokes’ theorem (or Green’s theorem, since the vectors lie in a plane) that the integral of $f$ is path-independent only if both $\mathbf{A}_1$ and $\mathbf{A}_2$ have vanishing curls. This in turn follows if and only if $u$ and $v$ satisfy the C-R conditions, and this is exactly what is needed for $f(z)$ to be analytic.

Path-independence of a line integral of a vector $\mathbf{A}$ is equivalent to the vanishing of the integral along a closed path, and the latter is equivalent to the vanishing of $\nabla \times \mathbf{A}$ at every point of the region bordered by the closed path. In the case of complex integrals, this result is stated as


Theorem 10.4.1 (Cauchy-Goursat theorem) Let $f : \mathbb{C} \to \mathbb{C}$ be analytic on a simple closed contour $C$ and at all points inside $C$. Then

$$\oint_C f(z)\,dz = 0.$$


Example 10.4.2 (Examples of definite integrals)


(a) Let us evaluate the integral $I_1 = \int_{\gamma_1} z\,dz$ where $\gamma_1$ is the straight line drawn from the origin to the point $(1, 2)$ (see Fig. 10.5). Along such a line $y = 2x$ and, using $t$ for $x$, $\gamma_1(t) = t + 2it$ where $0 \le t \le 1$; so

$$I_1 = \int_{\gamma_1} z\,dz = \int_0^1 (t + 2it)(dt + 2i\,dt) = \int_0^1 (-3t\,dt + 4it\,dt) = -\frac{3}{2} + 2i.$$

For a different path $\gamma_2$, along which $y = 2x^2$, we get $\gamma_2(t) = t + 2it^2$ where $0 \le t \le 1$, and

$$I_1' = \int_{\gamma_2} z\,dz = \int_0^1 (t + 2it^2)(dt + 4it\,dt) = -\frac{3}{2} + 2i.$$

Therefore, $I_1 = I_1'$. This is what is expected from the Cauchy-Goursat theorem because the function $f(z) = z$ is analytic on the two paths and in the region bounded by them.

(b) To find $I_2 \equiv \int_{\gamma_1} z^2\,dz$ with $\gamma_1$ as in part (a), substitute for $z$ in terms of $t$:

$$I_2 = \int_0^1 (t + 2it)^2(dt + 2i\,dt) = (1 + 2i)^3 \int_0^1 t^2\,dt = -\frac{11}{3} - \frac{2}{3}i.$$

Next we compare $I_2$ with $I_2' \equiv \int_{\gamma_3} z^2\,dz$ where $\gamma_3$ is as shown in Fig. 10.5. This path can be described by

$$\gamma_3(t) = \begin{cases} t & \text{for } 0 \le t \le 1, \\ 1 + i(t - 1) & \text{for } 1 \le t \le 3. \end{cases}$$

Therefore,

$$I_2' = \int_0^1 t^2\,dt + \int_1^3 [1 + i(t - 1)]^2 (i\,dt) = \frac{1}{3} - 4 - \frac{2}{3}i = -\frac{11}{3} - \frac{2}{3}i,$$

which is identical to $I_2$, once again because the function is analytic on $\gamma_1$ and $\gamma_3$ as well as in the region bounded by them.

Fig. 10.6 The two semicircular paths for calculating $I_3$ and $I_3'$

(c) Now consider $I_3 \equiv \int_{\gamma_4} dz/z$ where $\gamma_4$ is the upper semicircle of unit radius, as shown in Fig. 10.6. A parametric equation for $\gamma_4$ can be given in terms of $\theta$:

$$\gamma_4(\theta) = \cos\theta + i\sin\theta = e^{i\theta} \quad\Rightarrow\quad dz = ie^{i\theta}\,d\theta, \qquad 0 \le \theta \le \pi.$$

Thus, we obtain

$$I_3 = \int_0^{\pi} \frac{1}{e^{i\theta}}\, ie^{i\theta}\,d\theta = i\pi.$$

On the other hand, for $\gamma_4'$, the lower semicircle of unit radius, we get

$$I_3' = \int_{\gamma_4'} \frac{dz}{z} = \int_{2\pi}^{\pi} \frac{1}{e^{i\theta}}\, ie^{i\theta}\,d\theta = -i\pi.$$

Fig. 10.7 A complicated contour can be broken up into simpler ones. Note that the boundaries of the “eyes” and the “mouth” are forced to be traversed in the (negative) clockwise direction
boundaries of the “eyes” and the “mouth” are forced to be traversed in the (negative) 
clockwise direction 


Here the two integrals are not equal. From $\gamma_4$ and $\gamma_4'$ we can construct a counterclockwise simple closed contour $C$, along which the integral of $f(z) = 1/z$ becomes $\oint_C dz/z = I_3 - I_3' = 2i\pi$. That the integral is not zero is a consequence of the fact that $1/z$ is not analytic at all points of the region bounded by the closed contour $C$.
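
The path integrals of Example 10.4.2 can be approximated directly from the defining sum $\sum_i f(z_i)\,\Delta z_i$. The sketch below is an added illustration: the helper `path_integral`, the midpoint sampling, and the number of segments are assumptions, and the value $-i\pi$ used for the lower semicircle is the one implied by $I_3 - I_3' = 2i\pi$.

```python
import cmath
import math


def path_integral(f, gamma, t0, t1, n=20000):
    """Approximate the integral of f along gamma(t), t from t0 to t1.

    Each term is f evaluated at the midpoint parameter value times the chord
    delta z of the segment, mirroring the defining Riemann sum.
    """
    total = 0j
    dt = (t1 - t0) / n
    for i in range(n):
        z_start = gamma(t0 + i * dt)
        z_end = gamma(t0 + (i + 1) * dt)
        z_mid = gamma(t0 + (i + 0.5) * dt)
        total += f(z_mid) * (z_end - z_start)
    return total


def gamma3(t):
    """Broken path of part (b): along the x-axis to 1, then straight up to 1 + 2i."""
    return complex(t, 0.0) if t <= 1 else complex(1.0, t - 1)


# (a) f(z) = z along the straight line and along the parabola y = 2x^2.
I1 = path_integral(lambda z: z, lambda t: complex(t, 2 * t), 0, 1)
I1p = path_integral(lambda z: z, lambda t: complex(t, 2 * t * t), 0, 1)

# (b) f(z) = z^2 along the straight line and along the broken path.
I2 = path_integral(lambda z: z * z, lambda t: complex(t, 2 * t), 0, 1)
I2p = path_integral(lambda z: z * z, gamma3, 0, 3)

# (c) f(z) = 1/z along the upper and lower unit semicircles from 1 to -1.
I3 = path_integral(lambda z: 1 / z, lambda th: cmath.exp(1j * th), 0, math.pi)
I3p = path_integral(lambda z: 1 / z, lambda th: cmath.exp(1j * th), 2 * math.pi, math.pi)

print(I1, I1p, I2, I2p, I3, I3p)
```

The first two pairs agree with each other, as the Cauchy-Goursat theorem predicts for analytic integrands, while the pair for $1/z$ differs by approximately $2\pi i$.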


The Cauchy-Goursat theorem applies to more complicated regions. 
When a region contains points at which f(z) is not analytic, those points 
can be avoided by redefining the region and the contour. Such a procedure 
requires an agreement on the direction we will take. 


Convention When integrating along a closed contour, we agree to move 
along the contour in such a way that the enclosed region lies to our left. An 
integration that follows this convention is called integration in the positive 
sense. Integration performed in the opposite direction acquires a minus sign. 


For a simple closed contour, movement in the counterclockwise direction 
yields integration in the positive sense. However, as the contour becomes 
more complicated, this conclusion breaks down. Figure 10.7 shows a com- 
plicated path enclosing a region (shaded) in which the integrand is analytic. 
Note that it is possible to traverse a portion of the region twice in opposite 
directions without affecting the integral, which may be a sum of integrals for 
different pieces of the contour. Also note that the “eyes” and the “mouth” 
are traversed clockwise! This is necessary because of the convention above. 
A region such as that shown in Fig. 10.7, in which holes are “punched out”,
is called multiply connected. In contrast, a simply connected region is one 
in which every simple closed contour encloses only points of the region. 

One important consequence of the Cauchy-Goursat theorem is the fol- 
lowing: 




Fig. 10.8 The integrand is analytic within and on the boundary of the shaded region. It 
is always possible to construct contours that exclude all singular points 


Theorem 10.4.3 (Cauchy integral formula) Let $f$ be analytic on and within a simple closed contour $C$ integrated in the positive sense. Let $z_0$ be any interior point to $C$. Then

$$f(z_0) = \frac{1}{2\pi i}\oint_C \frac{f(z)}{z - z_0}\,dz.$$


To prove the Cauchy integral formula (CIF), we need the following 
lemma. 


Lemma 10.4.4 (Darboux inequality) Suppose $f : \mathbb{C} \to \mathbb{C}$ is continuous and bounded on a path $\gamma$, i.e., there exists a positive number $M$ such that $|f(z)| \le M$ for all values $z \in \gamma$. Then

$$\left|\int_\gamma f(z)\,dz\right| \le M L_\gamma,$$

where $L_\gamma$ is the length of the path of integration.


Proof See Problem 10.27. 
Now we are ready to prove the Cauchy integral formula. 


Proof of CIF Consider the shaded region in Fig. 10.8, which is bounded by $C$, by $\gamma_0$ (a circle of arbitrarily small radius $\delta$ centered at $z_0$), and by $L_1$ and $L_2$, two straight line segments infinitesimally close to one another (we can, in fact, assume that $L_1$ and $L_2$ are right on top of one another; however, they are separated in the figure for clarity). Let $C' = C \cup \gamma_0 \cup L_1 \cup L_2$.

Since $f(z)/(z - z_0)$ is analytic everywhere on the contour $C'$ and inside the shaded region, we can write

$$0 = \oint_{C'} \frac{f(z)}{z - z_0}\,dz = \oint_{C} \frac{f(z)}{z - z_0}\,dz + \oint_{\gamma_0} \frac{f(z)}{z - z_0}\,dz \tag{10.10}$$




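because the contributions from $L_1$ and $L_2$ cancel. Let us evaluate the contribution from the infinitesimal circle $\gamma_0$. First we note that because $f(z)$ is continuous (differentiability implies continuity), we can write

$$\left|\frac{f(z) - f(z_0)}{z - z_0}\right| = \frac{|f(z) - f(z_0)|}{|z - z_0|} = \frac{|f(z) - f(z_0)|}{\delta} < \frac{\epsilon}{\delta}$$

for $z \in \gamma_0$, where $\epsilon$ is a small positive number. We now apply the Darboux inequality and write

$$\left|\oint_{\gamma_0} \frac{f(z) - f(z_0)}{z - z_0}\,dz\right| < \frac{\epsilon}{\delta}\, 2\pi\delta = 2\pi\epsilon.$$

This means that the integral goes to zero as $\delta \to 0$, or

$$\oint_{\gamma_0} \frac{f(z)}{z - z_0}\,dz = \oint_{\gamma_0} \frac{f(z_0)}{z - z_0}\,dz = f(z_0)\oint_{\gamma_0} \frac{dz}{z - z_0}.$$

We can easily calculate the integral on the RHS by noting that $z - z_0 = \delta e^{i\varphi}$ and that $\gamma_0$ has a clockwise direction:

$$\oint_{\gamma_0} \frac{dz}{z - z_0} = -\int_0^{2\pi} \frac{i\delta e^{i\varphi}\,d\varphi}{\delta e^{i\varphi}} = -2\pi i \quad\Rightarrow\quad \oint_{\gamma_0} \frac{f(z)}{z - z_0}\,dz = -2\pi i f(z_0).$$

Substituting this in (10.10) yields the desired result.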


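Example 10.4.5 We can use the CIF to evaluate the integrals

$$I_1 = \oint_{C_1} \frac{z^2\,dz}{(z^2 + 3)^2 (z - i)}, \qquad I_2 = \oint_{C_2} \frac{(z^2 - 1)\,dz}{(z - \frac{1}{2})(z^2 - 4)^3}, \qquad I_3 = \oint_{C_3} \frac{e^{z/2}\,dz}{(z - i\pi)(z^2 - 20)^4},$$

where $C_1$, $C_2$, and $C_3$ are circles centered at the origin with radii $r_1 = 3/2$, $r_2 = 1$, and $r_3 = 4$.

For $I_1$ we note that $f(z) = z^2/(z^2 + 3)^2$ is analytic within and on $C_1$, and $z_0 = i$ lies in the interior of $C_1$. Thus,

$$I_1 = \oint_{C_1} \frac{f(z)\,dz}{z - i} = 2\pi i f(i) = 2\pi i\, \frac{i^2}{(i^2 + 3)^2} = -i\frac{\pi}{2}.$$

Similarly, $f(z) = (z^2 - 1)/(z^2 - 4)^3$ for the integral $I_2$ is analytic on and within $C_2$, and $z_0 = 1/2$ is an interior point of $C_2$. Thus, the CIF gives

$$I_2 = \oint_{C_2} \frac{f(z)\,dz}{z - \frac{1}{2}} = 2\pi i f(1/2) = 2\pi i\, \frac{\frac{1}{4} - 1}{(\frac{1}{4} - 4)^3} = \frac{32\pi}{1125}\, i.$$

For the last integral, $f(z) = e^{z/2}/(z^2 - 20)^4$, and the interior point is $z_0 = i\pi$:

$$I_3 = \oint_{C_3} \frac{f(z)\,dz}{z - i\pi} = 2\pi i f(i\pi) = 2\pi i\, \frac{e^{i\pi/2}}{(-\pi^2 - 20)^4} = -\frac{2\pi}{(\pi^2 + 20)^4}.$$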
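
The CIF values can be cross-checked by integrating numerically around the circles. The sketch below is an added illustration restricted to the second and third integrals, taken here as $I_2 = \oint_{C_2} (z^2 - 1)\,dz/[(z - \frac{1}{2})(z^2 - 4)^3]$ with radius 1 and $I_3 = \oint_{C_3} e^{z/2}\,dz/[(z - i\pi)(z^2 - 20)^4]$ with radius 4. The helper `circle_integral` and the use of the trapezoidal rule in the angle are assumptions of this sketch, not the method of the text.

```python
import cmath
import math


def circle_integral(g, radius, n=2000):
    """Integrate g counterclockwise around |z| = radius (trapezoidal rule in theta)."""
    total = 0j
    dtheta = 2 * math.pi / n
    for k in range(n):
        z = radius * cmath.exp(1j * k * dtheta)
        total += g(z) * 1j * z * dtheta  # dz = i z d(theta) on the circle
    return total


def integrand2(z):
    return (z * z - 1) / ((z - 0.5) * (z * z - 4) ** 3)


def integrand3(z):
    return cmath.exp(z / 2) / ((z - 1j * math.pi) * (z * z - 20) ** 4)


I2_numeric = circle_integral(integrand2, 1.0)
I3_numeric = circle_integral(integrand3, 4.0)

# Values predicted by the Cauchy integral formula, 2 pi i f(z0).
I2_cif = 2j * math.pi * (0.25 - 1) / (0.25 - 4) ** 3
I3_cif = -2 * math.pi / (math.pi ** 2 + 20) ** 4

print(I2_numeric, I2_cif)
print(I3_numeric, I3_cif)
```

For these integrands the numerical and CIF values agree closely, and $I_2$ matches $32\pi i/1125$.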




The Cauchy integral formula gives the value of an analytic function at 
every point inside a simple closed contour when it is given the value of 
the function only at points on the contour. It seems as though an analytic 
function is not free to change inside a region once its value is fixed on the 
contour enclosing that region. 

There is an analogous situation in electrostatics: The specification of the 
potential at the boundaries, such as the surfaces of conductors, automatically 
determines the potential at any other point in the region of space bounded 
by the conductors. This is the content of the uniqueness theorem used in 
electrostatic boundary value problems. However, the electrostatic potential $\Phi$ is bound by another condition, Laplace’s equation; and the combination of Laplace’s equation and the boundary conditions furnishes the uniqueness of $\Phi$. Similarly, the real and imaginary parts of an analytic function separately satisfy Laplace’s equation in two dimensions! Thus, it should come
as no surprise that the value of an analytic function on a boundary (contour) 
determines the function at all points inside the boundary. 


10.5 Derivatives as Integrals 


The Cauchy Integral Formula is a powerful tool for working with analytic functions. One of the applications of this formula is in evaluating the derivatives of such functions. It is convenient to change the dummy integration variable to $\xi$ and write the CIF as

$$f(z) = \frac{1}{2\pi i}\oint_C \frac{f(\xi)\,d\xi}{\xi - z}, \tag{10.11}$$

where $C$ is a simple closed contour in the $\xi$-plane and $z$ is a point within $C$. As preparation for defining the derivative of an analytic function, we need the following result.


Proposition 10.5.1 Let $\gamma$ be any path (a contour, for example) and $g$ a continuous function on that path. The function $f(z)$ defined by

$$f(z) = \frac{1}{2\pi i}\int_\gamma \frac{g(\xi)\,d\xi}{\xi - z}$$

is analytic at every point $z \notin \gamma$.


Proof The proof follows immediately from differentiation of the integral:

$$\frac{df}{dz} = \frac{1}{2\pi i}\frac{d}{dz}\int_\gamma \frac{g(\xi)\,d\xi}{\xi - z} = \frac{1}{2\pi i}\int_\gamma g(\xi)\,d\xi\, \frac{d}{dz}\left(\frac{1}{\xi - z}\right) = \frac{1}{2\pi i}\int_\gamma \frac{g(\xi)\,d\xi}{(\xi - z)^2}.$$

This is defined for all values of $z$ not on $\gamma$.⁹ Thus, $f(z)$ is analytic there.


We can generalize the formula above to the $n$th derivative, and obtain

$$\frac{d^n f}{dz^n} = \frac{n!}{2\pi i}\int_\gamma \frac{g(\xi)\,d\xi}{(\xi - z)^{n+1}}.$$


Applying this result to an analytic function expressed by Eq. (10.11), we obtain the following important theorem.


Theorem 10.5.2 The derivatives of all orders of an analytic function $f(z)$ exist in the domain of analyticity of the function and are themselves analytic in that domain. The $n$th derivative of $f(z)$ is given by

$$f^{(n)}(z) = \frac{d^n f}{dz^n} = \frac{n!}{2\pi i}\oint_C \frac{f(\xi)\,d\xi}{(\xi - z)^{n+1}}. \tag{10.12}$$


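Example 10.5.3 Let us apply Eq. (10.12) directly to some simple functions. In all cases, we will assume that the contour is a circle of radius $r$ centered at $z$.

(a) Let $f(z) = K$, a constant. Then, for $n = 1$ we have

$$\frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{K\,d\xi}{(\xi - z)^2}.$$

Since $\xi$ is on the circle $C$ centered at $z$, $\xi - z = re^{i\theta}$ and $d\xi = rie^{i\theta}\,d\theta$. So we have

$$\frac{df}{dz} = \frac{1}{2\pi i}\int_0^{2\pi} \frac{Kire^{i\theta}\,d\theta}{(re^{i\theta})^2} = \frac{K}{2\pi r}\int_0^{2\pi} e^{-i\theta}\,d\theta = 0.$$

(b) Given $f(z) = z$, its first derivative will be

$$\frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{\xi\,d\xi}{(\xi - z)^2} = \frac{1}{2\pi i}\int_0^{2\pi} \frac{(z + re^{i\theta})\,ire^{i\theta}\,d\theta}{(re^{i\theta})^2} = \frac{1}{2\pi}\left(\frac{z}{r}\int_0^{2\pi} e^{-i\theta}\,d\theta + \int_0^{2\pi} d\theta\right) = \frac{1}{2\pi}(0 + 2\pi) = 1.$$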


(c) Given $f(z) = z^2$, for the first derivative, Eq. (10.12) yields

$$\frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{\xi^2\,d\xi}{(\xi - z)^2} = \frac{1}{2\pi i}\int_0^{2\pi} \frac{(z + re^{i\theta})^2\,ire^{i\theta}\,d\theta}{(re^{i\theta})^2} = \frac{1}{2\pi}\int_0^{2\pi} \left[z^2 + (re^{i\theta})^2 + 2zre^{i\theta}\right](re^{i\theta})^{-1}\,d\theta$$

$$= \frac{1}{2\pi}\left(\frac{z^2}{r}\int_0^{2\pi} e^{-i\theta}\,d\theta + r\int_0^{2\pi} e^{i\theta}\,d\theta + 2z\int_0^{2\pi} d\theta\right) = 2z.$$

It can be shown that, in general, $(d/dz)z^m = mz^{m-1}$. The proof is left as Problem 10.30.


⁹ The interchange of differentiation and integration requires justification. Such an interchange can be done if the integral has some restrictive properties. We shall not concern ourselves with such details. In fact, one can achieve the same result by using the definition of derivatives and the usual properties of integrals.
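
Equation (10.12), $d^n f/dz^n = \frac{n!}{2\pi i}\oint_C f(\xi)\,d\xi/(\xi - z)^{n+1}$, lends itself to a direct numerical check on a circle of radius $r$ centered at $z$, using $\xi - z = re^{i\theta}$ and $d\xi = ire^{i\theta}\,d\theta$. The sketch below is an added illustration; the function name `cauchy_derivative`, the number of sample angles, and the test point are assumptions.

```python
import cmath
import math


def cauchy_derivative(f, z, n, r=1.0, samples=256):
    """n-th derivative of f at z from Eq. (10.12) on a circle of radius r about z."""
    total = 0j
    dtheta = 2 * math.pi / samples
    for k in range(samples):
        e = cmath.exp(1j * k * dtheta)
        xi = z + r * e
        # d(xi) = i r e^{i theta} d(theta), and (xi - z)^(n+1) = (r e^{i theta})^(n+1)
        total += f(xi) * 1j * r * e / (r * e) ** (n + 1) * dtheta
    return math.factorial(n) / (2j * math.pi) * total


z = complex(0.3, 0.4)  # arbitrary test point (assumption)

d_const = cauchy_derivative(lambda xi: 7.0, z, 1)       # case (a): constant
d_linear = cauchy_derivative(lambda xi: xi, z, 1)       # case (b): f = z
d_square = cauchy_derivative(lambda xi: xi * xi, z, 1)  # case (c): f = z^2
d3_exp = cauchy_derivative(cmath.exp, z, 3)             # third derivative of e^z

print(d_const, d_linear, d_square, d3_exp)
```

The first three values reproduce 0, 1 and $2z$ from the worked cases, and the last one returns $e^z$ again, as expected for the exponential.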


The CIF is a central formula in complex analysis, and we shall see its 
significance in much of the later development of complex analysis. For now, 
let us demonstrate its usefulness in proving a couple of important properties 
of analytic functions. 


Proposition 10.5.4 The absolute value of an analytic function f (z) cannot 
have a local maximum within the region of analyticity of the function. 


Proof Let $S \subset \mathbb{C}$ be the region of analyticity of $f$ and $z_0$ a point in $S$. Let $\gamma_0$ be a circle of radius $\delta$ in $S$, centered at $z_0$. Using the CIF, and noting that $z - z_0 = \delta e^{i\theta}$, we have

$$|f(z_0)| = \left|\frac{1}{2\pi i}\oint_{\gamma_0} \frac{f(z)}{z - z_0}\,dz\right| = \frac{1}{2\pi}\left|\int_0^{2\pi} \frac{f(z)\,i\delta e^{i\theta}\,d\theta}{\delta e^{i\theta}}\right| \le \frac{1}{2\pi}\int_0^{2\pi} |f(z)|\,d\theta \le \frac{1}{2\pi}\int_0^{2\pi} |f(z_{\max})|\,d\theta = |f(z_{\max})|,$$

where $z_{\max}$ is where the maximum value of $|f(z)|$ occurs on $\gamma_0$. This inequality says that for any point $z_0$ that one picks, there is always another point which produces a larger absolute value for $f$. Therefore, there can be no local maximum within $S$.


Proposition 10.5.5 A bounded entire function is necessarily a constant. 


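Proof We show that the derivative of such a function is zero. Consider

$$\frac{df}{dz} = \frac{1}{2\pi i}\oint_C \frac{f(\xi)\,d\xi}{(\xi - z)^2}.$$

Since $f$ is an entire function, the closed contour $C$ can be chosen to be a very large circle of radius $R$ with center at $z$. Taking the absolute value of both sides yields

$$\left|\frac{df}{dz}\right| = \frac{1}{2\pi}\left|\int_0^{2\pi} \frac{f(z + Re^{i\theta})}{(Re^{i\theta})^2}\, iRe^{i\theta}\,d\theta\right| \le \frac{1}{2\pi}\int_0^{2\pi} \frac{|f(z + Re^{i\theta})|}{R}\,d\theta \le \frac{1}{2\pi}\int_0^{2\pi} \frac{M}{R}\,d\theta = \frac{M}{R},$$

where $M$ is an upper bound of $|f|$ in the complex plane. Now, as $R \to \infty$, the derivative goes to zero, and the function must be a constant.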


Proposition 10.5.5 is a very powerful statement about analytic functions. There are many interesting and nontrivial real functions that are bounded and have derivatives of all orders on the entire real line. For instance, $e^{-x^2}$ is such a function. No such freedom exists for complex analytic functions. Any nontrivial analytic function is either not bounded (goes to infinity somewhere on the complex plane) or not entire (it is not analytic at some point(s) of the complex plane).

A consequence of Proposition 10.5.5 is the following 


Theorem 10.5.6 (Fundamental theorem of algebra) Any polynomial

$$p(x) = a_0 + a_1 x + \cdots + a_n x^n, \qquad a_n \neq 0$$

can be factored completely as

$$p(x) = a_n (x - z_1)(x - z_2) \cdots (x - z_n),$$

where the $z_i$ are complex numbers.


Proof Let $f(z) = 1/p(z)$ and assume the contrary, i.e., that $p(z)$ is never zero for any (finite) $z \in \mathbb{C}$. Then $f(z)$ is bounded and analytic for all $z \in \mathbb{C}$, and Proposition 10.5.5 says that $f(z)$ is a constant. This is obviously wrong if $n > 0$. Thus, there must be at least one $z$, say $z = z_1$, for which $p(z)$ is zero. So, we can factor out $(z - z_1)$ from $p(z)$ and write $p(z) = (z - z_1)q(z)$ where $q(z)$ is of degree $n - 1$. Applying the above argument to $q(z)$, we have $p(z) = (z - z_1)(z - z_2)r(z)$ where $r(z)$ is of degree $n - 2$. Continuing in this way, we can factor $p(z)$ into linear factors. The last polynomial will be a constant (a polynomial of degree zero), which has to be equal to $a_n$ to make the coefficient of $z^n$ equal to that of the original polynomial.


The primitive (indefinite integral) of an analytic function can be defined using definite integrals just as in the real case. Let $f : \mathbb{C} \to \mathbb{C}$ be analytic in a region $S$ of the complex plane. Let $z_0$ and $z$ be two points in $S$, and define¹⁰ $F(z) \equiv \int_{z_0}^{z} f(\xi)\,d\xi$. We can show that $F(z)$ is the primitive of $f(z)$ by showing that

$$\lim_{\Delta z \to 0}\left|\frac{F(z + \Delta z) - F(z)}{\Delta z} - f(z)\right| = 0.$$

We leave the details as a problem for the reader.


Proposition 10.5.7 Let $f : \mathbb{C} \to \mathbb{C}$ be analytic in a region $S$ of $\mathbb{C}$. Then at every point $z \in S$, there exists an analytic function $F : \mathbb{C} \to \mathbb{C}$ such that

$$\frac{dF}{dz} = f(z).$$


¹⁰ Note that the integral is path-independent due to the analyticity of $f$. Thus, $F$ is well-defined.




In the sketch of the proof of Proposition 10.5.7, we used only the continu- 
ity of f and the fact that the integral was well-defined. These two conditions 
are sufficient to establish the analyticity of F and f, since the latter is the 
derivative of the former. The following theorem, due to Morera, states this 
fact and is the converse of the Cauchy-Goursat theorem. 


Theorem 10.5.8 (Morera’s theorem) Let a function $f : \mathbb{C} \to \mathbb{C}$ be continuous in a simply connected region $S$. If for each simple closed contour $C$ in $S$ we have $\oint_C f(\xi)\,d\xi = 0$, then $f$ is analytic throughout $S$.


10.6 Infinite Complex Series 


The expansion of functions in terms of polynomials or monomials is impor- 
tant in calculus and was emphasized in Chaps. 7 and 8. We now apply this 
concept to analytic functions. 


10.6.1 Properties of Series 


Complex series are very similar to real series with which the reader is as- 
sumed to have some familiarity. Therefore, we state (without proof) the most 
important properties of complex series before discussing the quintessential 
Taylor and Laurent series. 

A complex series is said to converge absolutely if the real series 


$$\sum_{k=0}^{\infty} |z_k| = \sum_{k=0}^{\infty} \sqrt{x_k^2 + y_k^2}$$


converges. Clearly, absolute convergence implies convergence. 


Proposition 10.6.1 If the power series $\sum_{k=0}^{\infty} a_k (z - z_0)^k$ converges for
$z_1 \neq z_0$, then it converges absolutely for every value of z such that $|z - z_0| < |z_1 - z_0|$.
Similarly, if the power series $\sum_{k=0}^{\infty} b_k/(z - z_0)^k$ converges for
$z_2 \neq z_0$, then it converges absolutely for every value of z such that $|z - z_0| > |z_2 - z_0|$.


A geometric interpretation of this proposition is that if a power series
with positive powers converges for a point at a distance $r_1$ from $z_0$, then it
converges for all interior points of the circle whose center is $z_0$ and whose
radius is $r_1$. Similarly, if a power series with negative powers converges
for a point at a distance $r_2$ from $z_0$, then it converges for all exterior points of
the circle whose center is $z_0$ and whose radius is $r_2$ (see Fig. 10.9). Generally
speaking, positive powers are used for points inside a circle and negative 
powers for points outside it. 

The largest circle about zo such that the first power series of Proposi- 
tion 10.6.1 converges is called the circle of convergence of the power series. 


Fig. 10.9 (a) Power series with positive exponents converge for the interior points of 
a circle. (b) Power series with negative exponents converge for the exterior points of a 
circle 


The proposition implies that the series cannot converge at any point outside 
the circle of convergence. 
In determining the convergence of a power series 


$$S(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n \tag{10.13}$$


we look at the behavior of the sequence of partial sums 


$$S_N(z) = \sum_{n=0}^{N} a_n (z - z_0)^n.$$


Convergence of (10.13) implies that for any $\varepsilon > 0$, there exists an integer
$N_\varepsilon$ such that

$$|S(z) - S_N(z)| < \varepsilon \quad \text{whenever } N > N_\varepsilon.$$

In general, the integer $N_\varepsilon$ may depend on z; that is, for different values
of z, we may be forced to pick different $N_\varepsilon$'s. When $N_\varepsilon$ is independent of z,
we say that the convergence is uniform.
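To make the dependence of $N_\varepsilon$ on z concrete, here is a minimal Python sketch. It assumes the geometric series $\sum z^n = 1/(1 - z)$ as a test case (our choice, not the text's), and the helper name `terms_needed` is hypothetical. It finds the smallest N whose partial sum is within $\varepsilon$ of the limit.

```python
def terms_needed(z, eps, max_terms=100000):
    """Smallest N with |S(z) - S_N(z)| < eps for S(z) = sum_{n>=0} z^n = 1/(1 - z)."""
    exact = 1 / (1 - z)
    partial = 0j
    power = 1 + 0j
    for N in range(max_terms):
        partial += power          # partial is now S_N(z) = sum_{n=0}^{N} z^n
        power *= z
        if abs(exact - partial) < eps:
            return N
    raise ValueError("no convergence within max_terms")

# The required N grows as |z| approaches the circle of convergence |z| = 1.
for r in (0.1, 0.5, 0.9, 0.99):
    print(r, terms_needed(r, 1e-6))
```

For this example a single N works for every z in a closed disk $|z| \le r$ with $r < 1$, but the N required grows without bound as r approaches 1.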


Theorem 10.6.2 The power series $S(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n$ is uniformly
convergent for all points within its circle of convergence and
represents a function that is analytic there.


By substituting the reciprocal of $(z - z_0)$ in the power series, we can show
that if $\sum_{k=0}^{\infty} b_k/(z - z_0)^k$ is convergent in the annulus $r_2 < |z - z_0| < r_1$,
then it is uniformly convergent for all z in that annulus.


Theorem 10.6.3 A convergent power series can be differentiated and integrated
term by term; that is, if $S(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n$, then

$$\frac{dS(z)}{dz} = \sum_{n=1}^{\infty} n a_n (z - z_0)^{n-1}, \qquad \int_\gamma S(z)\, dz = \sum_{n=0}^{\infty} a_n \int_\gamma (z - z_0)^n\, dz$$

for any path $\gamma$ lying in the circle of convergence of the power series.


10.6.2 Taylor and Laurent Series 


We now state and prove the two main theorems of this section. A Taylor 
series consists of terms with only positive powers. A Laurent series allows 
for negative powers as well. 


Theorem 10.6.4 (Taylor series) Let f be analytic throughout the interior
of a circle $C_0$ having radius $r_0$ and centered at $z_0$. Then at each point z
inside $C_0$,

$$f(z) = f(z_0) + f'(z_0)(z - z_0) + \cdots = \sum_{n=0}^{\infty} \frac{f^{(n)}(z_0)}{n!} (z - z_0)^n. \tag{10.14}$$


Proof From the CIF and the fact that z is inside $C_0$, we have

$$f(z) = \frac{1}{2\pi i} \oint_{C_0} \frac{f(\xi)}{\xi - z}\, d\xi.$$

On the other hand,

$$\frac{1}{\xi - z} = \frac{1}{\xi - z_0 + z_0 - z} = \frac{1}{(\xi - z_0)\left(1 - \frac{z - z_0}{\xi - z_0}\right)} = \frac{1}{\xi - z_0} \sum_{n=0}^{\infty} \left(\frac{z - z_0}{\xi - z_0}\right)^n.$$

The last equality follows from the fact that $|(z - z_0)/(\xi - z_0)| < 1$, because
z is inside the circle $C_0$ and $\xi$ is on it, and from the sum of a geometric
series. Substituting in the CIF and using Theorem 10.5.2, we obtain the
result.
For $z_0 = 0$ we obtain the Maclaurin series:

$$f(z) = f(0) + f'(0) z + \cdots = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} z^n.$$


The Taylor expansion requires analyticity of the function at all points 
interior to the circle Co. On many occasions there may be a point inside Co 
at which the function is not analytic. The Laurent series accommodates such 
cases. 


Theorem 10.6.5 (Laurent series) Let $C_1$ and $C_2$ be circles of radii $r_1$
and $r_2$, both centered at $z_0$ in the z-plane with $r_1 > r_2$. Let $f : \mathbb{C} \to \mathbb{C}$
be analytic on $C_1$ and $C_2$ and throughout S, the annular region between the
two circles. Then, at each point $z \in S$, $f(z)$ is given by

$$f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n \quad \text{where} \quad a_n = \frac{1}{2\pi i} \oint_C \frac{f(\xi)}{(\xi - z_0)^{n+1}}\, d\xi$$

and C is any contour within S that encircles $z_0$.

Fig. 10.10 The annular region within and on whose contour the expanded function is
analytic
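Proof Let $\gamma$ be a small closed contour in S enclosing z, as shown in
Fig. 10.10. For the composite contour $C' = C_1 \cup C_2 \cup \gamma$, the Cauchy-Goursat
theorem gives

$$0 = \oint_{C'} \frac{f(\xi)}{\xi - z}\, d\xi = \oint_{C_1} \frac{f(\xi)}{\xi - z}\, d\xi - \oint_{C_2} \frac{f(\xi)}{\xi - z}\, d\xi - \oint_{\gamma} \frac{f(\xi)}{\xi - z}\, d\xi,$$

where the $\gamma$ and $C_2$ integrations are negative because their interior lies to
our right as we traverse them. The $\gamma$ integral is simply $2\pi i f(z)$ by the CIF.
Thus, we obtain

$$2\pi i f(z) = \oint_{C_1} \frac{f(\xi)}{\xi - z}\, d\xi - \oint_{C_2} \frac{f(\xi)}{\xi - z}\, d\xi. \tag{10.15}$$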
Now we use the same trick we used in deriving the Taylor expansion. Since
z is located in the annular region, $r_2 < |z - z_0| < r_1$. We have to keep this
in mind when expanding the fractions. In particular, for $\xi \in C_1$ we want
the $\xi$ term to be in the denominator, and for $\xi \in C_2$ we want it to be in the
numerator. Substituting such expansions in Eq. (10.15) yields

$$2\pi i f(z) = \sum_{n=0}^{\infty} (z - z_0)^n \oint_{C_1} \frac{f(\xi)}{(\xi - z_0)^{n+1}}\, d\xi + \sum_{n=0}^{\infty} \frac{1}{(z - z_0)^{n+1}} \oint_{C_2} f(\xi)(\xi - z_0)^n\, d\xi. \tag{10.16}$$


Fig. 10.11 The arbitrary contour in the annular region used in the Laurent expansion. 
The break in C and the gap in the shaded region are magnified for visual clarity 


Now we consider an arbitrary contour C in S that encircles $z_0$. Figure 10.11
shows a region bounded by a contour composed of $C_1$ and C.
In this region $f(\xi)/(\xi - z_0)^{n+1}$ is analytic (because $\xi$ can never equal $z_0$).
Thus, the integral over the composite contour must vanish by the Cauchy-Goursat
theorem. It follows that the integral over $C_1$ is equal to that over C.
A similar argument shows that the $C_2$ integral can also be replaced by an
integral over C. We let $n + 1 = -m$ in the second sum of Eq. (10.16) to
transform it into

$$\sum_{m=-1}^{-\infty} (z - z_0)^{m} \oint_C f(\xi)(\xi - z_0)^{-m-1}\, d\xi = \sum_{m=-\infty}^{-1} (z - z_0)^m \oint_C \frac{f(\xi)\, d\xi}{(\xi - z_0)^{m+1}}.$$

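Changing the dummy index back to n and substituting the result in
Eq. (10.16) yields

$$2\pi i f(z) = \sum_{n=0}^{\infty} (z - z_0)^n \oint_C \frac{f(\xi)}{(\xi - z_0)^{n+1}}\, d\xi + \sum_{n=-\infty}^{-1} (z - z_0)^n \oint_C \frac{f(\xi)}{(\xi - z_0)^{n+1}}\, d\xi.$$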

We can now combine the sums and divide both sides by 277i to get the 
desired expansion. 


The Laurent expansion is convergent as long as $r_2 < |z - z_0| < r_1$. In
particular, if $r_2 = 0$, and if the function is analytic throughout the interior of
the larger circle, then $a_n$ will be zero for $n = -1, -2, \ldots$ because
$f(\xi)/(\xi - z_0)^{n+1}$ will be analytic for negative n, and the integral will be zero by the
Cauchy-Goursat theorem. Thus, only nonnegative powers of $(z - z_0)$ will be
present in the series, and we recover the Taylor series, as we should.

It is clear that we can expand $C_1$ and shrink $C_2$ until we encounter a point
at which f is no longer analytic. This is obvious from the construction of the
proof, in which only the analyticity in the annular region is important, not
its size. Thus, we can include all the possible analytic points by expanding
$C_1$ and shrinking $C_2$.


Example 10.6.6 Let us expand some functions in terms of series. For an 
entire function there is no point in the entire complex plane at which it is 
not analytic. Thus, only positive powers of (z — zo) will be present, and we 
will have a Taylor expansion that is valid for all values of z. 


(a) Let us expand $e^z$ around $z_0 = 0$. The nth derivative of $e^z$ is $e^z$. Thus,
$f^{(n)}(0) = 1$, and the Taylor (Maclaurin) expansion gives

$$e^z = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} z^n = \sum_{n=0}^{\infty} \frac{z^n}{n!}.$$
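(b) The Maclaurin series for sin z is obtained by noting that

$$\left. \frac{d^n}{dz^n} \sin z \right|_{z=0} = \begin{cases} 0 & \text{if } n \text{ is even}, \\ (-1)^{(n-1)/2} & \text{if } n \text{ is odd} \end{cases}$$

and substituting this in the Maclaurin expansion:

$$\sin z = \sum_{n\ \mathrm{odd}} (-1)^{(n-1)/2} \frac{z^n}{n!} = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k+1}}{(2k+1)!}.$$

Similarly, we can obtain

$$\cos z = \sum_{k=0}^{\infty} (-1)^k \frac{z^{2k}}{(2k)!}, \qquad \sinh z = \sum_{k=0}^{\infty} \frac{z^{2k+1}}{(2k+1)!}, \qquad \cosh z = \sum_{k=0}^{\infty} \frac{z^{2k}}{(2k)!}.$$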


(c) The function $1/(1 + z)$ is not entire, so the region of its convergence
is limited. Let us find the Maclaurin expansion of this function. The
function is analytic within all circles of radii $r < 1$. At $r = 1$ we encounter
a singularity, the point $z = -1$. Thus, the series converges for
all points¹¹ z for which $|z| < 1$. For such points we have

$$f^{(n)}(0) = \left. \frac{d^n}{dz^n} (1 + z)^{-1} \right|_{z=0} = (-1)^n n!.$$


¹¹ As remarked before, the series diverges for all points outside the circle $|z| = 1$. This
does not mean that the function cannot be represented by a series for points outside the
circle. On the contrary, we shall see shortly that Laurent series, with negative powers of
$z - z_0$, are designed precisely for such a purpose.
Thus,

$$\frac{1}{1 + z} = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} z^n = \sum_{n=0}^{\infty} (-1)^n z^n.$$
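The expansions of this example can be checked numerically. The following is a minimal Python sketch; the function names `sin_series` and `inv_one_plus_z_series`, the truncation orders, and the sample points are illustrative assumptions. It compares partial sums of the Maclaurin series of $\sin z$ and $1/(1+z)$ against direct evaluation.

```python
import cmath
from math import factorial

def sin_series(z, terms=20):
    """Partial sum of sum_k (-1)^k z^(2k+1) / (2k+1)!  (valid for every z)."""
    return sum((-1) ** k * z ** (2 * k + 1) / factorial(2 * k + 1) for k in range(terms))

def inv_one_plus_z_series(z, terms=200):
    """Partial sum of sum_n (-1)^n z^n  (valid only for |z| < 1)."""
    return sum((-z) ** n for n in range(terms))

z = 1.5 + 0.5j            # any point works for the entire function sin z
w = 0.3 - 0.4j            # |w| = 0.5 lies inside the circle of convergence
print(abs(sin_series(z) - cmath.sin(z)))
print(abs(inv_one_plus_z_series(w) - 1 / (1 + w)))
```

Both printed differences should be at the level of floating-point rounding. Moving w outside $|w| < 1$ makes the second partial sum grow without bound.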


Taylor and Laurent series allow us to express an analytic function as a 
power series. For a Taylor series of f(z), the expansion is routine because 
the coefficient of its nth term is simply $f^{(n)}(z_0)/n!$, where $z_0$ is the center of
the circle of convergence. When a Laurent series is applicable, however, the 
nth coefficient is not, in general, easy to evaluate. Usually it can be found 
by inspection and certain manipulations of other known series. But if we 
use such an intuitive approach to determine the coefficients, can we be sure 
that we have obtained the correct Laurent series? The following theorem 
answers this question. 


Theorem 10.6.7 If the series $\sum_{n=-\infty}^{\infty} a_n (z - z_0)^n$ converges to $f(z)$
at all points in some annular region about $z_0$, then it is the unique
Laurent series expansion of $f(z)$ in that region.


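Proof Multiply both sides of $f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n$ by

$$\frac{1}{2\pi i (z - z_0)^{k+1}},$$

integrate the result along a contour C in the annular region, and use the
easily verifiable fact that

$$\frac{1}{2\pi i} \oint_C \frac{dz}{(z - z_0)^{k-n+1}} = \delta_{kn}$$

to obtain

$$\frac{1}{2\pi i} \oint_C \frac{f(z)}{(z - z_0)^{k+1}}\, dz = a_k.$$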


Thus, the coefficient in the power series of f is precisely the coefficient in 
the Laurent series, and the two must be identical. 
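The "easily verifiable fact" used in the proof can also be checked numerically. The sketch below assumes a circular contour and a trapezoid-rule discretization; the helper names `circle_integral` and `delta_integral` and the sample center are illustrative choices, not part of the text.

```python
import cmath
import math

def circle_integral(g, center, radius, samples=2048):
    """Trapezoid-rule approximation of the counterclockwise integral of g over |z - center| = radius."""
    total = 0j
    step = 2 * math.pi / samples
    for j in range(samples):
        w = cmath.exp(1j * step * j)               # point on the unit circle
        z = center + radius * w                    # point on the contour
        total += g(z) * (1j * radius * w * step)   # g(z) dz, with dz = i r e^{i theta} d theta
    return total

def delta_integral(k, n, center=0.3 + 0.2j, radius=1.0):
    """(1 / 2 pi i) times the closed integral of dz / (z - z0)^(k - n + 1)."""
    power = k - n + 1
    value = circle_integral(lambda z: (z - center) ** (-power), center, radius)
    return value / (2j * math.pi)

print(delta_integral(2, 2))    # k = n
print(delta_integral(3, 1))    # k != n
```

The first value should be close to 1 and the second close to 0, in agreement with $\delta_{kn}$.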


We will look at some examples that illustrate the abstract ideas developed 
in the preceding collection of theorems and propositions. However, we can 
consider a much broader range of examples if we know the arithmetic of 
power series. The following theorem about arithmetical manipulations with 
power series is not difficult to prove (see [Chur 74]). 


Theorem 10.6.8 Let the two power series

$$f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n \quad \text{and} \quad g(z) = \sum_{n=-\infty}^{\infty} b_n (z - z_0)^n$$

be convergent within some annular region $r_2 < |z - z_0| < r_1$. Then

$$f(z) + g(z) = \sum_{n=-\infty}^{\infty} (a_n + b_n)(z - z_0)^n$$

and

$$f(z) g(z) = \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} a_n b_m (z - z_0)^{m+n} \equiv \sum_{k=-\infty}^{\infty} c_k (z - z_0)^k$$

for z interior to the annular region. Furthermore, if $g(z) \neq 0$ for some neighborhood
of $z_0$, then the series obtained by long division of the first series by
the second converges to $f(z)/g(z)$ in that neighborhood.


This theorem, in essence, says that converging power series can be ma- 
nipulated as though they were finite sums (polynomials). Such manipula- 
tions are extremely useful when dealing with Taylor and Laurent expansions 
in which the straightforward calculation of coefficients may be tedious. The 
following examples illustrate the power of infinite-series arithmetic. 
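For two Taylor series (no negative powers), the double sum in Theorem 10.6.8 reduces to the Cauchy product $c_k = \sum_{n=0}^{k} a_n b_{k-n}$. The following Python sketch assumes truncated coefficient lists and a hypothetical helper `cauchy_product`; it multiplies the geometric series for $1/(1-z)$ and $1/(1-z/2)$ using exact rational arithmetic.

```python
from fractions import Fraction

def cauchy_product(a, b):
    """Coefficients c_k = sum_{n=0}^{k} a_n b_{k-n} of the product of two truncated Taylor series."""
    size = min(len(a), len(b))
    return [sum(a[n] * b[k - n] for n in range(k + 1)) for k in range(size)]

N = 8
geometric = [Fraction(1)] * N                              # 1/(1 - z)   = sum z^n
half_geometric = [Fraction(1, 2 ** n) for n in range(N)]   # 1/(1 - z/2) = sum (z/2)^n
product = cauchy_product(geometric, half_geometric)
print(product)
```

Each coefficient is the finite geometric sum $1 + 1/2 + \cdots + 2^{-k} = 2 - 2^{-k}$, so the truncated product is exact up to the order kept.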


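Example 10.6.9 To expand the function $f(z) = \frac{2 + 3z}{z^2 + z^3}$ in a Laurent series
about $z = 0$, rewrite it as

$$f(z) = \frac{1}{z^2} \left( \frac{2 + 3z}{1 + z} \right) = \frac{1}{z^2} \left( 3 - \frac{1}{1 + z} \right) = \frac{1}{z^2} \left( 3 - \sum_{n=0}^{\infty} (-1)^n z^n \right)$$

$$= \frac{1}{z^2} \left( 3 - 1 + z - z^2 + z^3 - \cdots \right) = \frac{2}{z^2} + \frac{1}{z} - 1 + z - z^2 + \cdots.$$

This series converges for $0 < |z| < 1$. We note that negative powers of z are
also present.¹² Using the notation of Theorem 10.6.5, we have $a_n = 0$ for
$n \le -3$, $a_{-2} = 2$, $a_{-1} = 1$, and $a_n = (-1)^{n+1}$ for $n \ge 0$.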
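As a cross-check, the coefficients can be computed directly from the integral formula of Theorem 10.6.5. The sketch below assumes a circle of radius 0.5 about the origin as the contour C and a trapezoid-rule discretization; the helper name `laurent_coefficient` is hypothetical.

```python
import cmath
import math

def laurent_coefficient(f, n, center=0j, radius=0.5, samples=4096):
    """a_n = (1 / 2 pi i) * closed integral of f(xi) / (xi - z0)^(n + 1) d xi over a circle."""
    total = 0j
    step = 2 * math.pi / samples
    for j in range(samples):
        w = cmath.exp(1j * step * j)
        xi = center + radius * w
        d_xi = 1j * radius * w * step
        total += f(xi) / (xi - center) ** (n + 1) * d_xi
    return total / (2j * math.pi)

f = lambda z: (2 + 3 * z) / (z ** 2 + z ** 3)
for n in range(-3, 3):
    print(n, round(laurent_coefficient(f, n).real, 6))
```

The printed values can be compared with $a_{-3} = 0$, $a_{-2} = 2$, $a_{-1} = 1$, and $a_n = (-1)^{n+1}$ for $n \ge 0$.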


Example 10.6.10 The function $f(z) = 1/(4z - z^2)$ is the ratio of two entire
functions. Therefore, by Theorem 10.6.8, it is analytic everywhere except at
the zeros of its denominator, $z = 0$ and $z = 4$. For the annular region (here
$r_2$ of Theorem 10.6.5 is zero) $0 < |z| < 4$, we expand $f(z)$ in the Laurent
series around $z = 0$. Instead of actually calculating $a_n$, we first note that

$$f(z) = \frac{1}{4z} \left( \frac{1}{1 - z/4} \right).$$

The second factor can be expanded in a geometric series because $|z/4| < 1$:

$$\frac{1}{1 - z/4} = \sum_{n=0}^{\infty} \left( \frac{z}{4} \right)^n = \sum_{n=0}^{\infty} 4^{-n} z^n.$$


¹² This is a reflection of the fact that the function is not analytic inside the entire circle
$|z| = 1$; it blows up at $z = 0$.
Dividing this by $4z$, and noting that $z = 0$ is the only zero of $4z$ and is
excluded from the annular region, we obtain the expansion

$$f(z) = \frac{1}{4z} \sum_{n=0}^{\infty} 4^{-n} z^n = \sum_{n=0}^{\infty} 4^{-n-1} z^{n-1}.$$


Although we derived this series using manipulations of other series, the 
uniqueness of series representations assures us that this is the Laurent se- 
ries for the indicated region. 

How can we represent f(z) in the region for which |z| > 4? This region 
is exterior to the circle |z| = 4, so we expect negative powers of z. To find 
the Laurent expansion we write 


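$$f(z) = -\frac{1}{z^2} \left( \frac{1}{1 - 4/z} \right)$$

and note that $|4/z| < 1$ for points exterior to the larger circle. The second
factor can be written as a geometric series:

$$\frac{1}{1 - 4/z} = \sum_{n=0}^{\infty} \left( \frac{4}{z} \right)^n = \sum_{n=0}^{\infty} 4^n z^{-n}.$$

Dividing by $-z^2$, which is nonzero in the region exterior to the larger circle,
yields

$$f(z) = -\sum_{n=0}^{\infty} 4^n z^{-n-2}.$$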


Example 10.6.11 The function $f(z) = z/[(z - 1)(z - 2)]$ has a Taylor expansion
around the origin for $|z| < 1$. To find this expansion, we write¹³

$$f(z) = -\frac{1}{z - 1} + \frac{2}{z - 2} = \frac{1}{1 - z} - \frac{1}{1 - z/2}.$$

Expanding both fractions in geometric series (both $|z|$ and $|z/2|$ are less
than 1), we obtain $f(z) = \sum_{n=0}^{\infty} z^n - \sum_{n=0}^{\infty} (z/2)^n$. Adding the two series,
using Theorem 10.6.8, yields

$$f(z) = \sum_{n=0}^{\infty} \left( 1 - 2^{-n} \right) z^n \quad \text{for } |z| < 1.$$


This is the unique Taylor expansion of f(z) within the circle |z| = 1. 
For $1 < |z| < 2$ we have a Laurent series. To obtain this series, write

$$f(z) = \frac{1/z}{1/z - 1} - \frac{1}{1 - z/2} = -\frac{1}{z} \left( \frac{1}{1 - 1/z} \right) - \frac{1}{1 - z/2}.$$

¹³ We could, of course, evaluate the derivatives of all orders of the function at $z = 0$ and
use Maclaurin's formula. However, the present method gives the same result much more
quickly.
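Since both fractions on the RHS converge in the annular region ($|1/z| < 1$,
$|z/2| < 1$), we get

$$f(z) = -\frac{1}{z} \sum_{n=0}^{\infty} \left( \frac{1}{z} \right)^n - \sum_{n=0}^{\infty} \left( \frac{z}{2} \right)^n = -\sum_{n=0}^{\infty} z^{-n-1} - \sum_{n=0}^{\infty} 2^{-n} z^n$$

$$= -\sum_{n=-\infty}^{-1} z^n - \sum_{n=0}^{\infty} 2^{-n} z^n = \sum_{n=-\infty}^{\infty} a_n z^n,$$

where $a_n = -1$ for $n < 0$ and $a_n = -2^{-n}$ for $n \ge 0$. This is the unique
Laurent expansion of $f(z)$ in the given region.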
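Finally, for $|z| > 2$ we have only negative powers of z. We obtain the
expansion in this region by rewriting $f(z)$ as follows:

$$f(z) = -\frac{1/z}{1 - 1/z} + \frac{2/z}{1 - 2/z}.$$

Expanding the fractions yields

$$f(z) = -\sum_{n=0}^{\infty} z^{-n-1} + \sum_{n=0}^{\infty} 2^{n+1} z^{-n-1} = \sum_{n=0}^{\infty} \left( 2^{n+1} - 1 \right) z^{-n-1}.$$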


This is again the unique expansion of f(z) in the region |z| > 2. 
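The three expansions of this example can be compared with the closed form at a sample point in each region. The Python sketch below is illustrative only; the function names, the truncation at 200 terms, and the sample points are assumptions.

```python
def f(z):
    return z / ((z - 1) * (z - 2))

def inner(z, terms=200):
    """Taylor series, valid for |z| < 1."""
    return sum((1 - 2.0 ** (-n)) * z ** n for n in range(terms))

def middle(z, terms=200):
    """Laurent series, valid for 1 < |z| < 2."""
    negative = sum(z ** (-n - 1) for n in range(terms))
    positive = sum((z / 2) ** n for n in range(terms))
    return -negative - positive

def outer(z, terms=200):
    """Laurent series with only negative powers, valid for |z| > 2."""
    return sum((2.0 ** (n + 1) - 1) * z ** (-n - 1) for n in range(terms))

for series, point in ((inner, 0.4 + 0.3j), (middle, 1.2 + 0.9j), (outer, -3 + 1j)):
    print(series.__name__, abs(series(point) - f(point)))
```

Each printed difference should be at rounding level. Evaluating a series outside its own region (for instance `inner` at a point with $|z| > 1$) does not reproduce $f(z)$.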

Example 10.6.12 Define $f(z)$ as

$$f(z) = \begin{cases} (1 - \cos z)/z^2 & \text{for } z \neq 0, \\ \tfrac{1}{2} & \text{for } z = 0. \end{cases}$$

We can show that $f(z)$ is an entire function.

Since $1 - \cos z$ and $z^2$ are entire functions, their ratio is analytic everywhere
except at the zeros of its denominator. The only such zero is $z = 0$.
Thus, Theorem 10.6.8 implies that $f(z)$ is analytic everywhere except possibly
at $z = 0$. To see the behavior of $f(z)$ at $z = 0$, we look at its Maclaurin
series:

$$1 - \cos z = 1 - \sum_{n=0}^{\infty} (-1)^n \frac{z^{2n}}{(2n)!},$$

which implies that

$$\frac{1 - \cos z}{z^2} = \sum_{n=1}^{\infty} (-1)^{n+1} \frac{z^{2n-2}}{(2n)!} = \frac{1}{2} - \frac{z^2}{4!} + \frac{z^4}{6!} - \cdots.$$

The expansion on the RHS shows that the value of the series at $z = 0$ is
$\tfrac{1}{2}$ which, by definition, is $f(0)$. Thus, the series converges for all z, and
Theorem 10.6.2 says that $f(z)$ is entire.


A Laurent series can give information about the integral of a function 
around a closed contour in whose interior the function may not be analytic. 
In fact, the coefficient of the first negative power in a Laurent series is given
by

$$a_{-1} = \frac{1}{2\pi i} \oint_C f(\xi)\, d\xi. \tag{10.17}$$


Thus, to find the integral of a (nonanalytic) function around a closed contour 
surrounding zo, we write the Laurent series for the function and read off the 
coefficient of the 1/(z — zo) term. 


Example 10.6.13 As an illustration of this idea, let us evaluate the integral
$I = \oint_C dz/[z^2(z - 2)]$, where C is the circle of radius 1 centered at the
origin. The function is analytic in the annular region $0 < |z| < 2$. We can
therefore expand it as a Laurent series about $z = 0$ in that region:

$$\frac{1}{z^2(z - 2)} = -\frac{1}{2z^2} \left( \frac{1}{1 - z/2} \right) = -\frac{1}{2z^2} \sum_{n=0}^{\infty} \left( \frac{z}{2} \right)^n = -\frac{1}{2} \left( \frac{1}{z^2} \right) - \frac{1}{4} \left( \frac{1}{z} \right) - \frac{1}{8} - \cdots.$$

Thus, $a_{-1} = -\tfrac{1}{4}$, and $\oint_C dz/[z^2(z - 2)] = 2\pi i a_{-1} = -i\pi/2$. A direct evaluation
of the integral is nontrivial. In fact, we will see later that to find certain
integrals, it is advantageous to cast them in the form of a contour integral
and use either Eq. (10.17) or a related equation.
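The value $-i\pi/2$ can be confirmed by brute-force quadrature on the unit circle. This is a minimal sketch that assumes the trapezoid rule in the angle variable; the helper name `unit_circle_integral` is hypothetical.

```python
import cmath
import math

def unit_circle_integral(g, samples=4096):
    """Trapezoid-rule approximation of the counterclockwise integral of g over |z| = 1."""
    total = 0j
    step = 2 * math.pi / samples
    for j in range(samples):
        z = cmath.exp(1j * step * j)
        total += g(z) * (1j * z * step)   # dz = i e^{i theta} d theta
    return total

I = unit_circle_integral(lambda z: 1 / (z ** 2 * (z - 2)))
print(I, -1j * math.pi / 2)
```

The two printed numbers should agree to many digits, matching $2\pi i a_{-1}$ with $a_{-1} = -1/4$.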


Let $f : \mathbb{C} \to \mathbb{C}$ be analytic at $z_0$. Then by definition, there exists a neighborhood
of $z_0$ in which f is analytic. In particular, we can find a circle
$|z - z_0| = r > 0$ in whose interior f has a Taylor expansion.


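Definition 10.6.14 Let

$$f(z) = \sum_{n=0}^{\infty} \frac{f^{(n)}(z_0)}{n!} (z - z_0)^n \equiv \sum_{n=0}^{\infty} a_n (z - z_0)^n.$$

Then f is said to have a zero of order k at $z_0$ if $f^{(n)}(z_0) = 0$ for $n = 0, 1, \ldots, k - 1$
but $f^{(k)}(z_0) \neq 0$.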


In that case $f(z) = (z - z_0)^k \sum_{n=0}^{\infty} a_{k+n} (z - z_0)^n$, where $a_k \neq 0$ and
$|z - z_0| < r$. We define $g(z)$ as

$$g(z) = \sum_{n=0}^{\infty} a_{k+n} (z - z_0)^n \quad \text{where } |z - z_0| < r$$

and note that $g(z_0) = a_k \neq 0$. Convergence of the series on the RHS implies
that $g(z)$ is continuous at $z_0$. Consequently, for each $\epsilon > 0$, there exists $\delta$
such that $|g(z) - a_k| < \epsilon$ whenever $|z - z_0| < \delta$. If we choose $\epsilon = |a_k|/2$,
then, for some $\delta_0 > 0$, $|g(z) - a_k| < |a_k|/2$ whenever $|z - z_0| < \delta_0$. Thus,
as long as z is inside the circle $|z - z_0| < \delta_0$, $g(z)$ cannot vanish (because if
it did the first inequality would imply that $|a_k| < |a_k|/2$). We therefore have
the following result.


Theorem 10.6.15 Let $f : \mathbb{C} \to \mathbb{C}$ be analytic at $z_0$ and $f(z_0) = 0$. Then
there exists a neighborhood of zo throughout which f has no other zeros 
unless f is identically zero there. Thus, the zeros of an analytic function are 
isolated. 


When k = 1, we say that zo is a simple zero of f. To find the order of 
the zero of a function at a point, we differentiate the function, evaluate the 
derivative at that point, and continue the process until we obtain a nonzero 
value for the derivative. 


Example 10.6.16 Here are some functions with their zeros: 
(a) The zeros of cos z, which are $z = (2k + 1)\pi/2$, are all simple, because

$$\left. \frac{d}{dz} \cos z \right|_{z = (2k+1)\pi/2} = -\sin\left[ (2k + 1) \frac{\pi}{2} \right] \neq 0.$$

(b) To find the order of the zero of $f(z) = e^z - 1 - z - z^2/2$ at $z = 0$, we
differentiate $f(z)$ and evaluate $f'(0)$:

$$f'(0) = (e^z - 1 - z)_{z=0} = 0.$$

Differentiating again gives $f''(0) = (e^z - 1)_{z=0} = 0$. Differentiating
once more yields $f'''(0) = (e^z)_{z=0} = 1$. Thus, the zero is of order 3.
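The order of a zero can also be estimated numerically. The sketch below does not differentiate repeatedly as the text does; instead it uses the equivalent statement that the order is the index of the first nonvanishing Taylor coefficient $a_n = f^{(n)}(z_0)/n!$, computed from the contour-integral formula. The helper names, the unit-radius contour, and the tolerance are assumptions.

```python
import cmath
import math

def taylor_coefficient(f, n, z0, radius=1.0, samples=1024):
    """a_n = (1 / 2 pi i) * closed integral of f(xi) / (xi - z0)^(n + 1) d xi over a circle."""
    total = 0j
    step = 2 * math.pi / samples
    for j in range(samples):
        w = cmath.exp(1j * step * j)
        xi = z0 + radius * w
        total += f(xi) / (xi - z0) ** (n + 1) * (1j * radius * w * step)
    return total / (2j * math.pi)

def zero_order(f, z0, max_order=10, tol=1e-9):
    """Index of the first Taylor coefficient at z0 whose magnitude exceeds tol, or None."""
    for k in range(max_order + 1):
        if abs(taylor_coefficient(f, k, z0)) > tol:
            return k
    return None

print(zero_order(lambda z: cmath.exp(z) - 1 - z - z ** 2 / 2, 0))
print(zero_order(cmath.cos, math.pi / 2))
```

A result of 0 means the function does not vanish at the point. The tolerance separates genuinely zero coefficients from rounding noise and may need adjusting for other functions.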


10.7 Problems 


10.1 Show that the function w = 1/z maps the straight line y = a in 
the z-plane onto a circle in the w-plane with radius 1/(2|a|) and center 


(0, 1/(2a)). 


10.2 Treating x and y as functions of z and z*, 


(a) use the chain rule to find $\partial f/\partial z^*$ and $\partial f/\partial z$ in terms of partial derivatives
with respect to x and y.

(b) Evaluate $\partial f/\partial z^*$ and $\partial f/\partial z$ assuming that the Cauchy-Riemann conditions
hold.


10.3 Show that when z is represented by polar coordinates, the derivative 
of a function f(z) can be written as 


$$\frac{df}{dz} = e^{-i\theta} \left( \frac{\partial U}{\partial r} + i \frac{\partial V}{\partial r} \right),$$

where U and V are the real and imaginary parts of $f(z)$ written in polar
coordinates. What are the C-R conditions in polar coordinates? Hint: Start
with the C-R conditions in Cartesian coordinates and apply the chain rule to
them using $r = \sqrt{x^2 + y^2}$ and $\theta = \tan^{-1}(y/x) = \cos^{-1}(x/\sqrt{x^2 + y^2})$.


10.4 Show that $\frac{d}{dz}(\ln z) = \frac{1}{z}$. Hint: Find $u(x, y)$ and $v(x, y)$ for $\ln z$ and
differentiate them.

10.5 Show that sin z and cos z have only real roots.

10.6 Show that
(a) the sum and the product of two entire functions are entire, and
(b) the ratio of two entire functions is analytic everywhere except at the
zeros of the denominator.

10.7 Given that $u = 2\lambda \ln[(x^2 + y^2)^{1/2}]$, show that $v = 2\lambda \tan^{-1}(y/x)$,
where u and v are the real and imaginary parts of an analytic function $w(z)$.

10.8 If $w(z)$ is any complex potential, show that its (complex) derivative
gives the components of the electric field.


10.9 Show that
(a) the flux through an element of area da of the lateral surface of a cylinder
(with arbitrary cross section) is $d\phi = dz(|\mathbf{E}|\, ds)$ where ds is an
arc length along the equipotential surface.

(b) Prove that $|\mathbf{E}| = |dw/dz| = \partial v/\partial s$ where v is the imaginary part of
the complex potential, and s is the parameter describing the length
along the equipotential curves.

(c) Combine (a) and (b) to get

$$\text{flux per unit } z\text{-length} = \frac{\phi}{z_2 - z_1} = v(P_2) - v(P_1)$$

for any two points $P_1$ and $P_2$ on the cross-sectional curve of the lateral
surface. Conclude that the total flux per unit z-length through a
cylinder (with arbitrary cross section) is $[v]$, the total change in v as
one goes around the curve.

(d) Using Gauss's law, show that the capacitance per unit length for the
capacitor consisting of the two conductors with potentials $u_1$ and $u_2$
is

$$\frac{\text{charge per unit length}}{\text{potential difference}} = \frac{[v]/4\pi}{|u_2 - u_1|}.$$


10.10 Using Eq. (10.7)

(a) find the equipotential curves (curves of constant u) and curves of constant
v for two line charges of equal magnitude and opposite signs
located at $y = a$ and $y = -a$ in the xy-plane.

(b) Show that

$$z = a\, \frac{\sin(v/2\lambda) + i \sinh(u/2\lambda)}{\cosh(u/2\lambda) - \cos(v/2\lambda)}$$

by solving Eq. (10.7) for z and simplifying.

(c) Show that the equipotential curves are circles in the xy-plane of
radii $a/\sinh(u/2\lambda)$ with centers at $(0, a\coth(u/2\lambda))$, and that the
curves of constant v are circles of radii $a/\sin(v/2\lambda)$ with centers at
$(a\cot(v/2\lambda), 0)$.


10.11 In this problem, you will find the capacitance per unit length of two
cylindrical conductors of radii $R_1$ and $R_2$, the distance between whose centers
is D, by looking for two line charge densities $+\lambda$ and $-\lambda$ such that the
two cylinders are two of the equipotential surfaces. From Problem 10.10,
we have

$$R_i = \frac{a}{\sinh(u_i/2\lambda)}, \qquad y_i = a \coth(u_i/2\lambda), \qquad i = 1, 2,$$

where $y_1$ and $y_2$ are the locations of the centers of the two conductors on
the y-axis (which we assume to connect the two centers).

(a) Show that $D = |y_1 - y_2| = \left| R_1 \cosh\frac{u_1}{2\lambda} - R_2 \cosh\frac{u_2}{2\lambda} \right|$.
(b) Square both sides and use $\cosh(a - b) = \cosh a \cosh b - \sinh a \sinh b$
and the expressions for the R's and the y's given above to obtain

$$\cosh\left( \frac{u_1 - u_2}{2\lambda} \right) = \left| \frac{R_1^2 + R_2^2 - D^2}{2 R_1 R_2} \right|.$$

(c) Now find the capacitance per unit length. Consider the special case of
two concentric cylinders.

(d) Find the capacitance per unit length of a cylinder and a plane, by letting
one of the radii, say $R_1$, go to infinity while $h = R_1 - D$ remains
fixed.


10.12 Use Equations (10.4) and (10.5) to establish the following identities. 


(a) Re(sin z) = sin x cosh y, (b) Im(sin z) = cos x sinh y,
(c) Re(cos z) = cos x cosh y, (d) Im(cos z) = −sin x sinh y,
(e) Re(sinh z) = sinh x cos y, (f) Im(sinh z) = cosh x sin y,
(g) Re(cosh z) = cosh x cos y, (h) Im(cosh z) = sinh x sin y,
(i) $|\sin z|^2 = \sin^2 x + \sinh^2 y$, (j) $|\cos z|^2 = \cos^2 x + \sinh^2 y$,
(k) $|\sinh z|^2 = \sinh^2 x + \sin^2 y$, (l) $|\cosh z|^2 = \sinh^2 x + \cos^2 y$.


10.13 Find all the zeros of sinh z and cosh z. 


10.14 Verify the following hyperbolic identities.
(a) $\cosh^2 z - \sinh^2 z = 1$.
(b) $\cosh(z_1 + z_2) = \cosh z_1 \cosh z_2 + \sinh z_1 \sinh z_2$.
(c) $\sinh(z_1 + z_2) = \sinh z_1 \cosh z_2 + \cosh z_1 \sinh z_2$.
(d) $\cosh 2z = \cosh^2 z + \sinh^2 z$, $\sinh 2z = 2 \sinh z \cosh z$.
(e) $\tanh(z_1 + z_2) = \dfrac{\tanh z_1 + \tanh z_2}{1 + \tanh z_1 \tanh z_2}$.
10.15 Show that

(a) $\tanh\left( \dfrac{z}{2} \right) = \dfrac{\sinh x + i \sin y}{\cosh x + \cos y}$,

(b) $\coth\left( \dfrac{z}{2} \right) = \dfrac{\sinh x - i \sin y}{\cosh x - \cos y}$.
10.16 Find all values of z such that

(a) $e^z = -3$, (b) $e^z = 1 + i\sqrt{3}$, (c) $e^{2z-1} = 1$.

10.17 Show that $|e^{-z}| < 1$ if and only if Re(z) > 0.
10.18 Show that each of the following functions (call each one $u(x, y)$)
is harmonic, and find the function's harmonic partner, $v(x, y)$, such that
$u(x, y) + iv(x, y)$ is analytic.

(a) $x^3 - 3xy^2$; (b) $e^x \cos y$;
(c) $\dfrac{x}{x^2 + y^2}$, where $x^2 + y^2 \neq 0$;
(d) $e^{-2y} \cos 2x$; (e) $e^{y^2 - x^2} \cos 2xy$;
(f) $e^x (x \cos y - y \sin y) + 2 \sinh y \sin x + x^3 - 3xy^2 + y$.
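10.19 Prove the following identities.
(a) $\cos^{-1} z = -i \ln(z \pm \sqrt{z^2 - 1})$,
(b) $\sin^{-1} z = -i \ln[iz \pm \sqrt{1 - z^2}]$,
(c) $\tan^{-1} z = \dfrac{1}{2i} \ln\left( \dfrac{i - z}{i + z} \right)$,
(d) $\cosh^{-1} z = \ln(z \pm \sqrt{z^2 - 1})$,
(e) $\sinh^{-1} z = \ln(z \pm \sqrt{z^2 + 1})$,
(f) $\tanh^{-1} z = \dfrac{1}{2} \ln\left( \dfrac{1 + z}{1 - z} \right)$.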
10.20 Find the curve defined by each of the following equations.

(a) $z = 1 - it$, $0 \le t \le 2$,
(b) $z = t + it^2$, $-\infty < t < \infty$,
(c) $z = a(\cos t + i \sin t)$, $\dfrac{\pi}{2} \le t \le \dfrac{3\pi}{2}$,
(d) $z = t + \dfrac{i}{t}$, $-\infty < t < 0$.


10.21 Prove part (a) of Proposition 10.3.1. Hint: A small displacement
along $\gamma_i$ can be written as $\hat{e}_x \Delta x_i + \hat{e}_y \Delta y_i$ for $i = 1, 2$. Find a unit vector
along each displacement and take the dot product of the two. Do the same
for $\gamma_i'$; use the C-R condition to show that the two are equal. Prove part (b)


by showing that if f(z) =z’ =x’ + iy’ is analytic and ve + re = 0, then 


10.22 Let f(t) =u(t) + iv(t) be a (piecewise) continuous complex-valued 
function of a real variable t defined in the interval a < t < b. Show that if 
$F(t) = U(t) + iV(t)$ is a function such that $dF/dt = f(t)$, then

$$\int_a^b f(t)\, dt = F(b) - F(a).$$
This is the fundamental theorem of calculus for complex variables. 


10.23 Find the value of the integral $\int_C [(z + 2)/z]\, dz$, where C is
(a) the semicircle $z = 2e^{i\theta}$, for $0 \le \theta \le \pi$,

(b) the semicircle $z = 2e^{i\theta}$, for $\pi \le \theta \le 2\pi$, and

(c) the circle $z = 2e^{i\theta}$, for $-\pi \le \theta \le \pi$.

10.24 Evaluate the integral $\int_\gamma dz/(z - 1 - i)$ where $\gamma$ is

(a) the line joining $z_1 = 2i$ and $z_2 = 3$, and

(b) the broken path from $z_1$ to the origin and from there to $z_2$.


10.25 Evaluate the integral $\oint_C z^m (z^*)^n\, dz$, where m and n are integers and
C is the circle $|z| = 1$ taken counterclockwise.


10.26 Let C be the boundary of the square with vertices at the points z = 0, 
z=1,z=1+i, and z =i with counterclockwise direction. Evaluate 


§ (5z+2)dz and § ede, 
C Cc 


10.27 Use the definition of an integral as the limit of a sum and the fact that 
absolute value of a sum is less than or equal to the sum of absolute values to 
prove the Darboux inequality. 


10.28 Let $C_1$ be a simple closed contour. Deform $C_1$ into a new contour
$C_2$ in such a way that $C_1$ does not encounter any singularity of an analytic
function f in the process. Show that

$$\oint_{C_1} f(z)\, dz = \oint_{C_2} f(z)\, dz.$$


That is, the contour can always be deformed into simpler shapes (such as a 
circle) and the integral evaluated. 


10.29 Use the result of the previous problem to show that

$$\oint_C \frac{dz}{z - 1 - i} = 2\pi i \quad \text{and} \quad \oint_C (z - 1 - i)^{m-1}\, dz = 0 \quad \text{for } m = \pm 1, \pm 2, \ldots$$

when C is the boundary of a square with vertices at $z = 0$, $z = 2$, $z = 2 + 2i$,
and $z = 2i$, taken counterclockwise.


10.30 Use Eq. (10.12) and the binomial expansion to show that $\frac{d}{dz}(z^m) = m z^{m-1}$.


10.31 Evaluate $\oint_C dz/(z^2 - 1)$ where C is the circle $|z| = 3$ integrated in the
positive sense. Hint: Deform C into a contour C′ that avoids the singularities
of the integrand. Then use the Cauchy-Goursat theorem.


10.32 Show that when f is analytic within and on a simple closed contour
C and $z_0$ is not on C, then

$$\oint_C \frac{f'(z)\, dz}{z - z_0} = \oint_C \frac{f(z)\, dz}{(z - z_0)^2}.$$


10.33 Let C be the boundary of a square whose sides lie along the lines
$x = \pm 3$ and $y = \pm 3$. For the positive sense of integration, evaluate each of
the following integrals.


© 5% o fits 
© $ perm @ poe 
©) § as. ( § res 
® $ ape Oe Sera 
o (ae vie 
© $ apt 0 $ emp 


tan zdz zdz 
(m) i for —3 <a <3, (n) § op: 


10.34 Let C be the circle |z — i| = 3 integrated in the positive sense. Find 
the value of each of the following integrals. 


(a) § e&é d (b) § sinh z d 
ce+n? e c (22 +2?) 
dz dz 
(©) bas @) f (z2 +. 9)2’ 


cosh z 24 —3z+4 
> IZ, —>———. dz 
2 eee © cw — 4243 
10.35 Show that Legendre polynomials (for $|x| < 1$) can be represented as

$$P_n(x) = \frac{(-1)^n}{2^n (2\pi i)} \oint_C \frac{(1 - z^2)^n}{(z - x)^{n+1}}\, dz,$$

where C is the unit circle around the origin.


10.36 Let f be analytic within and on the circle $\gamma_0$ given by $|z - z_0| = r_0$
and integrated in the positive sense. Show that Cauchy's inequality holds:

$$|f^{(n)}(z_0)| \le \frac{n!\, M}{r_0^n},$$

where M is the maximum value of $|f(z)|$ on $\gamma_0$.


10.37 Expand sinh z in a Taylor series about the point $z = i\pi$.


10.38 What is the largest circle within which the Maclaurin series for tanh z 
converges to tanhz? 


10.39 Find the (unique) Laurent expansion of each of the following func- 
tions about the origin for its entire region of analyticity. 


1 1 
a)! Sa b) zcos(z”); c) ———; 
Wi eGe ase he RONe Se aga7 
sinh z — z 1 1 
d Sra Se dee 8g 
() Or aaa Osa 
2 
7-4 1 Zz 
; h) ——-S:; i 
C=. W) grage OF a 
10.40 Show that the following functions are entire. 
eX-1 _ 2 
—<£ forz 40, 
@) f@=)." 
i for z=0. 
sinz for z £0, 
b zZ=y % 
Pr TED f for z= 0. 
SSE for z Am /2, 
©) f@=)7 7h ee 
—l/x for z= 77/2. 
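10.41 Let f be analytic at $z_0$ and $f(z_0) = f'(z_0) = \cdots = f^{(k)}(z_0) = 0$.
Show that the following function is analytic at $z_0$:

$$g(z) = \begin{cases} \dfrac{f(z)}{(z - z_0)^{k+1}} & \text{for } z \neq z_0, \\[2ex] \dfrac{f^{(k+1)}(z_0)}{(k + 1)!} & \text{for } z = z_0. \end{cases}$$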




10.42 Obtain the first few nonzero terms of the Laurent series expansion of 
each of the following functions about the origin. Also find the integral of the 
function along a small simple closed contour encircling the origin. 


(a) $\dfrac{1}{\sin z}$; (b) $\dfrac{1}{1 - \cos z}$; (c) $\dfrac{z}{1 - \cosh z}$;

(d) $\dfrac{z^2}{z - \sin z}$; (e) $\dfrac{z^4}{6z + z^3 - 6 \sinh z}$; (f) $\dfrac{1}{z^2 \sin z}$;

1 
One of the most powerful tools made available by complex analysis is the
theory of residues, which makes possible the routine evaluation of certain 
definite integrals that are impossible to calculate otherwise. The derivation, 
application, and analysis of this tool constitute the main focus of this chap- 
ter. In the preceding chapter we saw examples in which integrals were re- 
lated to expansion coefficients of Laurent series. Here we will develop a 
systematic way of evaluating both real and complex integrals. 


11.1 Residues


Recall that a singular point zo of f :C — C is a point at which f fails to 
be analytic. If in addition, there is some neighborhood of zo in which f is 
analytic at every point (except of course at zo itself), then zo is called an 
isolated singularity of f. Almost all the singularities we have encountered 
so far have been isolated singularities. However, we will see later—when 
discussing multivalued functions—that singularities that are not isolated do 
exist. 

Let zo be an isolated singularity of f. Then there exists an r > 0 such 
that within the “annular” region 0 < |z — zo| <r, the function f has the 
Laurent expansion 


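$$f(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n \equiv \sum_{n=0}^{\infty} a_n (z - z_0)^n + \frac{b_1}{z - z_0} + \frac{b_2}{(z - z_0)^2} + \cdots,$$

where

$$a_n = \frac{1}{2\pi i} \oint_C \frac{f(\xi)\, d\xi}{(\xi - z_0)^{n+1}} \quad \text{and} \quad b_n = \frac{1}{2\pi i} \oint_C f(\xi)(\xi - z_0)^{n-1}\, d\xi.$$

In particular,

$$b_1 = \frac{1}{2\pi i} \oint_C f(\xi)\, d\xi, \tag{11.1}$$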

where C is any simple closed contour around $z_0$, traversed in the positive
sense, on and interior to which f is analytic except at the point $z_0$ itself.
The complex number $b_1$, which is essentially the integral of $f(z)$ along the
contour, is called the residue of f at the isolated singular point $z_0$. It is
important to note that the residue is independent of the contour C as long as
$z_0$ is the only isolated singular point within C.


Historical Notes 

Pierre Alphonse Laurent (1813-1854) graduated from the Ecole Polytechnique near 
the top of his class and became a second lieutenant in the engineering corps. On his 
return from the war in Algeria, he took part in the effort to improve the port at Le Havre, 
spending six years there directing various parts of the project. Laurent’s superior officers 
admired the breadth of his practical experience and the good judgment it afforded the 
young engineer. During this period he wrote his first scientific paper, on the calculus of 
variations, and submitted it to the French Academy of Sciences for the grand prix in 
mathematics. Unfortunately the competition had already closed (although the judges had 
not yet declared a winner), and Laurent’s submission was not successful. However, the 
paper so impressed Cauchy that he recommended its publication, also without success. 
The paper for which Laurent is most well known suffered a similar fate. In it he described 
a more general form of a theorem earlier proven by Cauchy for the power series expan- 
sion of a function. Laurent realized that one could generalize this result to hold in any 
annular region between two singular or discontinuous points by using both positive and 
negative powers in the series, thus allowing treatment of regions beyond the first singular 
or discontinuous point. Again, Cauchy argued for the paper’s publication without success. 
The passage of time provided a more just reward, however, and the use of Laurent series 
became a fundamental tool in complex analysis. 

Laurent later worked in the theory of light waves and contended with Cauchy over the in- 
terpretation of the differential equations the latter had formulated to explain the behavior 
of light. Little came of his work in this area, however, and Laurent died at the age of forty- 
two, a captain serving on the committee on fortifications in Paris. His widow pressed to 
have two more of his papers read to the Academy, only one of which was published. 


We use the notation $\mathrm{Res}[f(z_0)]$ to denote the residue of $f$ at the isolated
singular point $z_0$. Equation (11.1) can then be written as

\[
\oint_C f(z)\, dz = 2\pi i\, \mathrm{Res}[f(z_0)].
\]


What if there are several isolated singular points within the simple closed 
contour C? The following theorem provides the answer. 


Theorem 11.1.1 (The residue theorem) Let $C$ be a positively oriented
simple closed contour within and on which a function $f$ is analytic
except at a finite number of isolated singular points $z_1, z_2, \ldots, z_m$
interior to $C$. Then

\[
\oint_C f(z)\, dz = 2\pi i \sum_{k=1}^{m} \mathrm{Res}[f(z_k)]. \tag{11.2}
\]


Proof Let $C_k$ be the positively traversed circle around $z_k$. Then Fig. 11.1
and the Cauchy-Goursat theorem yield

\[
0 = \oint_{C'} f(z)\, dz = -\oint_{\text{circles}} f(z)\, dz + \int_{\text{lines}} f(z)\, dz + \oint_C f(z)\, dz,
\]




Fig. 11.1 Singularities are avoided by going around them 


where $C'$ is the union of all the contours, and the minus sign on the first
integral is due to the fact that the interiors of all circles lie to our right as we
traverse their boundaries. The two equal and opposite contributions of each
line cancel out, and we obtain

\[
\oint_C f(z)\, dz = \sum_{k=1}^{m} \oint_{C_k} f(z)\, dz = \sum_{k=1}^{m} 2\pi i\, \mathrm{Res}[f(z_k)],
\]

where in the last step the definition of the residue at $z_k$ has been used.


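Example 11.1.2 Let us evaluate the integral $\oint_C (2z-3)\, dz/[z(z-1)]$ where
$C$ is the circle $|z| = 2$. There are two isolated singularities in $C$, $z_1 = 0$ and
$z_2 = 1$. To find $\mathrm{Res}[f(z_1)]$, we expand around the origin:

\[
\frac{2z-3}{z(z-1)} = \frac{3}{z} - \frac{1}{z-1} = \frac{3}{z} + \frac{1}{1-z}
= \frac{3}{z} + 1 + z + \cdots \quad\text{for } |z| < 1.
\]

This gives $\mathrm{Res}[f(z_1)] = 3$. Similarly, expanding around $z = 1$ gives

\[
\frac{2z-3}{z(z-1)} = \frac{3}{(z-1)+1} - \frac{1}{z-1}
= -\frac{1}{z-1} + 3\sum_{n=0}^{\infty} (-1)^n (z-1)^n,
\]

which yields $\mathrm{Res}[f(z_2)] = -1$. Thus,

\[
\oint_C \frac{2z-3}{z(z-1)}\, dz = 2\pi i\{\mathrm{Res}[f(z_1)] + \mathrm{Res}[f(z_2)]\} = 2\pi i(3 - 1) = 4\pi i.
\]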
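
As a numerical sanity check of Example 11.1.2, the following sketch approximates the contour integral by the trapezoidal rule on a circle. The helper name `circle_integral` and the number of sample points are illustrative assumptions, not part of the text.

```python
import cmath
import math


def circle_integral(f, center, radius, n=2000):
    """Approximate the integral of f over a positively oriented circle.

    Uses the trapezoidal rule in the polar angle with n sample points.
    """
    total = 0j
    dtheta = 2 * math.pi / n
    for k in range(n):
        w = cmath.exp(1j * k * dtheta)   # point on the unit circle
        z = center + radius * w          # point on the contour
        dz = 1j * radius * w * dtheta    # dz = i r e^{i theta} d(theta)
        total += f(z) * dz
    return total


def f(z):
    return (2 * z - 3) / (z * (z - 1))


# Residues from small circles around each singularity, then the contour |z| = 2.
res_at_0 = circle_integral(f, 0, 0.5) / (2j * math.pi)
res_at_1 = circle_integral(f, 1, 0.5) / (2j * math.pi)
full = circle_integral(f, 0, 2)

print(res_at_0, res_at_1)    # should be close to 3 and -1
print(full, 4j * math.pi)    # the two values should agree closely
```

The small circles each enclose a single singularity, so they reproduce the individual residues, while the circle of radius 2 encloses both and reproduces their sum.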
11. Calculus of Residues
11.2 Classification of Isolated Singularities 


Let $f:\mathbb{C}\to\mathbb{C}$ have an isolated singularity at $z_0$. Then there exist a real
number $r > 0$ and an annular region $0 < |z - z_0| < r$ such that $f$ can be
represented by the Laurent series

\[
f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n + \sum_{n=1}^{\infty} \frac{b_n}{(z - z_0)^n}. \tag{11.3}
\]

The second sum in Eq. (11.3), involving negative powers of $(z - z_0)$, is
called the principal part of $f$ at $z_0$. We can use the principal part to distinguish
three types of isolated singularities. The behavior of the function near
the isolated singularity is fundamentally different in each case.


1. If $b_n = 0$ for all $n \ge 1$, $z_0$ is called a removable singular point of $f$.
   In this case, the Laurent series contains only nonnegative powers of
   $(z - z_0)$, and setting $f(z_0) = a_0$ makes the function analytic at $z_0$. For
   example, the function $f(z) = (e^z - 1 - z)/z^2$, which is indeterminate
   at $z = 0$, becomes entire if we set $f(0) = \frac{1}{2}$, because its Laurent series
   $f(z) = \frac{1}{2} + \frac{z}{6} + \frac{z^2}{24} + \cdots$ has no negative power.

2. If $b_n = 0$ for all $n > m$ and $b_m \ne 0$, $z_0$ is called a pole of order $m$. In
   this case, the expansion takes the form

   \[
   f(z) = \sum_{n=0}^{\infty} a_n (z - z_0)^n + \frac{b_1}{z - z_0} + \cdots + \frac{b_m}{(z - z_0)^m}
   \]

   for $0 < |z - z_0| < r$. In particular, if $m = 1$, $z_0$ is called a simple pole.

3. If the principal part of $f$ at $z_0$ has an infinite number of nonzero terms,
   the point $z_0$ is called an essential singularity. A prototype of functions
   that have essential singularities is

   \[
   \exp\left(\frac{1}{z}\right) = \sum_{n=0}^{\infty} \frac{1}{n!}\left(\frac{1}{z}\right)^n,
   \]

   which has an essential singularity at $z = 0$ and a residue of 1 there. To
   see how strange such functions are, we let $a$ be any nonzero real number
   (for $a < 0$, $\ln a$ denotes a complex logarithm such as $\ln|a| + i\pi$), and
   consider $z = 1/(\ln a + 2n\pi i)$ for $n = 0, \pm 1, \pm 2, \ldots$. For such a $z$ we
   have $e^{1/z} = e^{\ln a + 2n\pi i} = a e^{2n\pi i} = a$. In particular, as $n \to \infty$, $z$ gets
   arbitrarily close to the origin. Thus, in an arbitrarily small neighborhood
   of the origin, there are infinitely many points at which the function
   $\exp(1/z)$ takes on the arbitrary value $a$. In other words, as $z \to 0$, the
   function gets arbitrarily close to any real number! This result holds for
   all functions with essential singularities.
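
The behavior of $\exp(1/z)$ near the origin can be illustrated numerically. The sketch below assumes the target value $a = 2.5$ (an arbitrary choice made here) and evaluates the function at the points $z_n = 1/(\ln a + 2n\pi i)$ described in item 3.

```python
import cmath
import math

a = 2.5  # target value (assumed for illustration); positive, so math.log applies

samples = []
for n in (1, 10, 100, 1000):
    z = 1 / (math.log(a) + 2j * math.pi * n)   # z_n from the text
    samples.append((n, abs(z), cmath.exp(1 / z)))

for n, distance, value in samples:
    # distance shrinks toward 0 while the function value stays at a
    print(n, distance, value)
```

The points approach the origin like $1/(2\pi n)$, yet the function value at each of them equals $a$ up to rounding error.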


Example 11.2.1 (Order of poles)

(a) The function $(z^2 - 3z + 5)/(z - 1)$ has a Laurent series around $z = 1$
    containing only three terms:

    \[
    \frac{z^2 - 3z + 5}{z - 1} = -1 + (z - 1) + \frac{3}{z - 1}.
    \]

    Thus, it has a simple pole at $z = 1$, with a residue of 3.

(b) The function $\sin z/z^6$ has a Laurent series

    \[
    \frac{\sin z}{z^6} = \frac{1}{z^6}\sum_{n=0}^{\infty} (-1)^n \frac{z^{2n+1}}{(2n+1)!}
    = \frac{1}{z^5} - \frac{1}{6z^3} + \frac{1}{(5!)z} - \frac{z}{7!} + \cdots
    \]

    about $z = 0$. The principal part has three terms. The pole, at $z = 0$, is
    of order 5, and the function has a residue of 1/120 at $z = 0$.

(c) The function $(z^2 - 5z + 6)/(z - 2)$ has a removable singularity at
    $z = 2$, because

    \[
    \frac{z^2 - 5z + 6}{z - 2} = \frac{(z - 2)(z - 3)}{z - 2} = z - 3 = -1 + (z - 2)
    \]

    and $b_n = 0$ for all $n$.


Many functions can be written as the ratio of two polynomials. A function
of this form is called a rational function. If the degree of the numerator
is larger than that of the denominator, the ratio can be written as a polynomial plus
a rational function the degree of whose numerator is not larger than that
of the denominator. When we talk about rational functions, we exclude the
polynomials. So, we assume that the degree of the numerator is less than or
equal to the degree of the denominator. Such rational functions $f$ have the
property that as $z$ goes to infinity, $f$ does not go to infinity. Stated equivalently,
$f(1/z)$ does not go to infinity at the origin.

Let $f$ be a function whose only singularities in the entire complex plane
are finite poles, i.e., the point at infinity is not a pole of the function. This
means that $f(1/z)$ does not have a pole at the origin. Let $\{z_j\}_{j=1}^{n}$ be the
poles of $f$ such that $z_j$ is of order $m_j$. Expand the function about $z_1$ in a
Laurent series

\[
f(z) = \frac{b_1}{z - z_1} + \cdots + \frac{b_{m_1}}{(z - z_1)^{m_1}} + \sum_{k=0}^{\infty} a_k (z - z_1)^k
\equiv \frac{P_1(z)}{(z - z_1)^{m_1}} + g_1(z),
\]

where

\[
P_1(z) \equiv b_1 (z - z_1)^{m_1 - 1} + b_2 (z - z_1)^{m_1 - 2} + \cdots + b_{m_1 - 1}(z - z_1) + b_{m_1}
\]

is a polynomial of degree $m_1 - 1$ in $z$ and $g_1$ is analytic at $z_1$. It should
be clear that the remaining poles of $f$ are in $g_1$. So, expand $g_1$ about $z_2$ in
a Laurent series. A similar argument as above yields $g_1(z) = P_2(z)/(z - z_2)^{m_2} + g_2(z)$,
where $P_2(z)$ is a polynomial of degree $m_2 - 1$ in $z$ and $g_2$ is
analytic at $z_1$ and $z_2$. Continuing in this manner, we get

\[
f(z) = \frac{P_1(z)}{(z - z_1)^{m_1}} + \frac{P_2(z)}{(z - z_2)^{m_2}} + \cdots + \frac{P_n(z)}{(z - z_n)^{m_n}} + g(z),
\]

where $g$ has no poles. Since all poles of $f$ have been isolated in the sum,
$g$ must be analytic everywhere in $\mathbb{C}$, i.e., an entire function. Now substitute
$1/t$ for $z$, take the limit $t \to 0$, and note that, since the degree of $P_i$ is
$m_i - 1$, all the terms in the preceding equation go to zero except possibly
$g(1/t)$. Moreover,

\[
\lim_{t \to 0} g(1/t) \ne \infty,
\]

because, by assumption, the point at infinity is not a pole of $f$. Thus, $g$
is a bounded entire function. By Proposition 10.5.5, $g$ must be a constant.
Taking a common denominator for all the terms yields a ratio of two polynomials.
We have proved the following:


Proposition 11.2.2 A function whose only singularities are poles in 
a finite region of the complex plane is a rational function. 


The isolated singularity that is most important in applications is a pole.
For a function that has a pole of order $m$ at $z_0$, the calculation of residues
is routine. Such a calculation, in turn, enables us to evaluate many integrals
effortlessly. How do we calculate the residue of a function $f$ having a pole
of order $m$ at $z_0$?

It is clear that if $f$ has a pole of order $m$, then $g:\mathbb{C}\to\mathbb{C}$ defined by
$g(z) = (z - z_0)^m f(z)$ is analytic at $z_0$. Thus, for any simple closed contour
$C$ that contains $z_0$ but no other singular point of $f$, we have

\[
\mathrm{Res}[f(z_0)] = \frac{1}{2\pi i}\oint_C f(z)\, dz
= \frac{1}{2\pi i}\oint_C \frac{g(z)\, dz}{(z - z_0)^m}
= \frac{g^{(m-1)}(z_0)}{(m-1)!}.
\]

In terms of $f$ this yields¹

Theorem 11.2.3 If $f(z)$ has a pole of order $m$ at $z_0$, then

\[
\mathrm{Res}[f(z_0)] = \frac{1}{(m-1)!}\lim_{z \to z_0} \frac{d^{m-1}}{dz^{m-1}}\left[(z - z_0)^m f(z)\right]. \tag{11.4}
\]

For the special, but important, case of a simple pole, we obtain

\[
\mathrm{Res}[f(z_0)] = \lim_{z \to z_0}\left[(z - z_0) f(z)\right]. \tag{11.5}
\]
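
Theorem 11.2.3 identifies the residue with the Taylor coefficient of order $m-1$ of $g(z) = (z - z_0)^m f(z)$. A minimal numerical sketch of this idea follows; it estimates that coefficient by averaging over a small circle (a discrete Cauchy formula, used here instead of finite differences for stability). The function name `residue_at_pole`, the radius, and the sample count are assumptions made for illustration, and the radius must be smaller than the distance to any other singularity.

```python
import cmath
import math


def residue_at_pole(f, z0, m, radius=0.5, n=256):
    """Estimate Res[f(z0)] for a pole of order m at z0.

    By Theorem 11.2.3 the residue is g^(m-1)(z0)/(m-1)! with
    g(z) = (z - z0)^m f(z), i.e. the Taylor coefficient of order m-1 of g.
    That coefficient is estimated by averaging g(z) / (z - z0)^(m-1)
    over n equally spaced points of a circle around z0.
    """
    total = 0j
    for k in range(n):
        w = radius * cmath.exp(2j * math.pi * k / n)   # w = z - z0 on the circle
        g = w**m * f(z0 + w)                           # g(z) = (z - z0)^m f(z)
        total += g / w**(m - 1)
    return total / n


# Example 11.2.1(a): simple pole at z = 1, expected residue 3.
res_a = residue_at_pole(lambda z: (z * z - 3 * z + 5) / (z - 1), 1, 1)

# Example 11.2.1(b): pole of order 5 at z = 0, expected residue 1/120.
res_b = residue_at_pole(lambda z: cmath.sin(z) / z**6, 0, 5)

print(res_a, res_b)
```

For $m = 1$ the average reduces to the mean value of $(z - z_0) f(z)$ on the circle, which is the numerical counterpart of Eq. (11.5).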


11.3 Evaluation of Definite Integrals 


The most widespread application of residues occurs in the evaluation of real 
definite integrals. It is possible to “complexify” certain real definite integrals 
and relate them to contour integrations in the complex plane. We will discuss 
this method shortly; however, we first need a lemma. 


¹ The limit is taken because in many cases the mere substitution of $z_0$ may result in an
indeterminate form.
Lemma 11.3.1 (Jordan’s lemma) Let $C_R$ be a semicircle of radius $R$ in the
upper half of the complex plane (UHP) and centered at the origin. Let $f$ be
a function that tends uniformly to zero faster than $1/|z|$ for $\arg(z) \in [0, \pi]$
as $|z| \to \infty$. Let $\alpha$ be a nonnegative real number. Then

\[
\lim_{R \to \infty} I_R \equiv \lim_{R \to \infty} \int_{C_R} e^{i\alpha z} f(z)\, dz = 0.
\]

Proof For $z \in C_R$ we write $z = Re^{i\theta}$, $dz = iRe^{i\theta}\, d\theta$, and

\[
i\alpha z = i\alpha(R\cos\theta + iR\sin\theta) = i\alpha R\cos\theta - \alpha R\sin\theta
\]

and substitute in the absolute value of the integral to show that

\[
|I_R| \le \int_0^{\pi} e^{-\alpha R\sin\theta}\, R\,|f(Re^{i\theta})|\, d\theta.
\]

By assumption, $R|f(Re^{i\theta})| < \epsilon(R)$ independent of $\theta$, where $\epsilon(R)$ is a
positive number that tends to zero as $R \to \infty$. By breaking up the
interval of integration into two equal pieces and changing $\theta$ to $\pi - \theta$ in the
second integral, one can show that

\[
|I_R| < 2\epsilon(R)\int_0^{\pi/2} e^{-\alpha R\sin\theta}\, d\theta.
\]

Furthermore, by plotting $\sin\theta$ and $2\theta/\pi$ on the same graph, one can easily
see that $\sin\theta \ge 2\theta/\pi$ for $0 \le \theta \le \pi/2$. Thus,

\[
|I_R| < 2\epsilon(R)\int_0^{\pi/2} e^{-(2\alpha R/\pi)\theta}\, d\theta = \frac{\pi\epsilon(R)}{\alpha R}\left(1 - e^{-\alpha R}\right),
\]

which goes to zero as $R$ gets larger and larger.

Note that Jordan’s lemma applies for $\alpha = 0$ as well, because
$(1 - e^{-\alpha R}) \to \alpha R$ as $\alpha \to 0$. If $\alpha < 0$, the lemma is still valid if the semicircle
$C_R$ is taken in the lower half of the complex plane (LHP) and $f(z)$ goes
to zero uniformly for $\pi \le \arg(z) \le 2\pi$.
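
The two inequalities used in the proof can be checked numerically. The sketch below is an illustration under assumed sample values of the product $\alpha R$; it compares a midpoint-rule estimate of $\int_0^{\pi/2} e^{-\alpha R\sin\theta}\,d\theta$ with the closed-form bound $\pi(1 - e^{-\alpha R})/(2\alpha R)$.

```python
import math


def sine_integral(alpha_R, n=20000):
    """Midpoint-rule value of the integral of exp(-alpha_R*sin(theta)) over [0, pi/2]."""
    h = (math.pi / 2) / n
    return sum(math.exp(-alpha_R * math.sin((k + 0.5) * h)) for k in range(n)) * h


def jordan_bound(alpha_R):
    """Closed-form bound pi*(1 - exp(-alpha_R)) / (2*alpha_R) from the proof."""
    return math.pi * (1 - math.exp(-alpha_R)) / (2 * alpha_R)


# sin(theta) >= 2*theta/pi on a grid of [0, pi/2] (small tolerance for rounding).
grid = [k * (math.pi / 2) / 1000 for k in range(1001)]
sine_bound_ok = all(math.sin(t) >= 2 * t / math.pi - 1e-12 for t in grid)

checks = [(x, sine_integral(x), jordan_bound(x)) for x in (0.1, 1.0, 10.0, 100.0)]
for x, left, right in checks:
    print(x, left, right)   # left should not exceed right
```

Both columns shrink as $\alpha R$ grows, which is the mechanism that makes the semicircle contribution vanish.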

We are now in a position to apply the residue theorem to the evaluation of 
definite integrals. The three types of integrals most commonly encountered 
are discussed separately below. In all cases we assume that Jordan’s lemma 
holds. 


11.3.1 Integrals of Rational Functions 


The first type of integral we can evaluate using the residue theorem is of the
form

\[
I_1 = \int_{-\infty}^{\infty} \frac{p(x)}{q(x)}\, dx,
\]

where $p(x)$ and $q(x)$ are real polynomials, and $q(x) \ne 0$ for any real $x$. We
can then write

\[
I_1 = \lim_{R \to \infty}\int_{-R}^{R} \frac{p(x)}{q(x)}\, dx = \lim_{R \to \infty}\int_{C_x} \frac{p(z)}{q(z)}\, dz,
\]

where $C_x$ is the (open) contour lying on the real axis from $-R$ to $+R$. Since
Jordan’s lemma holds by assumption, we can close that contour by adding
to it the semicircle of radius $R$ [see Fig. 11.2(a)]. This will not affect the
value of the integral, because in the limit $R \to \infty$, the contribution of the
integral of the semicircle tends to zero. We close the contour in the UHP if
$q(z)$ has at least one zero there. We then get

\[
I_1 = \lim_{R \to \infty}\oint_C \frac{p(z)}{q(z)}\, dz = 2\pi i\sum_{j=1}^{k} \mathrm{Res}\left[\frac{p(z_j)}{q(z_j)}\right],
\]

where $C$ is the closed contour composed of the interval $(-R, R)$ and the
semicircle $C_R$, and $\{z_j\}_{j=1}^{k}$ are the zeros of $q(z)$ in the UHP. We may instead
close the contour in the LHP,² in which case

\[
I_1 = -2\pi i\sum_{j=1}^{k} \mathrm{Res}\left[\frac{p(z_j)}{q(z_j)}\right],
\]

where $\{z_j\}_{j=1}^{k}$ are the zeros of $q(z)$ in the LHP. The minus sign indicates
that in the LHP we (are forced to) integrate clockwise.


Example 11.3.2 Let us evaluate the integral $I = \int_0^{\infty} x^2\, dx/[(x^2+1)(x^2+9)]$.
Since the integrand is even, we can extend the interval of integration to
all real numbers (and divide the result by 2). It is shown below that Jordan’s
lemma holds. Therefore, we write the contour integral corresponding to $I$:

\[
I = \frac{1}{2}\oint_C \frac{z^2\, dz}{(z^2+1)(z^2+9)},
\]

where $C$ is as shown in Fig. 11.2(a). Note that the contour is traversed in the
positive sense. This is always true for the UHP. The singularities of the function
in the UHP are the simple poles $i$ and $3i$ corresponding to the simple
zeros of the denominator. The residues at these poles are

\[
\mathrm{Res}[f(i)] = \lim_{z \to i}(z - i)\frac{z^2}{(z-i)(z+i)(z^2+9)} = -\frac{1}{16i},
\]

\[
\mathrm{Res}[f(3i)] = \lim_{z \to 3i}(z - 3i)\frac{z^2}{(z^2+1)(z-3i)(z+3i)} = \frac{3}{16i}.
\]


² Provided that Jordan’s lemma holds there.

Fig. 11.2 (a) The large semicircle is chosen in the UHP. (b) Note how the direction of
contour integration is forced to be clockwise when the semicircle is chosen in the LHP


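Thus, we obtain

\[
I = \int_0^{\infty} \frac{x^2\, dx}{(x^2+1)(x^2+9)} = \frac{1}{2}\oint_C \frac{z^2\, dz}{(z^2+1)(z^2+9)}
= \pi i\left(-\frac{1}{16i} + \frac{3}{16i}\right) = \frac{\pi}{8}.
\]

It is instructive to obtain the same results using the LHP. In this case,
the contour is as shown in Fig. 11.2(b) and is taken clockwise, so we have
to introduce a minus sign. The singular points are at $z = -i$ and $z = -3i$.
These are simple poles at which the residues of the function are

\[
\mathrm{Res}[f(-i)] = \lim_{z \to -i}(z + i)\frac{z^2}{(z-i)(z+i)(z^2+9)} = \frac{1}{16i},
\]

\[
\mathrm{Res}[f(-3i)] = \lim_{z \to -3i}(z + 3i)\frac{z^2}{(z^2+1)(z-3i)(z+3i)} = -\frac{3}{16i}.
\]

Therefore,

\[
I = \int_0^{\infty} \frac{x^2\, dx}{(x^2+1)(x^2+9)} = \frac{1}{2}\oint_C \frac{z^2\, dz}{(z^2+1)(z^2+9)}
= -\pi i\left(\frac{1}{16i} - \frac{3}{16i}\right) = \frac{\pi}{8}.
\]

To show that Jordan’s lemma applies to this integral, we have only to
establish that $\lim_{R \to \infty} R|f(Re^{i\theta})| = 0$. In the case at hand, $\alpha = 0$ because
there is no exponential function in the integrand. Thus,

\[
R|f(Re^{i\theta})| = R\left|\frac{R^2 e^{2i\theta}}{(R^2 e^{2i\theta} + 1)(R^2 e^{2i\theta} + 9)}\right|
= \frac{R^3}{|R^2 e^{2i\theta} + 1|\,|R^2 e^{2i\theta} + 9|},
\]

which clearly goes to zero as $R \to \infty$.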
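
The value $\pi/8$ can be cross-checked by direct numerical integration. The sketch below maps the half-line to a finite interval with the substitution $x = \tan t$ (a choice made here for convenience, not a method from the text) and also evaluates the residue sum of the UHP calculation.

```python
import math


def integrand(x):
    return x**2 / ((x**2 + 1) * (x**2 + 9))


def integrate_half_line(f, n=20000):
    """Midpoint rule for the integral of f over [0, inf) via x = tan(t), 0 < t < pi/2."""
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        x = math.tan((k + 0.5) * h)
        total += f(x) * (1 + x * x) * h   # dx = (1 + tan^2 t) dt
    return total


numeric = integrate_half_line(integrand)

# pi*i times the sum of the residues at i and 3i, as in the UHP calculation.
from_residues = (1j * math.pi * (-1 / 16j + 3 / 16j)).real

print(numeric, from_residues, math.pi / 8)
```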


Example 11.3.3 Let us now consider a slightly more complicated integral:

\[
\int_{-\infty}^{\infty} \frac{x^2\, dx}{(x^2+1)(x^2+4)^2},
\]

which turns into $\oint_C z^2\, dz/[(z^2+1)(z^2+4)^2]$ as a contour integral. The poles
in the UHP are at $z = i$ and $z = 2i$. The former is a simple pole, and the latter
is a pole of order 2. Thus, using Eqs. (11.5) and (11.4), we obtain

\[
\mathrm{Res}[f(i)] = \lim_{z \to i}(z - i)\frac{z^2}{(z-i)(z+i)(z^2+4)^2} = -\frac{1}{18i},
\]

\[
\mathrm{Res}[f(2i)] = \frac{1}{(2-1)!}\lim_{z \to 2i}\frac{d}{dz}\left[(z - 2i)^2\frac{z^2}{(z^2+1)(z+2i)^2(z-2i)^2}\right]
= \lim_{z \to 2i}\frac{d}{dz}\left[\frac{z^2}{(z^2+1)(z+2i)^2}\right] = \frac{5}{72i},
\]

and

\[
\int_{-\infty}^{\infty} \frac{x^2\, dx}{(x^2+1)(x^2+4)^2} = 2\pi i\left(-\frac{1}{18i} + \frac{5}{72i}\right) = \frac{\pi}{36}.
\]

Closing the contour in the LHP would yield the same result.


11.3.2 Products of Rational and Trigonometric Functions 


The second type of integral we can evaluate using the residue theorem is of
the form

\[
\int_{-\infty}^{\infty} \frac{p(x)}{q(x)}\cos ax\, dx \quad\text{or}\quad \int_{-\infty}^{\infty} \frac{p(x)}{q(x)}\sin ax\, dx,
\]

where $a$ is a real number, $p(x)$ and $q(x)$ are real polynomials in $x$, and $q(x)$
has no real zeros. These integrals are the real and imaginary parts of

\[
I_2 = \int_{-\infty}^{\infty} \frac{p(x)}{q(x)} e^{iax}\, dx.
\]

The presence of $e^{iax}$ dictates the choice of the half-plane: If $a \ge 0$, we
choose the UHP; otherwise, we choose the LHP. We must, of course, have
enough powers of $x$ in the denominator to render $R|p(Re^{i\theta})/q(Re^{i\theta})|$
uniformly convergent to zero.


Example 11.3.4 Let us evaluate $\int_{-\infty}^{\infty} [\cos ax/(x^2+1)^2]\, dx$ where $a \ne 0$.
This integral is the real part of the integral $I_2 = \int_{-\infty}^{\infty} e^{iax}\, dx/(x^2+1)^2$.
When $a > 0$, we close in the UHP as advised by Jordan’s lemma. Then we
proceed as for integrals of rational functions. Thus, we have

\[
I_2 = \oint_C \frac{e^{iaz}}{(z^2+1)^2}\, dz = 2\pi i\, \mathrm{Res}[f(i)] \quad\text{for } a > 0
\]

because there is only one pole (of order 2) in the UHP at $z = i$. We next
calculate the residue:

\[
\mathrm{Res}[f(i)] = \lim_{z \to i}\frac{d}{dz}\left[(z - i)^2\frac{e^{iaz}}{(z-i)^2(z+i)^2}\right]
= \lim_{z \to i}\frac{d}{dz}\left[\frac{e^{iaz}}{(z+i)^2}\right]
= \lim_{z \to i}\frac{(z+i)\,ia\,e^{iaz} - 2e^{iaz}}{(z+i)^3}
= \frac{e^{-a}}{4i}(1 + a).
\]

Substituting this in the expression for $I_2$, we obtain $I_2 = \frac{\pi}{2}e^{-a}(1 + a)$ for
$a > 0$.

When $a < 0$, we have to close the contour in the LHP, where the pole of
order 2 is at $z = -i$ and the contour is taken clockwise. Thus, we get

\[
I_2 = \oint_C \frac{e^{iaz}}{(z^2+1)^2}\, dz = -2\pi i\, \mathrm{Res}[f(-i)] \quad\text{for } a < 0.
\]

For the residue we obtain

\[
\mathrm{Res}[f(-i)] = \lim_{z \to -i}\frac{d}{dz}\left[(z + i)^2\frac{e^{iaz}}{(z-i)^2(z+i)^2}\right] = -\frac{e^{a}}{4i}(1 - a),
\]

and the expression for $I_2$ becomes $I_2 = \frac{\pi}{2}e^{a}(1 - a)$ for $a < 0$. We can combine
the two results and write

\[
\int_{-\infty}^{\infty} \frac{\cos ax}{(x^2+1)^2}\, dx = \mathrm{Re}(I_2) = I_2 = \frac{\pi}{2}(1 + |a|)e^{-|a|}.
\]
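
The combined formula of Example 11.3.4 can be compared with a direct numerical estimate. The sketch below truncates the real line to $[-L, L]$ and applies the trapezoidal rule; the cutoff $L$, the step count, and the sample values of $a$ are assumptions chosen for illustration, and the neglected tails are bounded by $2/(3L^3)$.

```python
import math


def numeric_integral(a, L=100.0, n=40000):
    """Trapezoidal estimate of the integral of cos(a x)/(x^2 + 1)^2 over [-L, L]."""
    h = 2 * L / n
    total = 0.0
    for k in range(n + 1):
        x = -L + k * h
        weight = 0.5 if k in (0, n) else 1.0   # half weight at the two endpoints
        total += weight * math.cos(a * x) / (x * x + 1) ** 2
    return total * h


def closed_form(a):
    """(pi/2) (1 + |a|) exp(-|a|), the result obtained with residues."""
    return 0.5 * math.pi * (1 + abs(a)) * math.exp(-abs(a))


results = [(a, numeric_integral(a), closed_form(a)) for a in (1.0, -2.0, 0.5)]
for a, approx, exact in results:
    print(a, approx, exact)
```

The negative sample value exercises the $a < 0$ branch, for which the contour had to be closed in the LHP.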


Example 11.3.5 As another example, let us evaluate

\[
\int_{-\infty}^{\infty} \frac{x\sin ax}{x^4+4}\, dx \quad\text{where } a \ne 0.
\]

This is the imaginary part of the integral $I_2 = \int_{-\infty}^{\infty} x e^{iax}\, dx/(x^4+4)$,
which, in terms of $z$ and for the closed contour in the UHP (when $a > 0$),
becomes

\[
I_2 = \oint_C \frac{z e^{iaz}}{z^4+4}\, dz = 2\pi i\sum_{j=1}^{m} \mathrm{Res}[f(z_j)] \quad\text{for } a > 0. \tag{11.6}
\]

The singularities are determined by the zeros of the denominator: $z^4 + 4 = 0$,
or $z = 1 \pm i, -1 \pm i$. Of these four simple poles only two, $1 + i$ and $-1 + i$,
are in the UHP. We now calculate the residues:

\[
\mathrm{Res}[f(1+i)] = \lim_{z \to 1+i}(z - 1 - i)\frac{z e^{iaz}}{(z-1-i)(z-1+i)(z+1-i)(z+1+i)}
= \frac{(1+i)e^{ia(1+i)}}{(2i)(2)(2+2i)} = \frac{e^{ia}e^{-a}}{8i},
\]

\[
\mathrm{Res}[f(-1+i)] = \lim_{z \to -1+i}(z + 1 - i)\frac{z e^{iaz}}{(z+1-i)(z+1+i)(z-1-i)(z-1+i)}
= \frac{(-1+i)e^{ia(-1+i)}}{(2i)(-2)(-2+2i)} = -\frac{e^{-ia}e^{-a}}{8i}.
\]

Substituting in Eq. (11.6), we obtain

\[
I_2 = 2\pi i\,\frac{e^{-a}}{8i}\left(e^{ia} - e^{-ia}\right) = i\frac{\pi}{2}e^{-a}\sin a.
\]

Thus,

\[
\int_{-\infty}^{\infty} \frac{x\sin ax}{x^4+4}\, dx = \mathrm{Im}(I_2) = \frac{\pi}{2}e^{-a}\sin a \quad\text{for } a > 0. \tag{11.7}
\]

For $a < 0$, we could close the contour in the LHP. But there is an easier
way of getting to the answer. We note that $-a > 0$, and Eq. (11.7) yields

\[
\int_{-\infty}^{\infty} \frac{x\sin ax}{x^4+4}\, dx = -\int_{-\infty}^{\infty} \frac{x\sin[(-a)x]}{x^4+4}\, dx
= -\frac{\pi}{2}e^{-(-a)}\sin(-a) = \frac{\pi}{2}e^{a}\sin a.
\]

We can collect the two cases in

\[
\int_{-\infty}^{\infty} \frac{x\sin ax}{x^4+4}\, dx = \frac{\pi}{2}e^{-|a|}\sin a.
\]


11.3.3 Functions of Trigonometric Functions 


The third type of integral we can evaluate using the residue theorem involves
only trigonometric functions and is of the form

\[
\int_0^{2\pi} F(\sin\theta, \cos\theta)\, d\theta,
\]

where $F$ is some (typically rational) function of its arguments. Since $\theta$
varies from 0 to $2\pi$, we can consider it an argument of a point $z$ on the
unit circle centered at the origin. Then $z = e^{i\theta}$ and $e^{-i\theta} = 1/z$, and we can
substitute $\cos\theta = (z + 1/z)/2$, $\sin\theta = (z - 1/z)/(2i)$, and $d\theta = dz/(iz)$ in
the original integral, to obtain

\[
\oint_C F\left(\frac{z - 1/z}{2i}, \frac{z + 1/z}{2}\right)\frac{dz}{iz}.
\]

This integral can often be evaluated using the method of residues.


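Example 11.3.6 Let us evaluate the integral $\int_0^{2\pi} d\theta/(1 + a\cos\theta)$ where
$|a| < 1$. Substituting for $\cos\theta$ and $d\theta$ in terms of $z$, we obtain

\[
\oint_C \frac{dz/iz}{1 + a[(z^2+1)/(2z)]} = \frac{2}{i}\oint_C \frac{dz}{2z + az^2 + a},
\]

where $C$ is the unit circle centered at the origin. The singularities of the
integrand are the zeros of its denominator:

\[
z_1 = \frac{-1 + \sqrt{1 - a^2}}{a} \quad\text{and}\quad z_2 = \frac{-1 - \sqrt{1 - a^2}}{a}.
\]

For $|a| < 1$ it is clear that $z_2$ will lie outside the unit circle $C$; therefore, it
does not contribute to the integral. But $z_1$ lies inside, and we obtain

\[
\oint_C \frac{dz}{2z + az^2 + a} = 2\pi i\, \mathrm{Res}[f(z_1)].
\]

The residue of the simple pole at $z_1$ can be calculated:

\[
\mathrm{Res}[f(z_1)] = \lim_{z \to z_1}(z - z_1)\frac{1}{a(z - z_1)(z - z_2)} = \frac{1}{a}\left(\frac{1}{z_1 - z_2}\right)
= \frac{1}{a}\left(\frac{a}{2\sqrt{1 - a^2}}\right) = \frac{1}{2\sqrt{1 - a^2}}.
\]

It follows that

\[
\int_0^{2\pi} \frac{d\theta}{1 + a\cos\theta} = \frac{2}{i}\oint_C \frac{dz}{2z + az^2 + a}
= \frac{2}{i}\,2\pi i\left(\frac{1}{2\sqrt{1 - a^2}}\right) = \frac{2\pi}{\sqrt{1 - a^2}}.
\]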
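
The steps of Example 11.3.6 translate directly into a short numerical check. In the sketch below, the function names and the sample values of $a$ are illustrative assumptions; one function integrates over $\theta$ directly, the other locates the poles and applies the residue formula.

```python
import math


def theta_integral(a, n=2000):
    """Trapezoidal estimate of the integral of 1/(1 + a cos(theta)) over [0, 2 pi]."""
    h = 2 * math.pi / n
    return sum(1.0 / (1.0 + a * math.cos(k * h)) for k in range(n)) * h


def via_residue(a):
    """Evaluate (2/i) * 2 pi i * Res[f(z1)] for f(z) = 1/(a z^2 + 2 z + a)."""
    root = math.sqrt(1 - a * a)
    z1 = (-1 + root) / a
    z2 = (-1 - root) / a
    assert abs(z1) < 1 < abs(z2)      # only z1 lies inside the unit circle
    residue = 1 / (a * (z1 - z2))     # simple-pole formula, Eq. (11.5)
    return ((2 / 1j) * 2j * math.pi * residue).real


results = [(a, theta_integral(a), via_residue(a), 2 * math.pi / math.sqrt(1 - a * a))
           for a in (0.5, -0.8, 0.9)]
for row in results:
    print(row)
```

Because the integrand is periodic and analytic, the equally spaced trapezoidal sum converges very quickly, so a modest number of points suffices.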


Example 11.3.7 As another example, let us consider the integral

\[
I = \int_0^{\pi} \frac{d\theta}{(a + \cos\theta)^2} \quad\text{where } a > 1.
\]

Since $\cos\theta$ is an even function of $\theta$, we may write

\[
I = \frac{1}{2}\int_{-\pi}^{\pi} \frac{d\theta}{(a + \cos\theta)^2} \quad\text{where } a > 1.
\]

This integration is over a complete cycle around the origin, and we can make
the usual substitution:

\[
I = \frac{1}{2}\oint_C \frac{dz/iz}{[a + (z^2+1)/2z]^2} = \frac{2}{i}\oint_C \frac{z\, dz}{(z^2 + 2az + 1)^2}.
\]

The denominator has the roots $z_1 = -a + \sqrt{a^2 - 1}$ and $z_2 = -a - \sqrt{a^2 - 1}$,
which are both of order 2. The second root is outside the unit circle because
$a > 1$. Also, it is easily verified that for all $a > 1$, $z_1$ is inside the unit circle.
Since $z_1$ is a pole of order 2, we have

\[
\mathrm{Res}[f(z_1)] = \lim_{z \to z_1}\frac{d}{dz}\left[(z - z_1)^2\frac{z}{(z - z_1)^2(z - z_2)^2}\right]
= \lim_{z \to z_1}\frac{d}{dz}\left[\frac{z}{(z - z_2)^2}\right]
= \frac{1}{(z_1 - z_2)^2} - \frac{2z_1}{(z_1 - z_2)^3}
= \frac{a}{4(a^2 - 1)^{3/2}}.
\]

We thus obtain $I = \frac{2}{i}\,2\pi i\, \mathrm{Res}[f(z_1)] = \dfrac{\pi a}{(a^2 - 1)^{3/2}}$.


11.3.4 Some Other Integrals 


The three types of definite integrals discussed above do not exhaust all pos- 
sible applications of the residue theorem. There are other integrals that do 
not fit into any of the foregoing three categories but are still manageable. 
As the next two examples demonstrate, a clever choice of contours allows 
evaluation of other types of integrals. 


Example 11.3.8 Let us evaluate the Gaussian integral

\[
I = \int_{-\infty}^{\infty} e^{iax - bx^2}\, dx \quad\text{where } a, b \in \mathbb{R},\ b > 0.
\]

Completing squares in the exponent, we have

\[
I = \int_{-\infty}^{\infty} e^{-b[x - ia/(2b)]^2 - a^2/(4b)}\, dx
= e^{-a^2/(4b)}\lim_{R \to \infty}\int_{-R}^{R} e^{-b[x - ia/(2b)]^2}\, dx.
\]

If we change the variable of integration to $z = x - ia/(2b)$, we obtain

\[
I = e^{-a^2/(4b)}\lim_{R \to \infty}\int_{-R - ia/(2b)}^{R - ia/(2b)} e^{-bz^2}\, dz.
\]

Let us now define $I_R$:

\[
I_R \equiv \int_{-R - ia/(2b)}^{R - ia/(2b)} e^{-bz^2}\, dz.
\]

This is an integral along a straight line $C_1$ that is parallel to the $x$-axis (see
Fig. 11.3). We close the contour as shown and note that $e^{-bz^2}$ is analytic
throughout the interior of the closed contour (it is an entire function!). Thus,
the contour integral must vanish by the Cauchy-Goursat theorem. So we
obtain

\[
I_R + \int_{C_3} e^{-bz^2}\, dz + \int_{R}^{-R} e^{-bx^2}\, dx + \int_{C_4} e^{-bz^2}\, dz = 0.
\]
Fig. 11.3 The contour for the evaluation of the Gaussian integral


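Along $C_3$, $z = R + iy$ with real $y$ running from $-a/(2b)$ to 0, and

\[
\int_{C_3} e^{-bz^2}\, dz = \int_{-a/(2b)}^{0} e^{-b(R + iy)^2}\, i\, dy
= i e^{-bR^2}\int_{-a/(2b)}^{0} e^{by^2 - 2ibRy}\, dy,
\]

which clearly tends to zero as $R \to \infty$. We get a similar result for the integral
along $C_4$. Therefore, we have

\[
I_R = \int_{-R}^{R} e^{-bx^2}\, dx \quad\Rightarrow\quad
\lim_{R \to \infty} I_R = \int_{-\infty}^{\infty} e^{-bx^2}\, dx = \sqrt{\frac{\pi}{b}}.
\]

Finally, we get

\[
\int_{-\infty}^{\infty} e^{iax - bx^2}\, dx = \sqrt{\frac{\pi}{b}}\, e^{-a^2/(4b)}.
\]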
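
The closed form of Example 11.3.8 is easy to probe numerically because the integrand decays rapidly. The following sketch assumes a cutoff $L$, a step count, and a few sample pairs $(a, b)$ picked for illustration.

```python
import cmath
import math


def gaussian_numeric(a, b, L=12.0, n=6000):
    """Trapezoidal estimate of the integral of exp(i a x - b x^2) over [-L, L]."""
    h = 2 * L / n
    total = 0j
    for k in range(n + 1):
        x = -L + k * h
        weight = 0.5 if k in (0, n) else 1.0   # half weight at the two endpoints
        total += weight * cmath.exp(1j * a * x - b * x * x)
    return total * h


def gaussian_closed_form(a, b):
    """sqrt(pi/b) * exp(-a^2/(4b)), the result of the contour argument."""
    return math.sqrt(math.pi / b) * math.exp(-a * a / (4 * b))


results = [(a, b, gaussian_numeric(a, b), gaussian_closed_form(a, b))
           for a, b in ((1.0, 1.0), (3.0, 0.5), (-2.0, 2.0))]
for a, b, approx, exact in results:
    print(a, b, approx, exact)   # the imaginary part of approx should be negligible
```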


Example 11.3.9 Let us evaluate $I = \int_0^{\infty} dx/(x^3 + 1)$. If the integrand were
even, we could extend the lower limit of integration to $-\infty$ and close the
contour in the UHP. Since this is not the case, we need to use a different
trick. To get a hint as to how to close the contour, we study the singularities
of the integrand. These are simply the roots of the denominator: $z^3 = -1$ or
$z_n = e^{i(2n+1)\pi/3}$ with $n = 0, 1, 2$. These, as well as a contour that has only
$z_0$ as an interior point, are shown in Fig. 11.4. We thus have

\[
I + \int_{C_R} \frac{dz}{z^3 + 1} + \int_{C_2} \frac{dz}{z^3 + 1} = 2\pi i\, \mathrm{Res}[f(z_0)]. \tag{11.8}
\]

The $C_R$ integral vanishes, as usual. Along $C_2$, $z = re^{i\alpha}$, with constant $\alpha$, so
that $dz = e^{i\alpha}\, dr$ and

\[
\int_{C_2} \frac{dz}{z^3 + 1} = \int_{\infty}^{0} \frac{e^{i\alpha}\, dr}{(re^{i\alpha})^3 + 1}
= -e^{i\alpha}\int_0^{\infty} \frac{dr}{r^3 e^{3i\alpha} + 1}.
\]

In particular, if we choose $3\alpha = 2\pi$, we obtain

\[
\int_{C_2} \frac{dz}{z^3 + 1} = -e^{i2\pi/3}\int_0^{\infty} \frac{dr}{r^3 + 1} = -e^{i2\pi/3} I.
\]
Fig. 11.4 The contour is chosen so that only one of the poles lies inside


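Substituting this in Eq. (11.8) gives

\[
\left(1 - e^{i2\pi/3}\right) I = 2\pi i\, \mathrm{Res}[f(z_0)] \quad\Rightarrow\quad
I = \frac{2\pi i}{1 - e^{i2\pi/3}}\, \mathrm{Res}[f(z_0)].
\]

On the other hand,

\[
\mathrm{Res}[f(z_0)] = \lim_{z \to z_0}(z - z_0)\frac{1}{(z - z_0)(z - z_1)(z - z_2)}
= \frac{1}{(z_0 - z_1)(z_0 - z_2)} = \frac{1}{(e^{i\pi/3} - e^{i\pi})(e^{i\pi/3} - e^{i5\pi/3})}.
\]

These last two equations yield

\[
I = \frac{2\pi i}{1 - e^{i2\pi/3}}\,\frac{1}{(e^{i\pi/3} - e^{i\pi})(e^{i\pi/3} - e^{i5\pi/3})} = \frac{2\pi}{3\sqrt{3}}.
\]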
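
The complex arithmetic in the last step of Example 11.3.9 is easy to get wrong by hand, so a short check may help. The sketch below evaluates the residue expression with complex numbers and compares it with a direct numerical integration; the substitution $x = \tan t$ and the step count are choices made here, not part of the text.

```python
import cmath
import math


def integrate_half_line(f, n=20000):
    """Midpoint rule for the integral of f over [0, inf) via x = tan(t), 0 < t < pi/2."""
    h = (math.pi / 2) / n
    total = 0.0
    for k in range(n):
        x = math.tan((k + 0.5) * h)
        total += f(x) * (1.0 + x * x) * h   # dx = (1 + tan^2 t) dt
    return total


# The three roots z_n = exp(i (2n+1) pi / 3) of z^3 = -1.
z0, z1, z2 = (cmath.exp(1j * (2 * k + 1) * math.pi / 3) for k in range(3))

residue = 1 / ((z0 - z1) * (z0 - z2))
from_residue = 2j * math.pi / (1 - cmath.exp(2j * math.pi / 3)) * residue

numeric = integrate_half_line(lambda x: 1.0 / (x**3 + 1.0))
exact = 2 * math.pi / (3 * math.sqrt(3))

print(from_residue, numeric, exact)
```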


11.3.5 Principal Value of an Integral 


So far we have discussed only integrals of functions that have no singularities
on the contour. Let us now investigate the consequences of the presence
of singular points on the contour. Consider the integral

\[
\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx, \tag{11.9}
\]

where $x_0$ is a real number and $f$ is analytic at $x_0$. To avoid $x_0$, which causes
the integrand to diverge, we bypass it by indenting the contour as shown in
Fig. 11.5 and denoting the new contour by $C_u$. The contour $C_0$ is simply a
semicircle of radius $\epsilon$. For the contour $C_u$, we have

\[
\int_{C_u} \frac{f(z)}{z - x_0}\, dz = \int_{-\infty}^{x_0 - \epsilon} \frac{f(x)}{x - x_0}\, dx
+ \int_{x_0 + \epsilon}^{\infty} \frac{f(x)}{x - x_0}\, dx + \int_{C_0} \frac{f(z)}{z - x_0}\, dz.
\]

In the limit $\epsilon \to 0$, the sum of the first two terms on the RHS, when it
exists, defines the principal value of the integral in Eq. (11.9):


Fig. 11.5 The contour $C_u$ avoids $x_0$


\[
P\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx = \lim_{\epsilon \to 0}\left[\int_{-\infty}^{x_0 - \epsilon} \frac{f(x)}{x - x_0}\, dx
+ \int_{x_0 + \epsilon}^{\infty} \frac{f(x)}{x - x_0}\, dx\right].
\]

The integral over the semicircle is calculated by noting that $z - x_0 = \epsilon e^{i\theta}$
and $dz = i\epsilon e^{i\theta}\, d\theta$: $\int_{C_0} f(z)\, dz/(z - x_0) = -i\pi f(x_0)$. Therefore,

\[
\int_{C_u} \frac{f(z)}{z - x_0}\, dz = P\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx - i\pi f(x_0). \tag{11.10}
\]

On the other hand, if $C_0$ is taken below the singularity on a contour $C_d$, say,
we obtain

\[
\int_{C_d} \frac{f(z)}{z - x_0}\, dz = P\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx + i\pi f(x_0).
\]

We see that the contour integral depends on how the singular point $x_0$ is
avoided. However, the principal value, if it exists, is unique. To calculate
this principal value we close the contour by adding a large semicircle to it
as before, assuming that the contribution from this semicircle goes to zero
by Jordan’s lemma. The contours $C_u$ and $C_d$ are replaced by a closed contour,
and the value of the integral will be given by the residue theorem. We
therefore have

\[
P\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx = \pm i\pi f(x_0) + 2\pi i\sum_{j=1}^{k} \mathrm{Res}\left[\frac{f(z_j)}{z_j - x_0}\right], \tag{11.11}
\]

where $\{z_j\}_{j=1}^{k}$ are the poles of $f(z)$, the plus sign corresponds to placing
the infinitesimal semicircle in the UHP, as shown in Fig. 11.5, and the minus
sign corresponds to the other choice.


Example 11.3.10 Let us use the principal-value method to evaluate the integral

\[
I = \int_0^{\infty} \frac{\sin x}{x}\, dx = \frac{1}{2}\int_{-\infty}^{\infty} \frac{\sin x}{x}\, dx.
\]

It appears that $x = 0$ is a singular point of the integrand; in reality, however,
it is only a removable singularity, as can be verified by the Taylor expansion
of $\sin x/x$. To make use of the principal-value method, we write

\[
I = \frac{1}{2}\mathrm{Im}\left(\int_{-\infty}^{\infty} \frac{e^{ix}}{x}\, dx\right)
= \frac{1}{2}\mathrm{Im}\left(P\int_{-\infty}^{\infty} \frac{e^{ix}}{x}\, dx\right).
\]
Fig. 11.6 The equivalent contour obtained by “stretching” $C_u$, the contour of Fig. 11.5


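We now use Eq. (11.11) with the small circle in the UHP, noting that there
are no singularities for $e^{ix}/x$ there. This yields

\[
P\int_{-\infty}^{\infty} \frac{e^{ix}}{x}\, dx = i\pi e^{i \cdot 0} = i\pi.
\]

Therefore,

\[
\int_0^{\infty} \frac{\sin x}{x}\, dx = \frac{1}{2}\mathrm{Im}(i\pi) = \frac{\pi}{2}.
\]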


The principal value of an integral can be written more compactly if we
deform the contour $C_u$ by stretching it into that shown in Fig. 11.6. For small
enough $\epsilon$, such a deformation will not change the number of singularities
within the infinite closed contour. Thus, the LHS of Eq. (11.10) will have
limits of integration $-\infty + i\epsilon$ and $+\infty + i\epsilon$. If we change the variable of
integration to $\xi = z - i\epsilon$, this integral becomes

\[
\int_{-\infty}^{\infty} \frac{f(\xi + i\epsilon)\, d\xi}{\xi + i\epsilon - x_0}
= \int_{-\infty}^{\infty} \frac{f(\xi)\, d\xi}{\xi - x_0 + i\epsilon}
= \int_{-\infty}^{\infty} \frac{f(z)\, dz}{z - x_0 + i\epsilon}, \tag{11.12}
\]

where in the last step we changed the dummy integration variable back to $z$.
Note that since $f$ is assumed to be continuous at all points on the contour,
$f(\xi + i\epsilon) \to f(\xi)$ for small $\epsilon$. The last integral of Eq. (11.12) shows that
there is no singularity on the new $x$-axis; we have pushed the singularity
down to $x_0 - i\epsilon$. In other words, we have given the singularity on the $x$-axis
a small negative imaginary part. We can thus rewrite Eq. (11.10) as

\[
P\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx = i\pi f(x_0) + \int_{-\infty}^{\infty} \frac{f(x)\, dx}{x - x_0 + i\epsilon},
\]

where $x$ is used instead of $z$ in the last integral because we are indeed integrating
along the new $x$-axis, assuming that no other singularities are
present in the UHP. A similar argument, this time for the LHP, introduces a
minus sign for the first term on the RHS and for the $\epsilon$ term in the denominator.
Therefore,

Proposition 11.3.11 The principal value of an integral with one simple
pole on the real axis is

\[
P\int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx = \pm i\pi f(x_0) + \int_{-\infty}^{\infty} \frac{f(x)\, dx}{x - x_0 \pm i\epsilon}, \tag{11.13}
\]

where the plus (minus) sign refers to the UHP (LHP).
Fig. 11.7 One of the four choices of contours for evaluating the principal value of the
integral when there are two poles on the real axis

This result is sometimes abbreviated as

\[
\frac{1}{x - x_0 \pm i\epsilon} = P\frac{1}{x - x_0} \mp i\pi\delta(x - x_0). \tag{11.14}
\]
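
Equation (11.14) can be probed numerically with a rapidly decaying test function. The sketch below assumes $f(x) = e^{-x^2}$ and $x_0 = 0$ (choices made here for illustration, not taken from the text). For this $f$ the principal value vanishes by symmetry, so the integral with $x + i\epsilon$ in the denominator should approach $-i\pi f(0) = -i\pi$ as $\epsilon \to 0$. For finite $\epsilon$ the exact value is $-i\pi e^{\epsilon^2}\,\mathrm{erfc}(\epsilon)$, a standard result quoted here as a reference value.

```python
import math


def shifted_integral(eps, n=100000):
    """Integral of exp(-x^2)/(x + i*eps) over the real line.

    The real part vanishes because its integrand is odd. The imaginary part,
    -eps*exp(-x^2)/(x^2 + eps^2), is integrated with the substitution
    x = eps*tan(t), which turns it into -exp(-(eps*tan t)^2) on (-pi/2, pi/2).
    """
    h = math.pi / n
    imag = 0.0
    for k in range(n):
        t = -math.pi / 2 + (k + 0.5) * h   # midpoint of the k-th subinterval
        imag -= math.exp(-(eps * math.tan(t)) ** 2) * h
    return complex(0.0, imag)


def reference(eps):
    """Reference value -i*pi*exp(eps^2)*erfc(eps) for finite eps."""
    return complex(0.0, -math.pi * math.exp(eps * eps) * math.erfc(eps))


results = [(eps, shifted_integral(eps), reference(eps)) for eps in (0.1, 0.01, 0.001)]
for eps, approx, ref in results:
    print(eps, approx, ref)   # the imaginary part tends to -pi as eps shrinks
```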


Example 11.3.12 Let us use residues to evaluate the function

\[
f(k) = \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{e^{ikx}\, dx}{x - i\epsilon}.
\]

We have to close the contour by adding a large semicircle. Whether we do
this in the UHP or the LHP is dictated by the sign of $k$: If $k > 0$, we close in
the UHP. Thus,

\[
f(k) = \frac{1}{2\pi i}\oint_C \frac{e^{ikz}\, dz}{z - i\epsilon} = \mathrm{Res}\left[\frac{e^{ikz}}{z - i\epsilon}\right]_{z \to i\epsilon}
= \lim_{z \to i\epsilon}\left[(z - i\epsilon)\frac{e^{ikz}}{z - i\epsilon}\right] = e^{-k\epsilon} \xrightarrow{\ \epsilon \to 0\ } 1.
\]

On the other hand, if $k < 0$, we must close in the LHP, in which the integrand
is analytic. Thus, by the Cauchy-Goursat theorem, the integral vanishes.
Therefore, we have

\[
f(k) = \begin{cases} 1 & \text{if } k > 0, \\ 0 & \text{if } k < 0. \end{cases}
\]

This is precisely the definition of the theta function (or step function). Thus,
we have obtained an integral representation of that function:

\[
\theta(x) = \frac{1}{2\pi i}\int_{-\infty}^{\infty} \frac{e^{ixt}}{t - i\epsilon}\, dt.
\]


Now suppose that there are two singular points on the real axis, at $x_1$ and
$x_2$. Let us avoid $x_1$ and $x_2$ by making little semicircles, as before, letting both
semicircles be in the UHP (see Fig. 11.7). Without writing the integrands,
we can represent the contour integral by

\[
\int_{-\infty}^{x_1 - \epsilon} + \int_{C_1} + \int_{x_1 + \epsilon}^{x_2 - \epsilon} + \int_{C_2} + \int_{x_2 + \epsilon}^{\infty} + \int_{C_R} = 2\pi i\sum \mathrm{Res}.
\]

The principal value of the integral is naturally defined to be the sum of all
integrals having $\epsilon$ in their limits. The contribution from the small semicircle
$C_1$ can be calculated by substituting $z - x_1 = \epsilon e^{i\theta}$ in the integral:

\[
\int_{C_1} \frac{f(z)\, dz}{(z - x_1)(z - x_2)} = \int_{\pi}^{0} \frac{f(x_1 + \epsilon e^{i\theta})\, i\epsilon e^{i\theta}\, d\theta}{\epsilon e^{i\theta}(x_1 + \epsilon e^{i\theta} - x_2)}
= -i\pi\frac{f(x_1)}{x_1 - x_2},
\]

with a similar result for $C_2$. Putting everything together, we get

\[
P\int_{-\infty}^{\infty} \frac{f(x)}{(x - x_1)(x - x_2)}\, dx - i\pi\frac{f(x_2) - f(x_1)}{x_2 - x_1} = 2\pi i\sum \mathrm{Res}.
\]

If we include the case where both $C_1$ and $C_2$ are in the LHP, we get

\[
P\int_{-\infty}^{\infty} \frac{f(x)}{(x - x_1)(x - x_2)}\, dx = \pm i\pi\frac{f(x_2) - f(x_1)}{x_2 - x_1} + 2\pi i\sum \mathrm{Res}, \tag{11.15}
\]

where the plus sign is for the case where $C_1$ and $C_2$ are in the UHP and the
minus sign for the case where both are in the LHP. We can also obtain the
result for the case where the two singularities coincide by taking the limit
$x_1 \to x_2$. Then the RHS of the last equation becomes a derivative, and we
obtain

\[
P\int_{-\infty}^{\infty} \frac{f(x)}{(x - x_0)^2}\, dx = \pm i\pi f'(x_0) + 2\pi i\sum \mathrm{Res}.
\]

Example 11.3.13 An expression encountered in the study of Green’s functions
or propagators (which we shall discuss later in the book) is

\[
\int_{-\infty}^{\infty} \frac{e^{itx}\, dx}{x^2 - k^2},
\]

where $k$ and $t$ are real constants. We want to calculate the principal value of
this integral. We use Eq. (11.15) and note that for $t > 0$, we need to close
the contour in the UHP, where there are no poles:

\[
P\int_{-\infty}^{\infty} \frac{e^{itx}\, dx}{x^2 - k^2} = P\int_{-\infty}^{\infty} \frac{e^{itx}\, dx}{(x - k)(x + k)}
= i\pi\frac{e^{ikt} - e^{-ikt}}{2k} = -\pi\frac{\sin kt}{k}.
\]

When $t < 0$, we have to close the contour in the LHP, where again there are
no poles:

\[
P\int_{-\infty}^{\infty} \frac{e^{itx}\, dx}{x^2 - k^2} = P\int_{-\infty}^{\infty} \frac{e^{itx}\, dx}{(x - k)(x + k)}
= -i\pi\frac{e^{ikt} - e^{-ikt}}{2k} = \pi\frac{\sin kt}{k}.
\]

The two results above can be combined into a single relation:

\[
P\int_{-\infty}^{\infty} \frac{e^{itx}\, dx}{x^2 - k^2} = -\pi\frac{\sin k|t|}{k}.
\]
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11.4 Problems 


11.1 Evaluate each of the following integrals, for all of which C is the circle 


|z| =3. 
4z—3 e& 
(a) ) z(z — 2) Re (b) ) z(z — im) BG, 
2 
- o a $a 7 is 
eo cosh z F 
‘ ) pan 
sinh z 1 
(g) ) A dz. (h) ff ceos( =) dz. 
. dz 
@) ) 2(z+5) 


z 
(k) ) ad W fp Saz. 
Cc 


sinh 2z° cz 
dz edz 
® any ” expend 


11.2 Let h(z) be analytic and have a simple zero at z = z_0, and let g(z) be
analytic there. Let f(z) = g(z)/h(z), and show that

\[ \mathrm{Res}[f(z_0)] = \frac{g(z_0)}{h'(z_0)}. \]

11.3 Find the residue of f(z) = 1/cosz at each of its poles. 


11.4 Evaluate the integral de dx /[(x? + 1)(x? + 4)] by closing the contour 
(a) in the UHP and (b) in the LHP. 


11.5 Evaluate the following integrals, in which a and b are nonzero real 
constants. 


Oo Oxt e 1 
——_—.— d 
(a) [ x445x2+6 - 


© dx 
() [ at 


© cosax 
——— dx. 
“2 i G2 + B22 


oe. dx 
(e) [ (x? + 1)2(x? +2)" 


(b) f an 

0 6x4 4+5x241 
(a) ye cos x dx 
0 G2 Fa?) 202 +52) 


OO ax 
® [ (x2 + 1)?" 


CO 9x2] 
(h) 5 dx. 
0 Xx +1 
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: oO dx . e xdx 
@ [ (x2 4 a2)2 0) fe (2 4 4x + 13)2" 


3 oo 1.2 1 
(k) is = =“ dx. (1) ye 
i oe 0 x 
° —xcosx dx oo x sinx dx 
(m) 90 Be (n) ae Ser 
oo X* — 2x + 10 —oo X* — 2x +10 
(0) [ dx () i x2dx 
0 ————- , 
0 +i Pedy GFF 42G2 +25) 
© cosax co dx 
dx. ———. 
@ few oo fot 


11.6 Evaluate each of the following integrals by turning it into a contour 
integral around a unit circle. 


20 do 
(a) i =e 
90 3+4siné 


20 dé 
(b) i —— wherea>l. 
0 


a+cosé 
() fe do 
c —_—.. 
0 1+sin’6 
(d) / sa h b>0 
———~—  wherea,b>0. 
0 (atbcos26y2 2 
cos” 30 
(e) a 
5 — 4c0s20 
h +1. 
©) ie epee ce 
236d 
(g) ly = se ¢ where a 4 +1. 
1—2acos¢+a2 
2¢d 
(h) i a eee on 
1—2acos¢ + a2 


(i) [ tan(x +ia)dx whereaeR. 
0 


(j) iE eS? cos(nd — sing)d@ wheren € Z. 
0 


11.7 Evaluate the integral I = ∫_{−∞}^{∞} e^{ax} dx/(1 + e^x) for 0 < a < 1. Hint:
Choose a closed (long) rectangle that encloses only one of the zeros of the
denominator. Show that the contributions of the short sides of the rectangle
are zero.


Fig. 11.8 The contour used in Problem 11.8


11.8 Derive the integration formula ∫_0^∞ e^{−x²} cos(2bx) dx = (√π/2) e^{−b²}, where
b ≠ 0, by integrating the function e^{−z²} around the rectangular path shown in
Fig. 11.8.


11.9 Use the result of Example 11.3.12 to show that θ′(k) = δ(k).


11.10 Find the principal values of the following integrals. 
oa sinx dx °° cosax 
(a) a ee (b) 5 dx wherea> 0. 
~o (4° +4)(« - 1) oo l+x 
(o) [. x COSX d (d) [. 1~cosx | 
c —.——_ dx. —.—_ dx. 
0 x2 —5x+6 ss x2 
11.11 Evaluate the following integrals. 
(a) i. x? — b* (sinax d (b) [ sinax d 
a —=— : —.—- dx. 
9 x2+52 x . 9 x(x2 +b?) = 


©  sinax © cos 2ax — cos 2bx 
(©) | _ SDAY gs (a) [ dx 
G: ae ph) 0 x? 


© sin? x dx © sind x dx 
Qo | == Qg | ==. 
0 0 x 
The subject of complex analysis is an extremely rich and powerful area of
mathematics. We have already seen some of this richness and power in the 
previous chapter. This chapter concludes our discussion of complex analysis 
by introducing some other topics with varying degrees of importance. 


12.1 Meromorphic Functions 


Complex functions that have only simple poles as their singularities are nu- 
merous in applications and are called meromorphic functions. In this sec- 
tion, we derive an important result for such functions. 

Suppose that f(z) has simple poles at {z_j}_{j=1}^N, where N could be in-
finity. Then, assuming that z ≠ z_j for all j, and noting that the residue of
f(ξ)/(ξ − z) at ξ = z is simply f(z), the residue theorem yields

\[ \frac{1}{2\pi i}\oint_{C_n} \frac{f(\xi)}{\xi - z}\,d\xi = f(z) + \sum_{j=1}^{n} \mathrm{Res}\left[\frac{f(\xi)}{\xi - z}\right]_{\xi=z_j}, \]

where C_n is a circle containing the first n poles, and it is assumed that the
poles are arranged in order of increasing absolute values. Since the poles of
f are assumed to be simple, we have

\[ \mathrm{Res}\left[\frac{f(\xi)}{\xi - z}\right]_{\xi=z_j} = \lim_{\xi\to z_j} (\xi - z_j)\frac{f(\xi)}{\xi - z} = \frac{1}{z_j - z}\lim_{\xi\to z_j}\left[(\xi - z_j) f(\xi)\right] = \frac{1}{z_j - z}\,\mathrm{Res}\left[f(\xi)\right]_{\xi=z_j} \equiv \frac{r_j}{z_j - z}, \]

where r_j is, by definition, the residue of f(ξ) at ξ = z_j. Substituting in the
preceding equation gives

\[ f(z) = \frac{1}{2\pi i}\oint_{C_n} \frac{f(\xi)}{\xi - z}\,d\xi - \sum_{j=1}^{n} \frac{r_j}{z_j - z}. \]

Taking the difference between this and the same equation evaluated at z = 0
(assumed to be none of the poles),¹ we can write

\[ f(z) - f(0) = \frac{z}{2\pi i}\oint_{C_n} \frac{f(\xi)}{\xi(\xi - z)}\,d\xi + \sum_{j=1}^{n} r_j\left(\frac{1}{z - z_j} + \frac{1}{z_j}\right). \]

If |f(ξ)| approaches a finite value as |ξ| → ∞, the integral vanishes for an
infinite circle (which includes all poles now), and we obtain what is called
the Mittag-Leffler expansion of the meromorphic function f:

\[ f(z) = f(0) + \sum_{j=1}^{N} r_j\left(\frac{1}{z - z_j} + \frac{1}{z_j}\right). \qquad (12.1) \]

Now we let g be an entire function with simple zeros. We claim that (a)
(dg/dz)/g(z) is a meromorphic function that is bounded for all values of z,
and (b) its residues are all unity. To see this, note that g is of the form²

\[ g(z) = (z - z_1)(z - z_2)\cdots(z - z_N)\, f(z), \]

where z_1, ..., z_N are all the zeros of g, and f is an analytic function that
does not vanish anywhere in the complex plane. It is now easy to see that

\[ \frac{g'(z)}{g(z)} = \sum_{j=1}^{N} \frac{1}{z - z_j} + \frac{f'(z)}{f(z)}. \]


This expression has both properties (a) and (b) mentioned above. Further-
more, the last term is an entire function that is bounded on all of C. Therefore,
it must be a constant by Proposition 10.5.5. This derivation also verifies
Eq. (12.1), which in the case at hand can be written as

\[ \frac{d}{dz}\ln g(z) = \frac{g'(z)}{g(z)} = \frac{g'(0)}{g(0)} + \sum_{j=1}^{N}\left(\frac{1}{z - z_j} + \frac{1}{z_j}\right), \]

whose solution is readily found and is given in the following

Proposition 12.1.1 If g is an entire function with simple zeros
{z_j}_{j=1}^N, then

\[ g(z) = g(0)\, e^{cz} \prod_{j=1}^{N}\left(1 - \frac{z}{z_j}\right) e^{z/z_j}, \qquad \text{where } c = \frac{(dg/dz)|_{z=0}}{g(0)}, \qquad (12.2) \]

and it is assumed that z_j ≠ 0 for all j.
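The product formula of Proposition 12.1.1 can be checked numerically on a familiar case. The sketch below takes g(z) = sin z / z, an illustrative choice not made in the text, which is entire with simple zeros at ±nπ, g(0) = 1 and g'(0) = 0, so that c = 0. All names are hypothetical.

```python
import cmath
import math


def g(z):
    """g(z) = sin z / z for z != 0 (entire once g(0) = 1 is filled in)."""
    return cmath.sin(z) / z


def product_partial(z, n_pairs):
    """Partial product g(0) e^{cz} prod_j (1 - z/z_j) e^{z/z_j} with g(0) = 1, c = 0."""
    c = 0.0
    result = cmath.exp(c * z)  # g(0) * e^{cz}
    for n in range(1, n_pairs + 1):
        for zero in (n * math.pi, -n * math.pi):
            result *= (1 - z / zero) * cmath.exp(z / zero)
    return result


z = 1.3 - 0.5j  # arbitrary test point
exact = g(z)
approx = product_partial(z, 20000)
rel_error = abs(exact - approx) / abs(exact)
print(exact, approx, rel_error)
```

Pairing the zeros ±nπ makes the exponential factors cancel, which reproduces the classical product sin z / z = ∏ (1 − z²/n²π²).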


¹This is not a restrictive assumption because we can always move our coordinate system
so that the origin avoids all poles.

²One can “prove” this by factoring the simple zeros one by one, writing g(z) = (z −
z_1) f_1(z) and noting that g(z_2) = 0, with z_2 ≠ z_1, implies that f_1(z) = (z − z_2) f_2(z), etc.


12.2 Multivalued Functions 


Fig. 12.1 (a) The origin is a branch point of the natural log function. (b) z_0 is a branch
point of f(z) if f(z_0 + re^{iθ}) ≠ f(z_0 + re^{i(θ+2π)})

The arbitrariness, up to a multiple of 2π, of the angle θ = arg(z) in z = re^{iθ}
leads to functions that can take different values at the same point. Consider,
for example, the function f(z) = √z. Writing z in polar coordinates, we
obtain f(z) = f(r, θ) = (re^{iθ})^{1/2} = √r e^{iθ/2}. This shows that for the same
z = (r, θ) = (r, θ + 2π), we get two different values, f(r, θ) and f(r, θ +
2π) = −f(r, θ).

This may be disturbing at first. After all, the definition of a function (map- 
ping) ensures that for any point in the domain a unique image is obtained. 
Here two different images are obtained for the same z. Riemann found a cure 
for this complex “double vision” by introducing what is now called Riemann 
sheets. We will discuss these briefly below, but first let us take a closer look 
at a prototype of multivalued functions. Consider the natural log function,
ln z. For z = re^{iθ} this is defined as ln z = ln r + iθ = ln|z| + i arg(z), where
arg(z) is defined only to within a multiple of 2π; that is, arg(z) = θ + 2nπ,
for n = 0, ±1, ±2, ....

We can see the peculiar nature of the logarithmic function by consider- 
ing a closed curve around the origin, as shown in Fig. 12.1(a). Starting at 
an arbitrary point z on the curve, we move counterclockwise, noticing the 
constant increase in the angle θ, until we reach the initial point. Now, the
angle is θ + 2π. Thus, the process of moving around the origin has changed
the value of the log function by 2πi, i.e., (ln z)_final − (ln z)_initial = 2πi. Note
that in this process z does not change, because

\[ z_{\mathrm{final}} = re^{i(\theta+2\pi)} = re^{i\theta}e^{2\pi i} = re^{i\theta} = z_{\mathrm{initial}}. \]

Definition 12.2.1 A branch point of a function f : C → C is a complex
number z_0 with the property that for any (small enough) closed curve C
encircling z_0 and for any point z = z_0 + re^{iθ} on the curve, f(z_0 + re^{iθ}) ≠
f(z_0 + re^{i(θ+2π)}).


Historical Notes 

Victor-Alexandre Puiseux (1820-1883) was the first to take up the subject of multival-
ued functions. In 1850 Puiseux published a celebrated paper on complex algebraic func-
tions given by f(u, z) = 0, f a polynomial in u and z. He first made clear the distinction
between poles and branch points that Cauchy had barely perceived, and introduced the
notion of an essential singular point, to which Weierstrass independently had called 
attention. Though Cauchy, in the 1846 paper, did consider the variation of simple mul- 
tivalued functions along paths that enclosed branch points, Puiseux clarified this subject 
too. 

Puiseux also showed that the development of a function of z about a branch point z =a 
must involve fractional powers of z — a. He then improved on Cauchy’s theorem on the 
expansion of a function in a Maclaurin series. By his significant investigations of many- 
valued functions and their branch points in the complex plane, and by his initial work on 
integrals of such functions, Puiseux brought Cauchy’s pioneering work in function theory 
to the end of what might be called the first stage. The difficulties in the theory of multiple- 
valued functions and integrals of such functions were still to be overcome. Cauchy did 
write other papers on the integrals of multiple-valued functions in which he attempted to
follow up on Puiseux’s work; and though he introduced the notion of branch cuts (lignes
d’arrêt), he was still confused about the distinction between poles and branch points. This
subject of algebraic functions and their integrals was to be pursued by Riemann. 
Puiseux was a keen mountaineer and was the first to scale the Alpine peak that is now 
named after him. 


Thus, z = 0 is a branch point of the logarithmic function. Studying the
behavior of ln(1/z) = −ln z around z = 0 will reveal that the point “at infin-
ity” is also a branch point of ln z. If z_0 ≠ 0 is any other point of the complex
plane, then choosing C to be a small loop, we get

\[ \ln\left(z_0 + re^{i\theta}\right) = \ln\left[z_0\left(1 + \frac{re^{i\theta}}{z_0}\right)\right] = \ln z_0 + \ln\left(1 + \frac{re^{i\theta}}{z_0}\right) \approx \ln z_0 + \frac{re^{i\theta}}{z_0} \quad \text{for } r \ll |z_0|. \]

It is now clear that ln(z_0 + re^{iθ}) = ln(z_0 + re^{i(θ+2π)}). We therefore conclude
that any point of the complex plane other than the origin cannot be a branch
point of the natural log function.
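The branch-point behavior described above can be made concrete by following ln z continuously around a closed loop. The sketch below (function name and step count are assumptions of this illustration) accumulates the small changes of the logarithm between neighboring points of a circle, so no jump of the principal branch is ever picked up.

```python
import cmath
import math


def continued_log_change(center, radius, steps=2000):
    """Return (ln z)_final - (ln z)_initial after one counterclockwise circuit.

    The circle is z = center + radius * e^{i theta}. Each increment is the
    principal log of the ratio of neighboring points, which is close to zero,
    so the sum follows ln z continuously along the curve.
    """
    previous = center + radius
    total = 0j
    for k in range(1, steps + 1):
        theta = 2 * math.pi * k / steps
        current = center + radius * cmath.exp(1j * theta)
        total += cmath.log(current / previous)
        previous = current
    return total


around_origin = continued_log_change(0, 1.0)     # loop encircles z = 0
around_other = continued_log_change(2.0, 0.5)    # loop around z_0 = 2, origin outside
print(around_origin, around_other)
```

The first loop returns a change close to 2πi, the second a change close to zero, in line with the statement that only the origin (and the point at infinity) is a branch point of ln z.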


12.2.1 Riemann Surfaces 


The idea of a Riemann surface begins with the removal of all points that lie 
on the line (or any other curve) joining two branch points. For In z this means 
the removal of all points lying on a curve that starts at z = 0 and extends all 
the way to infinity. Such a curve is called a branch cut, or simply a cut. 

Let us concentrate on Inz and take the cut to be along the negative half 
of the real axis. Let us also define the functions 


\[ f_n(z) = f_n(r, \theta) = \ln r + i(\theta + 2n\pi) \quad \text{for } -\pi < \theta < \pi;\ r > 0;\ n = 0, \pm 1, \ldots, \]

so f_n(z) takes on the same values for −π < θ < π that ln z takes in the range
(2n − 1)π < θ < (2n + 1)π. We have replaced the multivalued logarithmic
function by a series of different functions that are analytic in the cut z-plane.

This process of cutting the z-plane and then defining a sequence of 
functions eliminates the contradiction caused by the existence of branch 
points, since we are no longer allowed to completely encircle a branch point.
A complete circulation involves crossing the cut, which, in turn, violates the
domain of definition of f_n(z).


Fig. 12.2 A few sheets of the Riemann surface of the logarithmic function. The path C
encircling the origin O ends up on the lower sheet

We have made good progress. We have replaced the (nonanalytic) multi-
valued function ln z with a series of analytic (in their domain of definition)
functions f_n(z). However, there is a problem left: f_n(z) has a discontinuity
at the cut. In fact, just above the cut f_n(r, π − ε) = ln r + i(π − ε + 2nπ)
with ε > 0, and just below it f_n(r, −π + ε) = ln r + i(−π + ε + 2nπ), so
that

\[ \lim_{\epsilon\to 0}\left[f_n(r, \pi - \epsilon) - f_n(r, -\pi + \epsilon)\right] = 2\pi i. \]


To cure this we make the observation that the value of f_n(z) just above
the cut is the same as the value of f_{n+1}(z) just below the cut. This sug-
gests the following geometrical construction, due to Riemann: Superpose
an infinite series of cut complex planes one on top of the other, each plane 
corresponding to a different value of n. The adjacent planes are connected 
along the cut such that the upper lip of the cut in the (n — 1)th plane is con- 
nected to the lower lip of the cut in the nth plane. All planes contain the two 
branch points. That is, the branch points appear as “hinges” at which all the 
planes are joined. With this geometrical construction, if we cross the cut, we 
end up on a different plane adjacent to the previous one (Fig. 12.2). 

The geometric surface thus constructed is called a Riemann surface;
each plane is called a Riemann sheet and is denoted by R_j, for j =
0, ±1, ±2, .... A single-valued function defined on a Riemann sheet is
called a branch of the original multivalued function.

We have achieved the following: From a multivalued function we have 
constructed a sequence of single-valued functions, each defined in a sin- 
gle complex plane; from this sequence of functions we have constructed a 
single complex function defined on a single Riemann surface. Thus, the log- 
arithmic function is analytic throughout the Riemann surface except at the 
branch points, which are simply the function’s singular points.

Fig. 12.3 The Riemann surface for f(z) = z^{1/2}


It is now easy to see the geometrical significance of branch points. 
A complete cycle around a branch point takes us to another Riemann sheet, 
where the function takes on a different form. On the other hand, a complete 
cycle around an ordinary point either never crosses the cut, or if it does, it 
will cross it back to the original sheet. 

Let us now briefly consider two of the more common multivalued func- 
tions and their Riemann surfaces. 


Example 12.2.2 (The function f(z) = z^{1/n}) The only branch points for the
function f(z) = z^{1/n} are z = 0 and the point at infinity. Defining f_k(z) =
r^{1/n} e^{i(θ+2kπ)/n} for k = 0, 1, ..., n − 1 and 0 < θ < 2π and following the
same procedure as for the logarithmic function, we see that there must be
n Riemann sheets, labeled R_0, R_1, ..., R_{n−1}, in the Riemann surface. The
lower edge of R_{n−1} is pasted to the upper edge of R_0 along the cut, which
is taken to be along the positive real axis. The Riemann surface for n = 2 is
shown in Fig. 12.3.

It is clear that for any noninteger value of α the function f(z) = z^α has a
branch point at z = 0 and another at the point at infinity. For irrational α the
number of Riemann sheets is infinite.
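The n branches of z^{1/n} in Example 12.2.2 can be listed explicitly. The following is a minimal sketch; the helper name is hypothetical, and the angle is reduced to [0, 2π) to match the cut along the positive real axis.

```python
import cmath
import math


def root_branches(z, n):
    """Return [f_0(z), ..., f_{n-1}(z)] with f_k(z) = r^{1/n} e^{i(theta + 2 k pi)/n}."""
    r = abs(z)
    theta = cmath.phase(z) % (2 * math.pi)  # angle measured from the cut, in [0, 2*pi)
    return [
        r ** (1.0 / n) * cmath.exp(1j * (theta + 2 * math.pi * k) / n)
        for k in range(n)
    ]


z = -3 + 4j
cube_roots = root_branches(z, 3)
square_roots = root_branches(z, 2)
for k, value in enumerate(cube_roots):
    print(k, value, value ** 3)
```

Every branch reproduces z when raised to the nth power, and for n = 2 the two branches differ only by a sign, as found for √z at the start of the section.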


Example 12.2.3 (The function f(z) = (z² − 1)^{1/2}) The branch points for
the function f(z) = (z² − 1)^{1/2} are at z_1 = +1 and z_2 = −1 (see Fig. 12.4).
Writing z − 1 = r_1 e^{iθ_1} and z + 1 = r_2 e^{iθ_2}, we have

\[ f(z) = \left(r_1 e^{i\theta_1}\right)^{1/2}\left(r_2 e^{i\theta_2}\right)^{1/2} = \sqrt{r_1 r_2}\; e^{i(\theta_1+\theta_2)/2}. \]

The cut is along the real axis from z = −1 to z = +1. There are two Rie-
mann sheets in the Riemann surface. Clearly, only cycles of 2π involving
one branch point will cross the cut and therefore end up on a different sheet.
Any closed curve that has both z_1 and z_2 as interior points will remain en-
tirely on the original sheet.


The notion of branch cuts can be used to evaluate certain integrals that 
do not fit into the three categories discussed in Chap. 11. The basic idea is 
to circumvent the cut by constructing a contour that is infinitesimally close 
to the cut and circles around branch points. 


Fig. 12.4 The cut for the function f(z) = (z² − 1)^{1/2} is from z_1 to z_2. Paths that circle
only one of the points cross the cut and end up on the other sheet


Fig. 12.5 The contour for the evaluation of the integrals of Examples 12.2.4 and 12.2.5 


Example 12.2.4 To evaluate the integral I = ∫_0^∞ x^α dx/(x² + 1) for
|α| < 1, consider the complex integral I′ = ∮_C z^α dz/(z² + 1) where C is
as shown in Fig. 12.5 and the cut is taken along the positive real axis. To
evaluate the contribution from C_R and C_r, we let ρ stand for either r or R.
Then we have

\[ I_\rho = \int_{C_\rho} \frac{(\rho e^{i\theta})^{\alpha}\, i\rho e^{i\theta}\,d\theta}{(\rho e^{i\theta})^2 + 1} = i\int_0^{2\pi} \frac{\rho^{\alpha+1} e^{i(\alpha+1)\theta}\,d\theta}{\rho^2 e^{2i\theta} + 1}. \]

It is clear that since |α| < 1, I_ρ → 0 as ρ → 0 or ρ → ∞.

The contributions from L_1 and L_2 do not cancel one another because the
value of the function changes above and below the cut. To evaluate these
two integrals we have to choose a branch of the function. Let us choose that
branch on which z^α = |z|^α e^{iαθ} for 0 < θ < 2π. Along L_1, θ ≈ 0 or z^α = x^α,
and along L_2, θ ≈ 2π or z^α = (xe^{2πi})^α. Thus,

\[ \oint_C \frac{z^{\alpha}}{z^2+1}\,dz = \int_0^{\infty} \frac{x^{\alpha}}{x^2+1}\,dx + \int_{\infty}^{0} \frac{x^{\alpha} e^{2\pi i\alpha}}{(xe^{2\pi i})^2+1}\,dx = \left(1 - e^{2\pi i\alpha}\right)\int_0^{\infty} \frac{x^{\alpha}}{x^2+1}\,dx. \qquad (12.3) \]

The LHS of this equation can be obtained using the residue theorem. There
are two simple poles, at z = +i and z = −i, with residues Res[f(i)] =
(e^{iπ/2})^α/2i and Res[f(−i)] = −(e^{i3π/2})^α/2i. Thus,

\[ \oint_C \frac{z^{\alpha}}{z^2+1}\,dz = 2\pi i\left(\frac{e^{i\alpha\pi/2}}{2i} - \frac{e^{i3\alpha\pi/2}}{2i}\right) = \pi\left(e^{i\alpha\pi/2} - e^{i3\alpha\pi/2}\right). \]

Combining this with Eq. (12.3), we obtain

\[ \int_0^{\infty} \frac{x^{\alpha}}{x^2+1}\,dx = \frac{\pi\left(e^{i\alpha\pi/2} - e^{i3\alpha\pi/2}\right)}{1 - e^{2\pi i\alpha}} = \frac{\pi}{2}\sec\frac{\alpha\pi}{2}. \]

If we had chosen a different branch of the function, both the LHS and the
RHS of Eq. (12.3) would have been different, but the final result would still
have been the same.
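The closed form of Example 12.2.4 can be cross-checked by direct numerical integration. The sketch below is an independent check, not the contour method of the text: it substitutes x = e^u, which turns the integral into ∫ e^{αu}/(2 cosh u) du over the whole real line, and applies the trapezoidal rule on a truncated interval. The function names, the truncation width and the step size are assumptions of this illustration.

```python
import math


def integral_power_over_quadratic(alpha, step=0.01):
    """Trapezoidal estimate of the integral of x^alpha / (x^2 + 1) over (0, inf).

    With x = e^u the integrand becomes e^{alpha u} / (2 cosh u), which decays
    like e^{-(1 - |alpha|) |u|}; the interval is cut where this is about e^{-40}.
    """
    if not -1.0 < alpha < 1.0:
        raise ValueError("the integral converges only for |alpha| < 1")
    half_width = 40.0 / (1.0 - abs(alpha))
    n = int(round(2 * half_width / step))
    total = 0.0
    for k in range(n + 1):
        u = -half_width + k * step
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(alpha * u) / (2.0 * math.cosh(u))
    return total * step


def closed_form(alpha):
    """(pi/2) sec(alpha pi / 2), the result of Example 12.2.4."""
    return math.pi / (2.0 * math.cos(alpha * math.pi / 2.0))


checks = {a: (integral_power_over_quadratic(a), closed_form(a)) for a in (-0.5, 0.0, 0.5)}
for a, (numeric, exact) in checks.items():
    print(a, numeric, exact)
```

The truncation width grows as |α| approaches 1, so this simple version is only meant for moderate values of α.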


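Example 12.2.5 Here is another integral involving a branch cut:

\[ I = \int_0^{\infty} \frac{x^{-a}}{x+1}\,dx \quad \text{for } 0 < a < 1. \]

To evaluate this integral we use the zeroth branch of the function and the
contour of the previous example (Fig. 12.5). Thus, writing z = ρe^{iθ}, we
have

\[ 2\pi i\,\mathrm{Res}\left[f(-1)\right] = \oint_C \frac{z^{-a}}{z+1}\,dz = \int_0^{\infty} \frac{\rho^{-a}}{\rho+1}\,d\rho + \oint_{C_R} \frac{z^{-a}}{z+1}\,dz + \int_{\infty}^{0} \frac{(\rho e^{2i\pi})^{-a}}{\rho e^{2i\pi}+1}\,e^{2i\pi}\,d\rho + \oint_{C_r} \frac{z^{-a}}{z+1}\,dz. \qquad (12.4) \]

The contributions from both circles vanish by the same argument used in
the previous example. On the other hand, Res[f(−1)] = (−1)^{−a}. For the
branch we are using, −1 = e^{iπ}. Thus, Res[f(−1)] = e^{−iaπ}. The RHS of
Eq. (12.4) yields

\[ \int_0^{\infty} \frac{\rho^{-a}}{\rho+1}\,d\rho - e^{-2i\pi a}\int_0^{\infty} \frac{\rho^{-a}}{\rho+1}\,d\rho = \left(1 - e^{-2i\pi a}\right) I. \]

It follows from (12.4) that (1 − e^{−2iπa}) I = 2πi e^{−iπa}, or

\[ \int_0^{\infty} \frac{x^{-a}}{x+1}\,dx = \frac{\pi}{\sin a\pi} \quad \text{for } 0 < a < 1. \]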


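Example 12.2.6 Let us evaluate I = ∫_0^∞ ln x dx/(x² + a²) with a > 0. We
choose the zeroth branch of the logarithmic function, in which −π < θ < π,
and use the contour of Fig. 12.6.

Fig. 12.6 The contour for the evaluation of the integral of Example 12.2.6

For L_1, z = ρe^{iπ} (note that ρ > 0), and for L_2, z = ρ. Thus, we have

\[ 2\pi i\,\mathrm{Res}\left[f(ia)\right] = \oint_C \frac{\ln z}{z^2+a^2}\,dz = \int_{\infty}^{\epsilon} \frac{\ln(\rho e^{i\pi})}{(\rho e^{i\pi})^2+a^2}\,e^{i\pi}\,d\rho + \int_{C_\epsilon} \frac{\ln z}{z^2+a^2}\,dz + \int_{\epsilon}^{\infty} \frac{\ln\rho}{\rho^2+a^2}\,d\rho + \int_{C_R} \frac{\ln z}{z^2+a^2}\,dz, \qquad (12.5) \]

where z = ia is the only singularity (a simple pole) in the UHP. Now we
note that

\[ \int_{\infty}^{\epsilon} \frac{\ln(\rho e^{i\pi})}{(\rho e^{i\pi})^2+a^2}\,e^{i\pi}\,d\rho = \int_{\epsilon}^{\infty} \frac{\ln\rho + i\pi}{\rho^2+a^2}\,d\rho = \int_{\epsilon}^{\infty} \frac{\ln\rho}{\rho^2+a^2}\,d\rho + i\pi\int_{\epsilon}^{\infty} \frac{d\rho}{\rho^2+a^2}. \]

The contributions from the circles tend to zero. On the other hand,

\[ \mathrm{Res}\left[f(ia)\right] = \lim_{z\to ia}(z-ia)\frac{\ln z}{(z-ia)(z+ia)} = \frac{\ln(ia)}{2ia} = \frac{1}{2ia}\left(\ln a + i\frac{\pi}{2}\right). \]

Substituting the last two results in Eq. (12.5), we obtain

\[ \frac{\pi}{a}\left(\ln a + i\frac{\pi}{2}\right) = 2\int_{\epsilon}^{\infty} \frac{\ln\rho}{\rho^2+a^2}\,d\rho + i\pi\int_{\epsilon}^{\infty} \frac{d\rho}{\rho^2+a^2}. \]

It can also easily be shown that ∫_0^∞ dρ/(ρ² + a²) = π/(2a). Thus, in the
limit ε → 0, we get I = (π/2a) ln a. The sign of a is irrelevant because it ap-
pears as a square in the integral. Thus, we can write

\[ \int_0^{\infty} \frac{\ln x}{x^2+a^2}\,dx = \frac{\pi}{2|a|}\ln|a|, \quad a \neq 0. \]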
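The result of Example 12.2.6 can be confirmed without contour integration by an elementary change of variables. The short derivation below is a cross-check added here under the assumption a > 0; the symbol J is introduced only for this purpose.

```latex
% Scale out a > 0 with x = a t, so that ln x = ln a + ln t:
\[
  \int_0^{\infty} \frac{\ln x}{x^2+a^2}\,dx
  = \frac{\ln a}{a}\int_0^{\infty} \frac{dt}{t^2+1}
  + \frac{1}{a}\int_0^{\infty} \frac{\ln t}{t^2+1}\,dt
  = \frac{\pi}{2a}\ln a + \frac{J}{a},
  \qquad J \equiv \int_0^{\infty} \frac{\ln t}{t^2+1}\,dt .
\]
% The substitution t = 1/s maps J into -J, so J vanishes:
\[
  J = \int_0^{\infty} \frac{\ln(1/s)}{s^{-2}+1}\,\frac{ds}{s^2}
    = -\int_0^{\infty} \frac{\ln s}{s^2+1}\,ds = -J
  \quad\Longrightarrow\quad J = 0 .
\]
```

This reproduces I = (π/2a) ln a, in agreement with the contour calculation.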
12.3 Analytic Continuation


Analytic functions have certain unique properties, some of which we have 
already noted. For instance, the Cauchy integral formula gives the value of 
an analytic function inside a simple closed contour once its value on the 
contour is known. We have also seen that we can deform the contours of 
integration as long as we do not encounter any singularities of the function. 

Combining these two properties and assuming that f : C → C is analytic
within a region S ⊂ C, we can ask the following question: Is it possible to
extend f beyond S? We shall see in this section that the answer is yes in
many cases of interest.³ First consider the following:


Theorem 12.3.1 Let f_1, f_2 : C → C be analytic in a region S. If f_1 = f_2
in a neighborhood of a point z ∈ S, or for a segment of a curve in S, then
f_1 = f_2 for all z ∈ S.


Proof Let g = f_1 − f_2, and U = {z ∈ S | g(z) = 0}. Then U is a subset of
S that includes the neighborhood of z (or the line segment) in which f_1 =
f_2. If U is the entire region S, we are done. Otherwise, U has a boundary
beyond which g(z) ≠ 0. Since all points within the boundary satisfy g(z) =
0, and since g is continuous (more than that, it is analytic) on S, g must
vanish also on the boundary. But the boundary points are not isolated: Any
small circle around any one of them includes points of U as well as points
outside U. Thus, g must vanish on a neighborhood of any boundary point,
implying that g vanishes for some points outside U. This contradicts our
assumption. Thus, U must include the entire region S.


A consequence of this theorem is the following corollary. 


Corollary 12.3.2 The behavior of a function that is analytic in a region
S ⊂ C is completely determined by its behavior in a (small) neighborhood
of an arbitrary point in that region.


This process of determining the behavior of an analytic function outside 
the region in which it was originally defined is called analytic continu- 
ation. Although there are infinitely many ways of analytically continuing 
beyond regions of definition, the values of all functions obtained as a result 
of diverse continuations are the same at any given point. This follows from 
Theorem 12.3.1. 

Let f_1, f_2 : C → C be analytic in regions S_1 and S_2, respectively. Sup-
pose that f_1 and f_2 have different functional forms in their respective re-
gions of analyticity. If there is an overlap between S_1 and S_2 and if f_1 = f_2
within that overlap, then the (unique) analytic continuation of f_1 into S_2
must be f_2, and vice versa. In fact, we may regard f_1 and f_2 as a single
function f : C → C such that

\[ f(z) = \begin{cases} f_1(z) & \text{when } z \in S_1, \\ f_2(z) & \text{when } z \in S_2. \end{cases} \]

Clearly, f is analytic for the combined region S = S_1 ∪ S_2. We then say that
f_1 and f_2 are analytic continuations of one another.


³Provided that S is not discrete (countable). (See [Lang 85, p. 91].)

Fig. 12.7 The function defined in the smaller circle is continued analytically into the
larger circle


Example 12.3.3 Consider the function f_1(z) = Σ_{n=0}^∞ z^n, which is analytic
for |z| < 1. We have seen that it converges to 1/(1 − z) for |z| < 1. Thus, we
have f_1(z) = 1/(1 − z) when |z| < 1, and f_1 is not defined for |z| > 1.
Now let us consider a second function,

\[ f_2(z) = \sum_{n=0}^{\infty} \left(\frac{3}{5}\right)^{n+1}\left(z + \frac{2}{3}\right)^{n}, \]

which converges for |z + 2/3| < 5/3. To see what it converges to, we note that

\[ f_2(z) = \frac{3}{5}\sum_{n=0}^{\infty} \left[\frac{3}{5}\left(z + \frac{2}{3}\right)\right]^{n}. \]

Thus,

\[ f_2(z) = \frac{\frac{3}{5}}{1 - \frac{3}{5}\left(z + \frac{2}{3}\right)} = \frac{1}{1-z} \quad \text{when } \left|z + \frac{2}{3}\right| < \frac{5}{3}. \]

We observe that although f_1(z) and f_2(z) have different series representa-
tions in the two overlapping regions (see Fig. 12.7), they represent the same
function, f(z) = 1/(1 − z). We can therefore write

\[ f(z) = \begin{cases} f_1(z) & \text{when } |z| < 1, \\ f_2(z) & \text{when } \left|z + \frac{2}{3}\right| < \frac{5}{3}, \end{cases} \]

and f_1 and f_2 are analytic continuations of one another. In fact, f(z) =
1/(1 − z) is the analytic continuation of both f_1 and f_2 for all of C except
z = 1. Figure 12.7 shows S_i, the region of definition of f_i, for i = 1, 2.


Fig. 12.8 The functions f_1 and f_2 are analytic continuations of each other: f_1 analyt-
ically continues f_2 into the right half-plane, and f_2 analytically continues f_1 into the
semicircle in the left half-plane
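The two series of Example 12.3.3 are easy to compare numerically. The sketch below sums a fixed number of terms of each (the truncation at 200 terms and the sample points are arbitrary choices of this illustration) and checks that they agree with 1/(1 − z) in the overlap, while only the second series is usable at a point with |z| > 1.

```python
def f1_partial(z, n_terms=200):
    """Partial sum of f_1(z) = sum_n z^n, valid for |z| < 1."""
    return sum(z ** n for n in range(n_terms))


def f2_partial(z, n_terms=200):
    """Partial sum of f_2(z) = sum_n (3/5)^(n+1) (z + 2/3)^n, valid for |z + 2/3| < 5/3."""
    return sum((3 / 5) ** (n + 1) * (z + 2 / 3) ** n for n in range(n_terms))


def closed_form(z):
    return 1 / (1 - z)


z_overlap = 0.2 + 0.3j    # inside both disks
z_outside = -1.5 + 0.2j   # |z| > 1, but |z + 2/3| < 5/3

print(f1_partial(z_overlap), f2_partial(z_overlap), closed_form(z_overlap))
print(f2_partial(z_outside), closed_form(z_outside))
```

At z_outside the partial sums of f_1 grow without bound, which is the numerical signature of leaving its circle of convergence.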


Example 12.3.4 The function f_1(z) = ∫_0^∞ e^{−zt} dt exists only if Re(z) > 0,
in which case f_1(z) = 1/z. Its region of definition S_1 is shown in Fig. 12.8
and is simply the right half-plane.

Now we define f_2 by a geometric series: f_2(z) = i Σ_{n=0}^∞ [(z + i)/i]^n
where |z + i| < 1. This series converges, within its circle of convergence S_2,
to

\[ i\,\frac{1}{1 - (z+i)/i} = \frac{1}{z}. \]

Thus, we have

\[ \frac{1}{z} = \begin{cases} f_1(z) & \text{when } z \in S_1, \\ f_2(z) & \text{when } z \in S_2. \end{cases} \]

The two functions are analytic continuations of one another, and f(z) = 1/z
is the analytic continuation of both f_1 and f_2 for all z ∈ C except z = 0.


12.3.1 The Schwarz Reflection Principle 


A result that is useful in some physical applications is referred to as a dis- 
persion relation. To derive such a relation we need to know the behavior 
of analytic functions on either side of the real axis. This is found using the 
Schwarz reflection principle, for which we need the following result. 


Proposition 12.3.5 Let f_i be analytic throughout S_i, where i = 1, 2. Let B
be the boundary between S_1 and S_2 (Fig. 12.9) and assume that f_1 and f_2
are continuous on B and coincide there. Then the two functions are analytic
continuations of one another and together they define a (unique) function

\[ f(z) = \begin{cases} f_1(z) & \text{when } z \in S_1 \cup B, \\ f_2(z) & \text{when } z \in S_2 \cup B, \end{cases} \]

which is analytic throughout the entire region S_1 ∪ S_2 ∪ B.


Fig. 12.9 (a) Regions S_1 and S_2 separated by the boundary B and the contour C. (b) The
contour C splits up into C_1 and C_2


Proof The proof consists of showing that the function integrates to zero
along any closed curve in S_1 ∪ S_2 ∪ B. Once this is done, one can use Mor-
era’s theorem to conclude analyticity. The case when the closed curve is
entirely in either S_1 or S_2 is trivial. When the curve is partially in S_1 and
partially in S_2 the proof becomes only slightly more complicated, because
one has to split up the contour C into C_1 and C_2 of Fig. 12.9(b). The details
are left as an exercise.


Theorem 12.3.6 (Schwarz reflection principle) Let f be a function that is
analytic in a region S that has a segment of the real axis as part of its
boundary B. If f(z) is real whenever z is real, then the analytic continuation
g of f into S* (the mirror image of S with respect to the real axis) exists
and is given by

\[ g(z) = \left(f(z^*)\right)^* \equiv f^*(z^*), \quad \text{where } z \in S^*. \]

Proof First, we show that g is analytic in S*. Let

\[ f(z) = u(x, y) + iv(x, y), \qquad g(z) = U(x, y) + iV(x, y). \]

Then f(z*) = f(x, −y) = u(x, −y) + iv(x, −y) and g(z) = f*(z*) imply
that U(x, y) = u(x, −y) and V(x, y) = −v(x, −y). Therefore, writing u_x, u_y,
v_x, v_y for the partial derivatives of u and v with respect to their first and
second arguments, all evaluated at (x, −y), and using the Cauchy-Riemann
conditions for f in the middle equalities, we have

\[ \frac{\partial U}{\partial x} = u_x = v_y = \frac{\partial V}{\partial y}, \qquad \frac{\partial U}{\partial y} = -u_y = v_x = -\frac{\partial V}{\partial x}. \]

These are the Cauchy-Riemann conditions for g(z). Thus, g is analytic.

Next, we note that f(x, 0) = g(x, 0), implying that f and g agree on the
real axis. Proposition 12.3.5 then implies that f and g are analytic continu-
ations of one another.


Fig. 12.10 The contour used for dispersion relations


It follows from this theorem that there exists an analytic function h such
that

\[ h(z) = \begin{cases} f(z) & \text{when } z \in S, \\ g(z) & \text{when } z \in S^*. \end{cases} \]

We note that h(z*) = g(z*) = f*(z) = h*(z).
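The relation h(z*) = h*(z) is simple to test numerically. In the sketch below, `reflect` builds g(z) = (f(z*))* from a given f; the sample functions are illustrative choices. A function that is real on the real axis is reproduced by its own reflection, while one that is not real there (here f(z) = iz) is not.

```python
import cmath


def reflect(f):
    """Return g with g(z) = (f(z*))*, the Schwarz reflection of f."""
    return lambda z: f(z.conjugate()).conjugate()


def real_on_axis(z):
    """e^z + z^3 is real whenever z is real."""
    return cmath.exp(z) + z ** 3


def not_real_on_axis(z):
    """i z is purely imaginary on the real axis, so the theorem does not apply."""
    return 1j * z


z = 0.8 - 1.1j
agrees = abs(reflect(real_on_axis)(z) - real_on_axis(z))
differs = abs(reflect(not_real_on_axis)(z) - not_real_on_axis(z))
print(agrees, differs)
```

For f(z) = iz the reflection gives −iz, so the two differ by 2|z|, which shows why the reality condition on the real axis is needed.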


12.3.2 Dispersion Relations 


Let f be analytic throughout the complex plane except at a cut along the
real axis extending from x_0 to infinity. For a point z not on the x-axis, the
Cauchy integral formula gives f(z) = (2πi)^{−1} ∮_C f(ξ) dξ/(ξ − z) where C
is the contour shown in Fig. 12.10.

We assume that f drops to zero fast enough that the contribution from
the large circle tends to zero. The reader may show that the contribution
from the small half-circle around x_0 also vanishes. Then

\[ f(z) = \frac{1}{2\pi i}\left[\int_{x_0+i\epsilon}^{\infty+i\epsilon} \frac{f(\xi)}{\xi - z}\,d\xi - \int_{x_0-i\epsilon}^{\infty-i\epsilon} \frac{f(\xi)}{\xi - z}\,d\xi\right] = \frac{1}{2\pi i}\left[\int_{x_0}^{\infty} \frac{f(x+i\epsilon)}{x - z + i\epsilon}\,dx - \int_{x_0}^{\infty} \frac{f(x-i\epsilon)}{x - z - i\epsilon}\,dx\right]. \]

Since z is not on the real axis, we can ignore the iε terms in the denomina-
tors, so that f(z) = (2πi)^{−1} ∫_{x_0}^∞ [f(x + iε) − f(x − iε)] dx/(x − z). The
Schwarz reflection principle in the form f*(z) = f(z*) can now be used to
yield

\[ f(x+i\epsilon) - f(x-i\epsilon) = f(x+i\epsilon) - f^*(x+i\epsilon) = 2i\,\mathrm{Im}\left[f(x+i\epsilon)\right]. \]

The final result is

\[ f(z) = \frac{1}{\pi}\int_{x_0}^{\infty} \frac{\mathrm{Im}\left[f(x+i\epsilon)\right]}{x - z}\,dx. \]

This is one form of a dispersion relation. It expresses the value of a function
at any point of the cut complex plane in terms of an integral of the imaginary
part of the function on the upper edge of the cut.
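A short worked case may help fix the meaning of this dispersion relation. The function below, f(z) = (−z)^{−1/2} on the principal branch, is an illustrative choice made here and does not appear in the text. It is analytic except for a cut along the positive real axis (so x_0 = 0), is real for negative real z, and falls off like |z|^{−1/2}, so the contributions of the large circle and of the small circle around the origin vanish.

```latex
% Just above the cut, -(x + i\epsilon) = x e^{-i\pi} on the principal branch, so
\[
  f(x+i\epsilon) = \frac{1}{\sqrt{x}\,e^{-i\pi/2}} = \frac{i}{\sqrt{x}},
  \qquad
  \mathrm{Im}\left[f(x+i\epsilon)\right] = \frac{1}{\sqrt{x}} .
\]
% Insert this into the dispersion relation and substitute x = t^2,
% writing w = (-z)^{1/2} with Re w > 0, so that t^2 - z = t^2 + w^2:
\[
  \frac{1}{\pi}\int_0^{\infty} \frac{dx}{\sqrt{x}\,(x-z)}
  = \frac{2}{\pi}\int_0^{\infty} \frac{dt}{t^2+w^2}
  = \frac{2}{\pi}\cdot\frac{\pi}{2w}
  = \frac{1}{w} = (-z)^{-1/2} = f(z) .
\]
```

The integral over the imaginary part on the upper edge of the cut therefore reproduces the function everywhere off the cut, as the dispersion relation asserts.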
When there are no residues in the UHP, we can obtain other forms of
dispersion relations by equating the real and imaginary parts of Eq. (11.11).
The result is

\[ \mathrm{Re}\left[f(x_0)\right] = \pm\frac{1}{\pi}\,P\int_{-\infty}^{\infty} \frac{\mathrm{Im}\left[f(x)\right]}{x - x_0}\,dx, \qquad \mathrm{Im}\left[f(x_0)\right] = \mp\frac{1}{\pi}\,P\int_{-\infty}^{\infty} \frac{\mathrm{Re}\left[f(x)\right]}{x - x_0}\,dx, \qquad (12.6) \]

where the upper (lower) sign corresponds to placing the small semicircle
around x_0 in the UHP (LHP). The real and imaginary parts of f, as related
by Eq. (12.6), are sometimes said to be the Hilbert transform of one an-
other.
In some applications, the imaginary part of f is an odd function of its
argument. Then the first equation in (12.6) can be written as

\[ \mathrm{Re}\left[f(x_0)\right] = \pm\frac{2}{\pi}\,P\int_0^{\infty} \frac{x\,\mathrm{Im}\left[f(x)\right]}{x^2 - x_0^2}\,dx. \]

To arrive at dispersion relations, the following condition must hold:

\[ \lim_{R\to\infty} R\left|f(Re^{i\theta})\right| = 0, \]

where R is the radius of the large semicircle in the UHP (or LHP). If f does
not satisfy this prerequisite, it is still possible to obtain a dispersion relation
called a dispersion relation with one subtraction. This can be done by
introducing an extra factor of x in the denominator of the integrand. We
start with Eq. (11.15), confining ourselves to the UHP and assuming that
there are no poles there, so that the sum over residues is dropped:

\[ \frac{f(x_2) - f(x_1)}{x_2 - x_1} = \frac{1}{i\pi}\,P\int_{-\infty}^{\infty} \frac{f(x)}{(x - x_1)(x - x_2)}\,dx. \]

The reader may check that by equating the real and imaginary parts on both
sides, letting x_1 = 0 and x_2 = x_0, and changing x to −x in the first half of
the interval of integration, we obtain

\[ \frac{\mathrm{Re}\left[f(x_0)\right]}{x_0} = \frac{\mathrm{Re}\left[f(0)\right]}{x_0} + \frac{1}{\pi}\left[P\int_0^{\infty} \frac{\mathrm{Im}\left[f(-x)\right]}{x(x + x_0)}\,dx + P\int_0^{\infty} \frac{\mathrm{Im}\left[f(x)\right]}{x(x - x_0)}\,dx\right]. \]

For the case where Im[f(−x)] = −Im[f(x)], this equation yields

\[ \mathrm{Re}\left[f(x_0)\right] = \mathrm{Re}\left[f(0)\right] + \frac{2x_0^2}{\pi}\,P\int_0^{\infty} \frac{\mathrm{Im}\left[f(x)\right]}{x\left(x^2 - x_0^2\right)}\,dx. \qquad (12.7) \]

Example 12.3.7 In optics, it has been shown that the imaginary part of the
forward-scattering light amplitude with frequency ω is related, by the so-
called optical theorem, to the total cross section for the absorption of light
of that frequency:

\[ \mathrm{Im}\left[f(\omega)\right] = \frac{\omega}{4\pi}\,\sigma_{\mathrm{tot}}(\omega). \]

Substituting this in Eq. (12.7) yields

\[ \mathrm{Re}\left[f(\omega_0)\right] = \mathrm{Re}\left[f(0)\right] + \frac{\omega_0^2}{2\pi^2}\,P\int_0^{\infty} \frac{\sigma_{\mathrm{tot}}(\omega)}{\omega^2 - \omega_0^2}\,d\omega. \qquad (12.8) \]

Thus, the real part of the (coherent) forward scattering of light, that is, the
real part of the index of refraction, can be computed from Eq. (12.8) by
either measuring or calculating σ_tot(ω), the simpler quantity describing the
absorption of light in the medium. Equation (12.8) is the original Kramers-
Kronig relation.


12.4 The Gamma and Beta Functions 


We have already encountered the gamma function. In this section, we de- 
rive some useful relations involving the gamma function and the closely 
related beta function. The gamma function is a generalization of the facto- 
rial function—which is defined only for positive integers—to the system of 
complex numbers. By differentiating the integral

\[ I(\alpha) \equiv \int_0^{\infty} e^{-\alpha t}\,dt = \frac{1}{\alpha} \]

with respect to α repeatedly and setting α = 1 at the end, we get
∫_0^∞ t^n e^{−t} dt = n!. This fact motivates the generalization

\[ \Gamma(z) \equiv \int_0^{\infty} t^{z-1} e^{-t}\,dt \quad \text{for } \mathrm{Re}(z) > 0, \qquad (12.9) \]

where Γ is called the gamma (or factorial) function. It is also called Euler’s
integral of the second kind. It is clear from its definition that

\[ \Gamma(n+1) = n! \qquad (12.10) \]

if n is a positive integer. The restriction Re(z) > 0 assures the convergence
of the integral.
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An immediate consequence of Eq. (12.9) is obtained by integrating it by 
parts: 


(c+ l= (2. (12.11) 


This also leads to Eq. (12.10) by iteration and the fact that [(1) = 1. 
Another consequence is the analyticity of (z). Differentiating Eq. (12.11) 
with respect to z, we obtain 


aV(z+1) _ dT (z) 
ig orga 


Thus, dI(z)/dz exists and is finite if and only if dI'(z + 1)/dz is finite 
(recall that z 4 0). The procedure of showing the latter is outlined in Prob- 
lem 12.16. Therefore, '(z) is analytic whenever I'(z + 1) is. To see the 
singularities of '(z), we note that 


Vizgtn)=2(24+ 1)(z+2)--- (+n —- IIT (z), 


or 

= T(z+n) 

— £Z+DE+2)--G+n-1) 
The numerator is analytic as long as Re(z + 7) > 0, or Re(z) > —n. 


Thus, for Re(z) > —n, the singularities of I'(z) are the poles at z = 
0,-1,-2,...,--n + 1. Since n is arbitrary, we conclude that 


T(z) 


(12,12) 


Box 12.4.1 I'(z) is analytic at all z € C except at z=0, —1, —2,..., 
where 1'(z) has simple poles. 


A useful result is obtained by setting $z = \frac{1}{2}$ in Eq. (12.9):

$$\Gamma\!\left(\tfrac{1}{2}\right) = \sqrt{\pi}. \tag{12.13}$$

This can be obtained by making the substitution $u = \sqrt{t}$ in the integral.
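Equations (12.9) through (12.13) lend themselves to a quick numerical check. The following is a minimal sketch, not a production integrator: it evaluates the integral (12.9) for real z ≥ 1/2 after the substitution u = √t mentioned above, using a plain midpoint rule on a truncated interval. The function name, the cutoff `u_max`, and the step count are assumptions made for illustration.

```python
import math

def gamma_integral(z, u_max=12.0, n=100_000):
    """Midpoint estimate of Gamma(z) = 2 * int_0^inf u^(2z-1) exp(-u^2) du.

    This is Eq. (12.9) after the substitution t = u^2. The integrand is
    negligible beyond u_max = 12 (exp(-144)), so the range is truncated there.
    Intended for real z >= 1/2, where the integrand is bounded at u = 0.
    """
    h = u_max / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        total += u ** (2.0 * z - 1.0) * math.exp(-u * u)
    return 2.0 * total * h

print(gamma_integral(5.0), math.factorial(4))             # Eq. (12.10)
print(gamma_integral(3.5), 2.5 * gamma_integral(2.5))     # Eq. (12.11)
print(gamma_integral(0.5), math.sqrt(math.pi))            # Eq. (12.13)
```

Each printed pair should agree to many digits; any visible discrepancy would point to the truncation or the step size rather than to the identities themselves.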

We can derive an expression for the logarithmic derivative of the gamma
function that involves an infinite series. To do so, we use Eq. (12.2) noting
that $1/\Gamma(z + 1)$ is an entire function with simple zeros at
$\{-k\}_{k=1}^{\infty}$. Equation (12.2) gives

$$\frac{1}{\Gamma(z + 1)} = e^{\gamma z} \prod_{k=1}^{\infty} \left(1 + \frac{z}{k}\right) e^{-z/k},$$

where $\gamma$ is a constant to be determined. Using Eq. (12.11), we obtain

$$\frac{1}{\Gamma(z)} = z\, e^{\gamma z} \prod_{k=1}^{\infty} \left(1 + \frac{z}{k}\right) e^{-z/k}. \tag{12.14}$$

To determine $\gamma$, let $z = 1$ in Eq. (12.14) and evaluate the resulting
product numerically. The result is $\gamma = 0.57721566\ldots$, the so-called
Euler-Mascheroni constant.

Differentiating the logarithm of both sides of Eq. (12.14), we obtain

$$\frac{d}{dz} \ln[\Gamma(z)] = -\frac{1}{z} - \gamma + \sum_{k=1}^{\infty} \left(\frac{1}{k} - \frac{1}{z + k}\right). \tag{12.15}$$
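The numerical evaluation of γ described above, and the series (12.15), can be sketched as follows. Setting z = 1 in Eq. (12.14) and taking logarithms gives γ = Σ_k [1/k − ln(1 + 1/k)], which is what `euler_gamma` sums. The truncation lengths and function names are illustrative assumptions; the series converge slowly (the tails fall off like 1/N), so only a few digits should be expected.

```python
import math

def euler_gamma(n_terms=200_000):
    """Partial sum of gamma = sum_k [1/k - ln(1 + 1/k)], from z = 1 in Eq. (12.14)."""
    return sum(1.0 / k - math.log1p(1.0 / k) for k in range(1, n_terms + 1))

def digamma_series(z, gamma, n_terms=200_000):
    """Truncated right-hand side of Eq. (12.15)."""
    # 1/k - 1/(z + k) is written as z / (k (z + k)) to avoid cancellation.
    tail = sum(z / (k * (z + k)) for k in range(1, n_terms + 1))
    return -1.0 / z - gamma + tail

def digamma_fd(z, h=1e-5):
    """Central-difference estimate of d/dz ln Gamma(z) from math.lgamma."""
    return (math.lgamma(z + h) - math.lgamma(z - h)) / (2.0 * h)

g = euler_gamma()
print(g)                                      # close to 0.57721566
print(digamma_series(2.5, g), digamma_fd(2.5))
```

The finite-difference value serves only as an independent reference for the left-hand side of Eq. (12.15).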


Other properties of the gamma function are derivable from the results 
presented here. Those derivations are left as problems. 

The beta function, or Euler's integral of the first kind, is defined for
complex numbers $a$ and $b$ as follows:

$$B(a, b) \equiv \int_0^1 t^{a-1} (1 - t)^{b-1}\,dt \quad \text{where } \mathrm{Re}(a), \mathrm{Re}(b) > 0. \tag{12.16}$$

By changing $t$ to $1/t$, we can also write

$$B(a, b) \equiv \int_1^{\infty} t^{-a-b} (t - 1)^{b-1}\,dt. \tag{12.17}$$

Since $0 \le t \le 1$ in Eq. (12.16), we can define $\theta$ by
$t = \sin^2\theta$. This gives

$$B(a, b) = 2 \int_0^{\pi/2} \sin^{2a-1}\theta\, \cos^{2b-1}\theta\,d\theta. \tag{12.18}$$

This relation can be used to establish a connection between the gamma and
beta functions. We note that

$$\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\,dt = 2 \int_0^{\infty} x^{2a-1} e^{-x^2}\,dx,$$

where in the last step we changed the variable to $x = \sqrt{t}$. Multiply
$\Gamma(a)$ by $\Gamma(b)$ and express the resulting double integral in terms
of polar coordinates to obtain $\Gamma(a)\Gamma(b) = \Gamma(a + b) B(a, b)$, or

$$B(a, b) = B(b, a) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a + b)}. \tag{12.19}$$
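As a sanity check on Eq. (12.19), the trigonometric form (12.18) can be integrated numerically and compared with the ratio of gamma functions. This is a rough sketch assuming real a, b with 2a − 1 ≥ 0 and 2b − 1 ≥ 0, so that the integrand stays bounded; the function names and step count are illustrative choices.

```python
import math

def beta_trig(a, b, n=100_000):
    """Midpoint estimate of Eq. (12.18): 2 * int_0^{pi/2} sin^(2a-1) cos^(2b-1)."""
    h = (math.pi / 2.0) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        total += math.sin(theta) ** (2 * a - 1) * math.cos(theta) ** (2 * b - 1)
    return 2.0 * total * h

def beta_gamma(a, b):
    """Right-hand side of Eq. (12.19), using the standard-library gamma function."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

for a, b in [(2.5, 1.5), (3.0, 2.0)]:
    print(a, b, beta_trig(a, b), beta_gamma(a, b), beta_gamma(b, a))
```

The last two columns also illustrate the symmetry B(a, b) = B(b, a).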


Let us now establish the following useful relation:

$$\Gamma(z)\Gamma(1 - z) = \frac{\pi}{\sin \pi z}. \tag{12.20}$$

With $a = z$ and $b = 1 - z$, and using $u = \tan\theta$, Eqs. (12.18) and
(12.19) give

$$\Gamma(z)\Gamma(1 - z) = B(z, 1 - z) = 2 \int_0^{\infty} \frac{u^{2z-1}}{u^2 + 1}\,du \quad \text{for } 0 < \mathrm{Re}(z) < 1.$$

Using the result obtained in Example 12.2.4, we immediately get Eq. (12.20),
valid for $0 < \mathrm{Re}(z) < 1$. By analytic continuation we then generalize
Eq. (12.20) to values of $z$ for which both sides are analytic.

Fig. 12.11 The contour C used in evaluating the reciprocal gamma function (axes: Re(t), Im(t))


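Example 12.4.2 As an illustration of the use of Eq. (12.20), let us show
that $\Gamma(z)$ can also be written as

$$\frac{1}{\Gamma(z)} = \frac{1}{2\pi i} \int_C \frac{e^t}{t^z}\,dt, \tag{12.21}$$

where $C$ is the contour shown in Fig. 12.11. From Eqs. (12.9) and (12.20) it
follows that

$$\frac{1}{\Gamma(z)} = \frac{\sin \pi z}{\pi}\,\Gamma(1 - z) = \frac{\sin \pi z}{\pi} \int_0^{\infty} \frac{e^{-r}}{r^z}\,dr.$$

The contour integral of Eq. (12.21) can be evaluated by noting that above
the real axis, $t = re^{i\pi} = -r$, below it $t = re^{-i\pi} = -r$, and, as the
reader may check, that the contribution from the small circle at the origin is
zero; so

$$\int_C \frac{e^t}{t^z}\,dt = \int_0^{\infty} \frac{e^{-r}}{(re^{i\pi})^z}\,e^{i\pi}\,dr + \int_{\infty}^{0} \frac{e^{-r}}{(re^{-i\pi})^z}\,e^{-i\pi}\,dr = -e^{-i\pi z} \int_0^{\infty} \frac{e^{-r}}{r^z}\,dr + e^{i\pi z} \int_0^{\infty} \frac{e^{-r}}{r^z}\,dr.$$

Since $e^{i\pi z} - e^{-i\pi z} = 2i \sin \pi z$, comparison with the last
equation above yields the desired result.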


Another useful relation can be obtained by combining Eqs. (12.11) and
(12.20):

$$\Gamma(z)\Gamma(1 - z) = \Gamma(z)(-z)\Gamma(-z) = \frac{\pi}{\sin \pi z}.$$

Thus,

$$\Gamma(z)\Gamma(-z) = -\frac{\pi}{z \sin \pi z}. \tag{12.22}$$

Once we know $\Gamma(x)$ for positive values of real $x$, we can use Eq. (12.22)
to find $\Gamma(x)$ for $x < 0$. Thus, for instance,
$\Gamma(\frac{1}{2}) = \sqrt{\pi}$ gives $\Gamma(-\frac{1}{2}) = -2\sqrt{\pi}$.
Equation (12.22) also shows that the gamma function has simple
poles wherever z is a negative integer.
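The reflection formulas (12.20) and (12.22) are easy to spot-check with the standard-library gamma function, which accepts negative non-integer arguments. The sketch below is illustrative only; the helper names and sample points are arbitrary choices, not part of the text.

```python
import math

def reflection_sides(z):
    """Both sides of Eq. (12.20) for real non-integer z."""
    return math.gamma(z) * math.gamma(1.0 - z), math.pi / math.sin(math.pi * z)

def negative_argument_sides(z):
    """Both sides of Eq. (12.22) for real non-integer z."""
    return math.gamma(z) * math.gamma(-z), -math.pi / (z * math.sin(math.pi * z))

for z in (0.3, 1.7, -2.4):
    print(z, reflection_sides(z), negative_argument_sides(z))

# The worked value from the text: Gamma(-1/2) = -2 sqrt(pi)
print(math.gamma(-0.5), -2.0 * math.sqrt(math.pi))
```

Sample points outside 0 < z < 1 exercise the analytically continued form of Eq. (12.20).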




12.5 Method of Steepest Descent 


It is shown in statistical mechanics ([Hill 87, pp. 150-152]) that the partition
function, which generates all the thermodynamical quantities, can be written
as a contour integral. Debye found a very elegant technique of approximating
this contour integral, which we investigate in this section. Consider the
integral

$$I(\alpha) \equiv \int_C e^{\alpha f(z)} g(z)\,dz \tag{12.23}$$

where $|\alpha|$ is large and $f$ and $g$ are analytic in some region of
$\mathbb{C}$ containing the contour $C$. Since this integral occurs frequently
in physical applications, it would be helpful if we could find a general
approximation for it that is applicable for all $f$ and $g$. The fact that
$|\alpha|$ is large will be of great help. By redefining $f(z)$, if necessary,
we can assume that $\alpha = |\alpha| e^{i \arg(\alpha)}$ is real and positive
[absorb $e^{i \arg(\alpha)}$ into the function $f(z)$ if need be].

The exponent of the integrand can be written as

$$\alpha f(z) = \alpha u(x, y) + i\alpha v(x, y).$$

Since $\alpha$ is large and positive, we expect the exponential to be the
largest at the maximum of $u(x, y)$. Thus, if we deform the contour so that it
passes through a point $z_0$ at which $u(x, y)$ is maximum, the contribution to
the integral may come mostly from the neighborhood of $z_0$. This opens up the
possibility of expanding the exponent about $z_0$ and keeping the lowest terms
in the expansion, which is what we are after. There is one catch, however.
Because of the largeness of $\alpha$, the imaginary part of $\alpha f$ in the
exponent will oscillate violently as $v(x, y)$ changes even by a small amount.
This oscillation can make the contribution of the real part of $f(z_0)$
negligibly small and render the whole procedure useless. Thus, we want to tame
the variation of $\exp[iv(x, y)]$ by making $v(x, y)$ vary as slowly as
possible. A necessary condition is for the derivative of $v$ to vanish at
$z_0$. This and the fact that the real part is to have a maximum at $z_0$ lead
to

$$\frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x} = \left.\frac{df}{dz}\right|_{z_0} = 0. \tag{12.24}$$

However, we do not stop here but demand that the imaginary part of $f$ be
constant along the deformed contour: $\mathrm{Im}[f(z)] = \mathrm{Im}[f(z_0)]$
or $v(x, y) = v(x_0, y_0)$.

Equation (12.24) and the Cauchy-Riemann conditions imply that
$\partial u/\partial x = 0 = \partial u/\partial y$ at $z_0$. Thus, it might
appear that $z_0$ is a maximum (or minimum) of the surface described by the
function $u(x, y)$. This is not true: For the surface to have a maximum
(minimum), both second derivatives, $\partial^2 u/\partial x^2$ and
$\partial^2 u/\partial y^2$, must be negative (positive). But that is
impossible because $u(x, y)$ is harmonic: the sum of these two derivatives is
zero. Recall that a point at which the derivatives vanish but that is neither a
maximum nor a minimum is called a saddle point. That is why the procedure
described below is sometimes called the saddle point approximation.




We are interested in values of $z$ close to $z_0$. So let us expand $f(z)$ in
a Taylor series about $z_0$, use Eq. (12.24), and keep terms only up to the
second, to obtain

$$f(z) = f(z_0) + \tfrac{1}{2}(z - z_0)^2 f''(z_0). \tag{12.25}$$

Let us assume that $f''(z_0) \neq 0$, and define

$$z - z_0 = r_1 e^{i\theta_1} \quad \text{and} \quad \tfrac{1}{2} f''(z_0) = r_2 e^{i\theta_2} \tag{12.26}$$

and substitute in the above expansion to obtain

$$f(z) - f(z_0) = r_1^2 r_2\, e^{i(2\theta_1 + \theta_2)}, \tag{12.27}$$

or

$$\mathrm{Re}[f(z) - f(z_0)] = r_1^2 r_2 \cos(2\theta_1 + \theta_2),$$
$$\mathrm{Im}[f(z) - f(z_0)] = r_1^2 r_2 \sin(2\theta_1 + \theta_2).$$

The constancy of $\mathrm{Im}[f(z)]$ implies that
$\sin(2\theta_1 + \theta_2) = 0$, or $2\theta_1 + \theta_2 = n\pi$. Thus, for
$\theta_1 = -\theta_2/2 + n\pi/2$ where $n = 0, 1, 2, 3$, the imaginary part of
$f$ is constant. The angle $\theta_2$ is determined by the second equation in
(12.26). Once we determine $n$, the path of saddle point integration will be
specified.

To get insight into this specification, consider
$z - z_0 = r_1 e^{i(-\theta_2/2 + n\pi/2)}$, and eliminate $r_1$ from its real
and imaginary parts to obtain

$$y - y_0 = \left[\tan\!\left(\frac{n\pi}{2} - \frac{\theta_2}{2}\right)\right](x - x_0). \tag{12.28}$$

This is the equation of a line passing through $z_0 = (x_0, y_0)$ and making
an angle of $\theta_1 = (n\pi - \theta_2)/2$ with the real axis. For
$n = 0, 2$ we get one line, and for $n = 1, 3$ we get another that is
perpendicular to the first (see Fig. 12.12). It is to be emphasized that along
both these lines the imaginary part of $f(z)$ remains constant. To choose the
correct line, we need to look at the real part of the function. Also note that
these "lines" are small segments of (or tangents to) the deformed contour at
$z_0$.

We are looking for directions along which $\mathrm{Re}(f)$ goes through a
relative maximum at $z_0$. In fact, we are after a path on which the function
decreases maximally. This occurs when
$\mathrm{Re}[f(z)] - \mathrm{Re}[f(z_0)]$ takes the largest negative value.
Equation (12.28) determines such a path: It is that path on which
$\cos(2\theta_1 + \theta_2) = -1$, or when $n = 1, 3$. There is only one such
path in the region of interest, and the procedure is uniquely determined.⁴
Because the descent from the maximum value at $z_0$ is maximum along such a
path, this procedure is called the method of steepest descent.


⁴The angle $\theta_1$ is still ambiguous by $\pi$, because $n$ can be 1 or 3.
However, by a suitable sign convention described below, we can remove this
ambiguity.


Fig. 12.12 A segment of the contour $C_0$ in the vicinity of $z_0$. The lines mentioned in the
text are small segments of the contour $C_0$ centered at $z_0$


Now that we have determined the contour, let us approximate the integral.
Substituting $2\theta_1 + \theta_2 = \pi, 3\pi$ in Eq. (12.27), we get

$$f(z) - f(z_0) = -r_1^2 r_2 \equiv -t^2 = \tfrac{1}{2}(z - z_0)^2 f''(z_0). \tag{12.29}$$

Using this in Eq. (12.23) yields

$$I(\alpha) \approx \int_{C_0} e^{\alpha[f(z_0) - t^2]} g(z)\,dz = e^{\alpha f(z_0)} \int_{C_0} e^{-\alpha t^2} g(z)\,dz, \tag{12.30}$$

where $C_0$ is the deformed contour passing through $z_0$.

To proceed, we need to solve for $z$ in terms of $t$. From Eq. (12.29) we
have

$$(z - z_0)^2 = -\frac{2t^2}{f''(z_0)} = -\frac{t^2}{r_2}\,e^{-i\theta_2}.$$

Therefore, $|z - z_0| = |t|/\sqrt{r_2}$, or
$z - z_0 = (|t|/\sqrt{r_2})\,e^{i\theta_1}$, by the first equation of (12.26).
Let us agree that for $t > 0$, the point $z$ on the contour will move in the
direction that makes an angle of $0 \le \theta_1 < \pi$, and that $t < 0$
corresponds to the opposite direction. This convention removes the remaining
ambiguity of the angle $\theta_1$, and gives

$$z = z_0 + \frac{t}{\sqrt{r_2}}\,e^{i\theta_1}, \quad 0 \le \theta_1 < \pi. \tag{12.31}$$
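The choice of direction can be made concrete with a few lines of code. The sketch below is an illustration under the conventions of Eqs. (12.26) and (12.31): it takes the value of f''(z0), extracts θ2 from f''(z0)/2 = r2 e^{iθ2}, and returns the θ1 in [0, π) that satisfies 2θ1 + θ2 = π (mod 2π). The function names are assumptions, not notation from the text.

```python
import cmath
import math

def steepest_descent_angle(f2):
    """Return theta1 in [0, pi) with 2*theta1 + theta2 = pi (mod 2*pi).

    f2 is the (generally complex) value of f''(z0); theta2 is the phase of
    f2/2, as in the second equation of (12.26).
    """
    theta2 = cmath.phase(f2 / 2)
    return ((math.pi - theta2) / 2.0) % math.pi

def second_order_change(f2, theta1, r1=1.0):
    """f(z) - f(z0) to second order, Eq. (12.25), at z - z0 = r1 e^{i theta1}."""
    return 0.5 * f2 * (r1 * cmath.exp(1j * theta1)) ** 2

# f''(z0) = -1/alpha^2 (negative real, here alpha = 3) gives theta1 = 0
print(steepest_descent_angle(-1.0 / 9.0))
# f''(z0) = -i gives theta1 = 3*pi/4
theta1 = steepest_descent_angle(-1j)
print(theta1, second_order_change(-1j, theta1))   # change is real and negative
```

Along the returned direction the second-order change in f is −r1² r2, that is, purely real and negative, which is the defining property of the steepest-descent path.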
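Using the Taylor expansion of $g(z)$ about $z_0$, we can write

$$g(z)\,dz = \left\{ \sum_{n=0}^{\infty} \frac{t^n}{r_2^{n/2}}\, e^{in\theta_1}\, \frac{g^{(n)}(z_0)}{n!} \right\} \frac{e^{i\theta_1}}{\sqrt{r_2}}\,dt = \sum_{n=0}^{\infty} \frac{t^n}{r_2^{(n+1)/2}}\, e^{i(n+1)\theta_1}\, \frac{g^{(n)}(z_0)}{n!}\,dt,$$

and substituting this in Eq. (12.30) yields

$$I(\alpha) \approx e^{\alpha f(z_0)} \int_{C_0} e^{-\alpha t^2} \left\{ \sum_{n=0}^{\infty} \frac{t^n}{r_2^{(n+1)/2}}\, e^{i(n+1)\theta_1}\, \frac{g^{(n)}(z_0)}{n!} \right\} dt$$
$$= e^{\alpha f(z_0)} \sum_{n=0}^{\infty} \frac{e^{i(n+1)\theta_1}}{r_2^{(n+1)/2}\, n!}\, g^{(n)}(z_0) \int_{-\infty}^{\infty} e^{-\alpha t^2} t^n\,dt. \tag{12.32}$$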


The extension of the integral limits to infinity does not alter the result
significantly because $\alpha$ is assumed large and positive. The integral in
the sum is zero for odd $n$. When $n$ is even, we make the substitution
$u = \alpha t^2$ and show that

$$\int_{-\infty}^{\infty} e^{-\alpha t^2} t^n\,dt = \alpha^{-(n+1)/2}\, \Gamma[(n + 1)/2].$$
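The Gaussian moment formula above, including the vanishing of the odd moments, can be verified numerically. This is a small sketch with assumed names and an assumed truncation of the infinite range to |t| ≤ 12/√α, where the integrand is already negligible.

```python
import math

def gaussian_moment(n, alpha, half_width=12.0, steps=100_000):
    """Midpoint estimate of int_{-inf}^{inf} exp(-alpha t^2) t^n dt.

    The range is truncated to |t| <= half_width / sqrt(alpha).
    """
    t_max = half_width / math.sqrt(alpha)
    h = 2.0 * t_max / steps
    total = 0.0
    for k in range(steps):
        t = -t_max + (k + 0.5) * h
        total += math.exp(-alpha * t * t) * t ** n
    return total * h

def gaussian_moment_closed(n, alpha):
    """Closed form for even n: alpha^(-(n+1)/2) * Gamma((n+1)/2)."""
    return alpha ** (-(n + 1) / 2.0) * math.gamma((n + 1) / 2.0)

for n in (0, 2, 4):
    print(n, gaussian_moment(n, 3.0), gaussian_moment_closed(n, 3.0))
print(3, gaussian_moment(3, 3.0))   # odd moment: vanishes up to rounding
```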


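With $n = 2k$, and using $r_2 = |f''(z_0)|/2$, the sum becomes

$$I(\alpha) \approx e^{\alpha f(z_0)} \sum_{k=0}^{\infty} \frac{2^{k+1/2}\, e^{i(2k+1)\theta_1}}{|f''(z_0)|^{k+1/2}\,(2k)!}\, g^{(2k)}(z_0)\, \Gamma\!\left(k + \tfrac{1}{2}\right) \alpha^{-k-1/2}. \tag{12.33}$$

This is called the asymptotic expansion of $I(\alpha)$. In most applications,
only the first term of the above series is retained, giving

$$I(\alpha) \approx e^{\alpha f(z_0)} \sqrt{\frac{2\pi}{\alpha}}\, \frac{e^{i\theta_1} g(z_0)}{\sqrt{|f''(z_0)|}}. \tag{12.34}$$


Example 12.5.1 Let us approximate the integral

$$I(\alpha) \equiv \Gamma(\alpha + 1) = \int_0^{\infty} e^{-z} z^{\alpha}\,dz,$$

where $\alpha$ is a positive real number. First, we must rewrite the integral
in the form of Eq. (12.23). We can do this by noting that
$z^{\alpha} = e^{\alpha \ln z}$. Thus, we have

$$I(\alpha) = \int_0^{\infty} e^{\alpha \ln z - z}\,dz = \int_0^{\infty} e^{\alpha(\ln z - z/\alpha)}\,dz,$$

and we identify $f(z) = \ln z - z/\alpha$ and $g(z) = 1$. The saddle point is
found from $f'(z) = 0$ or $z_0 = \alpha$. Furthermore, from

$$\tfrac{1}{2} f''(z_0) = \tfrac{1}{2}\left(-\frac{1}{\alpha^2}\right) = \frac{1}{2\alpha^2}\,e^{i\pi} \quad \Rightarrow \quad \theta_2 = \pi$$

and $2\theta_1 + \theta_2 = \pi, 3\pi$, as well as the condition
$0 \le \theta_1 < \pi$, we conclude that $\theta_1 = 0$.
Substitution in Eq. (12.34) yields

$$\Gamma(\alpha + 1) \approx e^{\alpha f(z_0)} \sqrt{\frac{2\pi}{\alpha}}\, \frac{1}{\sqrt{1/\alpha^2}} = \sqrt{2\pi\alpha}\; e^{\alpha(\ln \alpha - 1)} = \sqrt{2\pi}\; e^{-\alpha} \alpha^{\alpha + 1/2}, \tag{12.35}$$

which is called the Stirling approximation. For $\alpha = n$, a positive
integer, this yields the useful result

$$n! \approx \sqrt{2\pi}\; e^{-n} n^{n + 1/2}$$

with the approximation getting better and better for larger and larger $n$.

Fig. 12.13 The contour for the evaluation of the Hankel function of the first kind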
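How quickly the Stirling approximation of Example 12.5.1 improves can be seen by tabulating the ratio of Eq. (12.35) to the exact gamma function. The snippet is a brief sketch; the function name and the sample values of α are arbitrary.

```python
import math

def stirling(alpha):
    """Leading-order saddle-point estimate of Gamma(alpha + 1), Eq. (12.35)."""
    return math.sqrt(2.0 * math.pi) * math.exp(-alpha) * alpha ** (alpha + 0.5)

for alpha in (1, 5, 10, 50, 100):
    ratio = stirling(alpha) / math.gamma(alpha + 1)
    print(alpha, ratio)   # the ratio approaches 1 from below as alpha grows
```

The values of α are kept at or below 100 so that both the estimate and `math.gamma` stay within the range of double-precision floats.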


Example 12.5.2 The Hankel function of the first kind is defined as

$$H_\nu^{(1)}(\alpha) \equiv \frac{1}{i\pi} \int_C e^{(\alpha/2)(z - 1/z)}\, \frac{dz}{z^{\nu+1}},$$

where $C$ is the contour shown in Fig. 12.13.

We want to find the asymptotic expansion of this function, choosing the
branch of the function in which $-\pi < \theta < \pi$.

We identify $f(z) = \frac{1}{2}(z - 1/z)$ and $g(z) = z^{-\nu-1}$. Next, the
stationary points of $f$ are calculated:

$$\frac{df}{dz} = \frac{1}{2} + \frac{1}{2z^2} = 0 \quad \Rightarrow \quad z = \pm i.$$

The contour of integration suggests the saddle point $z_0 = +i$. The second
derivative evaluated at the saddle point gives
$f''(z_0) = -1/z_0^3 = -i = e^{-i\pi/2}$, or $\theta_2 = -\pi/2$. This, and the
convention $0 \le \theta_1 < \pi$, force us to choose $\theta_1 = 3\pi/4$.
Substituting this in Eq. (12.34) and noting that $f(i) = i$ and
$|f''(z_0)| = 1$, we obtain

$$H_\nu^{(1)}(\alpha) \equiv \frac{1}{i\pi}\, I(\alpha) \approx \frac{1}{i\pi}\, e^{i\alpha} \sqrt{\frac{2\pi}{\alpha}}\; e^{i3\pi/4}\, i^{-\nu-1} = \sqrt{\frac{2}{\alpha\pi}}\; e^{i(\alpha - \nu\pi/2 - \pi/4)},$$

where we have used $i^{-\nu-1} = e^{-i(\nu+1)\pi/2}$.


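Although Eq. (12.34) is adequate for most applications, we shall have
occasions to demand a better approximation. One may try to keep higher-order
terms of Eq. (12.33), but that infinite sum is in reality inconsistent.
The reason is that in the product $g(z)\,dz$, we kept only the first power of
$t$ in the expansion of $z$. To restore consistency, let us expand $z(t)$ as
well. Suppose

$$z - z_0 = \sum_{m=1}^{\infty} b_m t^m \quad \Rightarrow \quad dz = \sum_{m=0}^{\infty} (m + 1)\, b_{m+1} t^m\,dt,$$

so that

$$g(z)\,dz = \sum_{n=0}^{\infty} \frac{t^n}{r_2^{n/2}}\, e^{in\theta_1}\, \frac{g^{(n)}(z_0)}{n!} \sum_{m=0}^{\infty} (m + 1)\, b_{m+1} t^m\,dt = \sum_{m,n=0}^{\infty} \frac{e^{in\theta_1}}{r_2^{n/2}\, n!}\,(m + 1)\, b_{m+1}\, g^{(n)}(z_0)\, t^{m+n}\,dt.$$

Now introduce $l = m + n$ and note that the summation over $n$ goes up to $l$.
This gives

$$g(z)\,dz = \sum_{l=0}^{\infty} \underbrace{\left\{ \sum_{n=0}^{l} \frac{e^{in\theta_1}}{r_2^{n/2}\, n!}\,(l - n + 1)\, b_{l-n+1}\, g^{(n)}(z_0) \right\}}_{\equiv\, a_l} t^l\,dt = \sum_{l=0}^{\infty} a_l t^l\,dt.$$

Substituting this in Eq. (12.30) and changing the contour integration into the
integral from $-\infty$ to $\infty$ as before yields

$$I(\alpha) \approx e^{\alpha f(z_0)} \sum_{k=0}^{\infty} a_{2k}\, \alpha^{-k-1/2}\, \Gamma\!\left(k + \tfrac{1}{2}\right), \qquad a_{2k} = \sum_{n=0}^{2k} \frac{e^{in\theta_1}}{r_2^{n/2}\, n!}\,(2k - n + 1)\, b_{2k-n+1}\, g^{(n)}(z_0). \tag{12.36}$$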
The only thing left to do is to evaluate $b_m$. We shall not give a general
formula for these coefficients. Instead, we shall calculate the first three of
them. This should reveal to the reader the general method of approximating
them to any order. We have already calculated $b_1$ in Eq. (12.31). To
calculate $b_2$, keep the next-highest term in the expansion of both $z$ and
$t^2$. Thus write

$$z - z_0 = b_1 t + b_2 t^2, \qquad t^2 = -\tfrac{1}{2} f''(z_0)(z - z_0)^2 - \tfrac{1}{6} f'''(z_0)(z - z_0)^3.$$

Now substitute the first equation in the second and equate the coefficients of
equal powers of $t$ on both sides. The second power of $t$ gives nothing new:
It merely reaffirms the value of $b_1$. The coefficient of the third power of
$t$ is $-b_1 b_2 f''(z_0) - \frac{1}{6} b_1^3 f'''(z_0)$. Setting this equal to
zero gives

$$b_2 = -\frac{b_1^2 f'''(z_0)}{6 f''(z_0)} = \frac{f'''(z_0)}{3|f''(z_0)|^2}\, e^{4i\theta_1}, \tag{12.37}$$

where we substituted for $b_1$ from Eq. (12.31) and used
$2\theta_1 + \theta_2 = \pi$.

To calculate $b_3$, keep one more term in the expansion of both $z$ and $t^2$
to obtain

$$z - z_0 = b_1 t + b_2 t^2 + b_3 t^3$$

and

$$t^2 = -\tfrac{1}{2} f''(z_0)(z - z_0)^2 - \tfrac{1}{6} f'''(z_0)(z - z_0)^3 - \tfrac{1}{24} f^{(4)}(z_0)(z - z_0)^4.$$

Once again substitute the first equation in the second and equate the
coefficients of equal powers of $t$ on both sides. The second and third powers
of $t$ give nothing new. Setting the coefficient of the fourth power of $t$
equal to zero yields

$$b_3 = b_1^3 \left\{ \frac{5[f'''(z_0)]^2}{72[f''(z_0)]^2} - \frac{f^{(4)}(z_0)}{24 f''(z_0)} \right\} = \frac{\sqrt{2}\, e^{3i\theta_1}}{12\,|f''(z_0)|^{3/2}} \left\{ \frac{5[f'''(z_0)]^2}{3[f''(z_0)]^2} - \frac{f^{(4)}(z_0)}{f''(z_0)} \right\}. \tag{12.38}$$

Fig. 12.14 Contour used for Problem 12.4


12.6 Problems 
12.1 Derive Eq. (12.2) from its logarithmic derivative. 


12.2 Show that the point at infinity is not a branch point for
$f(z) = (z^2 - 1)^{1/2}$.


12.3 Find the following integrals, for which $0 \neq a \in \mathbb{R}$.

(a) $\int_0^{\infty} \frac{\ln x}{(x^2 + a^2)^2}\,dx$,   (b) $\int_0^{\infty} \frac{\ln x}{(x^2 + a^2)^2 \sqrt{x}}\,dx$,

(c) $\int_0^{\infty} \frac{(\ln x)^2}{x^2 + a^2}\,dx$.


12.4 Use the contour in Fig. 12.14 to evaluate the following integrals.

(a) $\int_0^{\infty} \frac{\sin ax}{\sinh x}\,dx$,   (b) $\int_0^{\infty} \frac{x \cos ax}{\sinh x}\,dx$.


12.5 Show that $\int_0^{\pi} f(\sin\theta)\,d\theta = 2 \int_0^{\pi/2} f(\sin\theta)\,d\theta$
for an arbitrary function $f$ defined in the interval $[-1, +1]$.




12.6 Find the principal value of the integral
$\int_{-\infty}^{\infty} x \sin x\,dx/(x^2 - x_0^2)$ and evaluate

$$\int_{-\infty}^{\infty} \frac{x \sin x\,dx}{(x - x_0 \pm i\epsilon)(x + x_0 \pm i\epsilon)}$$

for the four possible choices of signs.


12.7 Use analytic continuation, the analyticity of the exponential, hyper- 
bolic, and trigonometric functions, and the analogous identities for real z to 
prove the following identities. 


(a) $e^z = \cosh z + \sinh z$,   (b) $\cosh^2 z - \sinh^2 z = 1$,

(c) $\sin 2z = 2 \sin z \cos z$.


12.8 Show that the function $1/z^2$ represents the analytic continuation into
the domain $\mathbb{C} - \{0\}$ (all the complex plane minus the origin) of the
function defined by $\sum_{n=0}^{\infty} (n + 1)(z + 1)^n$ where $|z + 1| < 1$.


12.9 Find the analytic continuation into $\mathbb{C} - \{i, -i\}$ (all the
complex plane except $i$ and $-i$) of
$f(z) = \int_0^{\infty} e^{-zt} \sin t\,dt$ where $\mathrm{Re}(z) > 0$.


12.10 Expand $f(z) = \sum_{n=0}^{\infty} z^n$ (defined in its circle of
convergence) in a Taylor series about $z = a$. For what values of $a$ does this
expansion permit the function $f(z)$ to be continued analytically?


12.11 The two power series

$$f_1(z) = \sum_{n=1}^{\infty} \frac{z^n}{n} \quad \text{and} \quad f_2(z) = i\pi + \sum_{n=1}^{\infty} (-1)^n \frac{(z - 2)^n}{n}$$

have no common domain of convergence. Show that they are nevertheless
analytic continuations of one another.


12.12 Prove that the functions defined by the two series

$$1 + az + a^2 z^2 + \cdots \quad \text{and} \quad \frac{1}{1 - z} - \frac{(1 - a)z}{(1 - z)^2} + \frac{(1 - a)^2 z^2}{(1 - z)^3} - \cdots$$

are analytic continuations of one another.


12.13 Show that the function $f_1(z) = 1/(z^2 + 1)$, where $z \neq \pm i$, is
the analytic continuation into $\mathbb{C} - \{i, -i\}$ of the function
$f_2(z) = \sum_{n=0}^{\infty} (-1)^n z^{2n}$ where $|z| < 1$.


12.14 Find the analytic continuation into $\mathbb{C} - \{0\}$ of the function

$$f(z) = \int_0^{\infty} t e^{-zt}\,dt \quad \text{where } \mathrm{Re}(z) > 0.$$

12.15 Show that the integral in Eq. (12.9) converges. Hint: First show that
$|\Gamma(z + 1)| \le \int_0^{\infty} t^x e^{-t}\,dt$ where $x = \mathrm{Re}(z)$.
Now show that

$$\int_0^{\infty} t^x e^{-t}\,dt \le \int_0^1 t^x e^{-t}\,dt + \int_0^{\infty} t^n e^{-t}\,dt \quad \text{for some integer } n > 0$$

and conclude that $\Gamma(z)$ is finite.


12.16 Show that $d\Gamma(z + 1)/dz$ exists and is finite by establishing the
following:

(a) $|\ln t| < t + 1/t$ for $t > 0$. Hint: For $t \ge 1$, show that
$t - \ln t$ is a monotonically increasing function. For $t < 1$, make the
substitution $t = 1/s$.

(b) Use the result from part (a) in the integral for $d\Gamma(z + 1)/dz$ to
show that $|d\Gamma(z + 1)/dz|$ is finite. Hint: Differentiate inside the
integral.


12.17 Derive Eq. (12.11) from Eq. (12.9).


12.18 Show that $\Gamma(\frac{1}{2}) = \sqrt{\pi}$, and that

$$(2k - 1)!! \equiv (2k - 1)(2k - 3) \cdots 5 \cdot 3 \cdot 1 = \frac{2^k}{\sqrt{\pi}}\, \Gamma\!\left(\frac{2k + 1}{2}\right).$$


12.19 Show that $\Gamma(z) = \int_0^1 [\ln(1/t)]^{z-1}\,dt$ with $\mathrm{Re}(z) > 0$.

12.20 Derive the identity $\int_0^{\infty} e^{-x^{\alpha}}\,dx = \Gamma[(\alpha + 1)/\alpha]$.


12.21 Consider the function $f(z) = (1 + z)^{\alpha}$.

(a) Show that $d^n f/dz^n|_{z=0} = \Gamma(\alpha + 1)/\Gamma(\alpha - n + 1)$,
and use it to derive the relation

$$(1 + z)^{\alpha} = \sum_{n=0}^{\infty} \binom{\alpha}{n} z^n, \quad \text{where} \quad \binom{\alpha}{n} \equiv \frac{\alpha!}{n!\,(\alpha - n)!} \equiv \frac{\Gamma(\alpha + 1)}{n!\, \Gamma(\alpha - n + 1)}.$$

(b) Show that for general complex numbers $a$ and $b$ we can formally write

$$(a + b)^{\alpha} = \sum_{n=0}^{\infty} \binom{\alpha}{n} a^n b^{\alpha - n}.$$

(c) Show that if $\alpha$ is a positive integer $m$, the series in part (b)
truncates at $n = m$.


12.22 Prove that the residue of $\Gamma(z)$ at $z = -k$ is
$r_k = (-1)^k/k!$. Hint: Use Eq. (12.12).

12.23 Derive the following relation for $z = x + iy$:

$$|\Gamma(z)| = \Gamma(x) \prod_{k=0}^{\infty} \left[ 1 + \frac{y^2}{(x + k)^2} \right]^{-1/2}.$$


12.24 Using the definition of $B(a, b)$, Eq. (12.16), show that
$B(a, b) = B(b, a)$.


12.25 Integrate Eq. (12.21) by parts and derive Eq. (12.11).

12.26 For positive integers $n$, show that
$\Gamma(\frac{1}{2} - n)\, \Gamma(\frac{1}{2} + n) = (-1)^n \pi$.


12.27 Show that

(a) $B(a, b) = B(a + 1, b) + B(a, b + 1)$.
(b) $B(a, b + 1) = \left(\frac{b}{a + b}\right) B(a, b)$.
(c) $B(a, b)\, B(a + b, c) = B(b, c)\, B(a, b + c)$.


12.28 Verify that $\int_{-1}^{1} (1 + t)^a (1 - t)^b\,dt = 2^{a+b+1} B(a + 1, b + 1)$.


12.29 Show that the volume of the solid formed by the surface $z = x^a y^b$,
the $xy$-, $yz$-, and $xz$-planes, and the plane parallel to the $z$-axis and
going through the points $(0, y_0)$ and $(x_0, 0)$ is

$$\frac{x_0^{a+1} y_0^{b+1}}{a + b + 2}\, B(a + 1, b + 1).$$


12.30 Derive this relation:

$$\int_0^{\infty} \frac{\sinh^a x}{\cosh^b x}\,dx = \frac{1}{2}\, B\!\left(\frac{a + 1}{2}, \frac{b - a}{2}\right) \quad \text{where } -1 < a < b.$$

Hint: Let $t = \tanh^2 x$ in Eq. (12.16).


12.31 The Hankel function of the second kind is defined as

$$H_\nu^{(2)}(\alpha) \equiv \frac{1}{i\pi} \int_C e^{(\alpha/2)(z - 1/z)}\, \frac{dz}{z^{\nu+1}},$$

where $C$ is the contour shown in Fig. 12.15.
Find the asymptotic expansion of this function.

Fig. 12.15 The contour for the evaluation of the Hankel function of the second kind


12.32 Find the asymptotic dependence of the modified Bessel function of
the first kind, defined as

$$I_\nu(\alpha) \equiv \frac{1}{2\pi i} \oint_C e^{(\alpha/2)(z + 1/z)}\, \frac{dz}{z^{\nu+1}},$$

where $C$ starts at $-\infty$, approaches the origin and circles it, and goes
back to $-\infty$. Thus the negative real axis is excluded from the domain of
analyticity.


12.33 Find the asymptotic dependence of the modified Bessel function of
the second kind:

$$K_\nu(\alpha) \equiv \frac{1}{2} \int_C e^{-(\alpha/2)(z + 1/z)}\, \frac{dz}{z^{\nu+1}},$$

where $C$ starts at $\infty$, approaches the origin and circles it, and goes
back to $\infty$. Thus the positive real axis is excluded from the domain of
analyticity.


Part IV 
Differential Equations 


13 Separation of Variables in Spherical Coordinates


The laws of physics are almost exclusively written in the form of differential 
equations (DEs). In (point) particle mechanics there is only one independent 
variable, leading to ordinary differential equations (ODEs). In other areas of 
physics in which extended objects such as fields are studied, variations with 
respect to position are also important. Partial derivatives with respect to co- 
ordinate variables show up in the differential equations, which are therefore 
called partial differential equations (PDEs). We list the most common PDEs 
of mathematical physics in the following. 


13.1 PDEs of Mathematical Physics 


In electrostatics, where time-independent scalar fields, such as potentials, 
and vector fields such as electrostatic fields, are studied, the law is described 
by Poisson’s equation, 


$$\nabla^2 \Phi(\mathbf{r}) = -4\pi\rho(\mathbf{r}). \tag{13.1}$$

In vacuum, where $\rho(\mathbf{r}) = 0$, Eq. (13.1) reduces to Laplace's equation,

$$\nabla^2 \Phi(\mathbf{r}) = 0. \tag{13.2}$$

Many electrostatic problems involve conductors held at constant potentials
and situated in vacuum. In the space between such conducting surfaces, the
electrostatic potential obeys Eq. (13.2).

The most simplified version of the heat equation is

$$\frac{\partial T}{\partial t} = a^2 \nabla^2 T, \tag{13.3}$$

where $T$ is the temperature and $a$ is a constant characterizing the medium
in which heat flows.

One of the most frequently recurring PDEs encountered in mathematical
physics is the wave equation,

$$\nabla^2 \Psi - \frac{1}{c^2} \frac{\partial^2 \Psi}{\partial t^2} = 0. \tag{13.4}$$

S. Hassani, Mathematical Physics, DOI 10.1007/978-3-319-01195-0_13, 395

This equation (or its simplification to lower dimensions) is applied to the
vibration of strings and drums; the propagation of sound in gases, solids,
and liquids; the propagation of disturbances in plasmas; and the propagation
of electromagnetic waves.

The Schrödinger equation, describing nonrelativistic quantum phenomena, is

$$-\frac{\hbar^2}{2m} \nabla^2 \Psi + V(\mathbf{r})\Psi = i\hbar \frac{\partial \Psi}{\partial t}, \tag{13.5}$$

where $m$ is the mass of a subatomic particle, $\hbar$ is Planck's constant
(divided by $2\pi$), $V$ is the potential energy of the particle, and
$|\Psi(\mathbf{r}, t)|^2$ is the probability density of finding the particle at
$\mathbf{r}$ at time $t$.

A relativistic generalization of the Schrödinger equation for a free particle
of mass $m$ is the Klein-Gordon equation, which, in terms of the natural
units ($\hbar = 1 = c$), reduces to

$$\nabla^2 \phi - m^2 \phi - \frac{\partial^2 \phi}{\partial t^2} = 0. \tag{13.6}$$

Equations (13.3)-(13.6) have partial derivatives with respect to time. As
a first step toward solving these PDEs and as an introduction to similar
techniques used in the solution of PDEs not involving time,¹ let us separate
the time variable. We will denote the functions in all four equations by the
generic symbol $\Psi(\mathbf{r}, t)$. The basic idea is to separate the
$\mathbf{r}$ and $t$ dependence into factors:
$\Psi(\mathbf{r}, t) \equiv R(\mathbf{r})T(t)$. This factorization permits us
to separate the two operations of space differentiation and time
differentiation. Let $S$ stand for all spatial derivative operators and write
all the relevant equations either as $S\Psi = \partial\Psi/\partial t$ or as
$S\Psi = \partial^2\Psi/\partial t^2$. With this notation and the above
separation, we have

$$S(RT) = T(SR) = \begin{cases} R\,dT/dt, \\ R\,d^2T/dt^2. \end{cases}$$

Dividing both sides by $RT$, we obtain

$$\frac{1}{R}\, SR = \begin{cases} \dfrac{1}{T} \dfrac{dT}{dt}, \\[2mm] \dfrac{1}{T} \dfrac{d^2T}{dt^2}. \end{cases} \tag{13.7}$$

¹See [Hass 08] for a thorough discussion of separation in Cartesian and
cylindrical coordinates. Chapter 19 of this book also contains examples of
solutions to some second-order linear DEs resulting from such separation.

Now comes the crucial step in the process of separation of variables. The
LHS of Eq. (13.7) is a function of position alone, and the RHS is a function
of time alone. Since $\mathbf{r}$ and $t$ are independent variables, the only
way that (13.7) can hold is for both sides to be constant, say $\alpha$:

$$\frac{1}{R}\, SR = \alpha \quad \Rightarrow \quad SR = \alpha R$$

and²

$$\frac{1}{T} \frac{dT}{dt} = \alpha \;\Rightarrow\; \frac{dT}{dt} = \alpha T \quad \text{or} \quad \frac{1}{T} \frac{d^2T}{dt^2} = \alpha \;\Rightarrow\; \frac{d^2T}{dt^2} = \alpha T.$$

We have reduced the original time-dependent PDE to an ODE,

$$\frac{dT}{dt} = \alpha T \quad \text{or} \quad \frac{d^2T}{dt^2} = \alpha T, \tag{13.8}$$

and a PDE involving only the position variables, $(S - \alpha)R = 0$. The
most general form of $S - \alpha$ arising from Eqs. (13.3) through (13.6) is
$S - \alpha \equiv \nabla^2 + f(\mathbf{r})$. Therefore, Eqs. (13.3)-(13.6) are
equivalent to (13.8), and $\nabla^2 R + f(\mathbf{r})R = 0$, which we rewrite as

$$\nabla^2 \Psi(\mathbf{r}) + f(\mathbf{r})\Psi(\mathbf{r}) = 0. \tag{13.9}$$


This is called a homogeneous PDE because of the zero on the right-hand
side. Of all the PDEs outlined above, Poisson's equation is the only
inhomogeneous equation. We will restrict ourselves to the homogeneous case in
this chapter.

Depending on the geometry of the problem, Eq. (13.9) is further sepa- 
rated into ODEs each involving a single coordinate of a suitable coordinate 
system. We shall see examples of all major coordinate systems (Cartesian, 
cylindrical, and spherical) in Chap. 19. For the rest of this chapter, we shall 
concentrate on some general aspects of the spherical coordinates. 
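The separation of the time variable can be illustrated on the simplest case, a one-dimensional version of the heat equation (13.3). The sketch below is an assumption-laden example rather than part of the text: it picks the hypothetical spatial factor R(x) = sin(kx), for which SR = a²R'' = αR with α = −a²k², solves dT/dt = αT from Eq. (13.8), and checks by finite differences that the product Ψ = RT satisfies the PDE. The numerical values of a and k are arbitrary.

```python
import math

a, k = 0.7, 2.0                 # hypothetical medium constant and wavenumber
alpha = -(a * k) ** 2           # separation constant: a^2 R'' = alpha R for R = sin(kx)

def R(x):
    """Spatial factor: an eigenfunction of S = a^2 d^2/dx^2 with eigenvalue alpha."""
    return math.sin(k * x)

def T(t):
    """Time factor: the solution of dT/dt = alpha T with T(0) = 1, Eq. (13.8)."""
    return math.exp(alpha * t)

def psi(x, t):
    """Separated solution Psi(x, t) = R(x) T(t)."""
    return R(x) * T(t)

def heat_residual(x, t, h=1e-4):
    """Finite-difference estimate of dPsi/dt - a^2 d^2Psi/dx^2 (should be ~0)."""
    dpsi_dt = (psi(x, t + h) - psi(x, t - h)) / (2.0 * h)
    d2psi_dx2 = (psi(x + h, t) - 2.0 * psi(x, t) + psi(x - h, t)) / h ** 2
    return dpsi_dt - a ** 2 * d2psi_dx2

print(heat_residual(0.3, 0.5))   # small, limited only by finite-difference error
```

A different choice of R, say one that is not an eigenfunction of S, would leave a residual that does not shrink with h, which is one way to see why the separation constant is forced on both factors.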


²In most cases, $\alpha$ is chosen to be real. In the case of the Schrödinger
equation, it is more convenient to choose $\alpha$ to be purely imaginary so
that the $i$ in the definition of $S$ can be compensated. In all cases, the
precise nature of $\alpha$ is determined by boundary conditions.


Historical Notes

Jean Le Rond d’Alembert (1717-1783) was the illegitimate son of a famous salon
hostess of eighteenth-century Paris and a cavalry officer. Abandoned by his mother,
d’Alembert was raised by a foster family and later educated by the arrangement of his
father at a nearby church-sponsored school, in which he received instruction in the
classics and above-average instruction in mathematics. After studying law and medicine,
he finally chose to pursue a career in mathematics. In the 1740s he joined the ranks of
the philosophes, a growing group of deistic and materialistic thinkers and writers who
actively questioned the social and intellectual standards of the day. He traveled little
(he left France only once, to visit the court of Frederick the Great), preferring instead
the company of his friends in the salons, among whom he was well known for his wit and
laughter.

D’Alembert turned his mathematical and philosophical talents to many of the outstanding
scientific problems of the day, with mixed success. Perhaps his most famous scientific
work, entitled Traité de dynamique, shows his appreciation that a revolution was
taking place in the science of mechanics: the formalization of the principles stated by
Newton into a rigorous mathematical framework. The philosophy to which d’Alembert
subscribed, however, refused to acknowledge the primacy of a concept as unclear and
arbitrary as “force,” introducing a certain awkwardness to his treatment and perhaps
causing him to overlook the important principle of conservation of energy. Later,
d’Alembert produced a treatise on fluid mechanics (the priority of which is still debated
by historians), a paper dealing with vibrating strings (in which the wave equation makes
its first appearance in physics), and a skillful treatment of celestial mechanics.
D’Alembert is also credited with use of the first partial differential equation as well as
the first solution to such an equation using separation of variables. (One should be
careful interpreting “first”: many of d’Alembert’s predecessors and contemporaries gave
similar, though less satisfactory, treatments of these milestones.) Perhaps his most
well-known contribution to mathematics (at least among students) is the ratio test for
the convergence of infinite series.

Much of the work for which d’Alembert is remembered occurred outside mathematical 
physics. He was chosen as the science editor of the Encyclopédie, and his lengthy Dis- 
cours Préliminaire in that volume is considered one of the defining documents of the 
Enlightenment. Other works included writings on law, religion, and music. 

Since d’Alembert’s final years were not especially happy ones, perhaps this account of 
his life should end with a glimpse at the humanity his philosophy often gave his work. 
Like many of his contemporaries, he considered the problem of calculating the relative 
risk associated with the new practice of smallpox inoculation, which in rare cases caused 
the disease it was designed to prevent. Although not very successful in the mathematical 
sense, he was careful to point out that the probability of accidental infection, however 
slight or elegantly derived, would be small consolation to a father whose child died from 
the inoculation. It is greatly to his credit that d’Alembert did not believe such
considerations irrelevant to the problem.


13.2 Separation of the Angular Part 


With Cartesian and cylindrical variables, the boundary conditions are impor- 
tant in determining the nature of the solutions of the ODE obtained from the 
PDE. In almost all applications, however, the angular part of the spherical 
variables can be separated and studied very generally. This is because the 
angular part of the Laplacian in the spherical coordinate system is closely 
related to the operation of rotation and the angular momentum, which are 
independent of any particular situation. 

The separation of the angular part in spherical coordinates can be done
in a fashion exactly analogous to the separation of time by writing $\Psi$ as
a product of three functions, each depending on only one of the variables.
However, we will follow an approach that is used in quantum mechanical
treatments of angular momentum. This approach, which is based on the
operator algebra of Chap. 4 and is extremely powerful and elegant, gives
solutions for the angular part in closed form.

Define the vector operator $\mathbf{p}$ as $\mathbf{p} = -i\nabla$ so that its
$j$th Cartesian component is $p_j = -i\,\partial/\partial x_j$, for
$j = 1, 2, 3$. In quantum mechanics $\mathbf{p}$ (multiplied by $\hbar$) is the
momentum operator. It is easy to verify that³

$$[x_j, p_k] = i\delta_{jk} \quad \text{and} \quad [x_j, x_k] = 0 = [p_j, p_k]. \tag{13.10}$$

We can also define the angular momentum operator as
$\mathbf{L} = \mathbf{r} \times \mathbf{p}$. This is expressed in components as
$L_i = (\mathbf{r} \times \mathbf{p})_i = \epsilon_{ijk} x_j p_k$ for
$i = 1, 2, 3$, where Einstein's summation convention (summing over repeated
indices) is utilized.⁴ Using the commutation relations above, we obtain

$$[L_j, L_k] = i\epsilon_{jkl} L_l.$$


³These operators act on the space of functions possessing enough "nice"
properties as to render the space suitable. The operator $x_j$ simply
multiplies functions, while $p_j$ differentiates them.

We will see shortly that $\mathbf{L}$ can be written solely in terms of the angles
$\theta$ and $\varphi$. Moreover, there is one factor of $\mathbf{p}$ in the definition of $\mathbf{L}$, so if we
square $\mathbf{L}$, we will get two factors of $\mathbf{p}$, and a Laplacian may emerge in the
expression for $\mathbf{L}\cdot\mathbf{L}$. In this manner, we may be able to write $\nabla^2$ in terms of
$L^2$, which depends only on angles. Let us try this:


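$$\begin{aligned}
L^2 = \mathbf{L}\cdot\mathbf{L} = \sum_{i=1}^{3} L_i L_i
  &= \epsilon_{ijk}x_j p_k\,\epsilon_{imn}x_m p_n = \epsilon_{ijk}\epsilon_{imn}\,x_j p_k x_m p_n \\
  &= (\delta_{jm}\delta_{kn} - \delta_{jn}\delta_{km})\,x_j p_k x_m p_n = x_j p_k x_j p_k - x_j p_k x_k p_j.
\end{aligned}$$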


We need to write this expression in such a way that factors with the same in- 
dex are next to each other, to give a dot product. We must also try, when pos- 
sible, to keep the p factors to the right so that they can operate on functions 
without intervention from the x factors. We do this by using Eq. (13.10): 


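$$\begin{aligned}
L^2 &= x_j(x_j p_k - i\delta_{kj})p_k - (p_k x_j + i\delta_{kj})x_k p_j \\
    &= x_j x_j p_k p_k - i x_j p_j - p_k x_k x_j p_j - i x_j p_j \\
    &= x_j x_j p_k p_k - 2i x_j p_j - (x_k p_k - i\delta_{kk})x_j p_j.
\end{aligned}$$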


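Recalling that $\delta_{kk} = \sum_{k=1}^{3}\delta_{kk} = 3$ and $x_j x_j = \sum_{j=1}^{3}x_j x_j = \mathbf{r}\cdot\mathbf{r} = r^2$, etc.,
we can write $L^2 = r^2\,\mathbf{p}\cdot\mathbf{p} + i\,\mathbf{r}\cdot\mathbf{p} - (\mathbf{r}\cdot\mathbf{p})(\mathbf{r}\cdot\mathbf{p})$, which, if we make the
substitution $\mathbf{p} = -i\nabla$, yields

$$\nabla^2 = -r^{-2}L^2 + r^{-2}(\mathbf{r}\cdot\nabla)(\mathbf{r}\cdot\nabla) + r^{-2}\,\mathbf{r}\cdot\nabla.$$

Letting both sides act on the function $\Psi(r, \theta, \varphi)$, we get

$$\nabla^2\Psi = -\frac{1}{r^2}L^2\Psi + \frac{1}{r^2}(\mathbf{r}\cdot\nabla)(\mathbf{r}\cdot\nabla)\Psi + \frac{1}{r^2}\,\mathbf{r}\cdot\nabla\Psi. \tag{13.11}$$

But we note that $\mathbf{r}\cdot\nabla = r\,\hat{\mathbf{e}}_r\cdot\nabla = r\,\partial/\partial r$. We thus get the final form of $\nabla^2\Psi$
in spherical coordinates:

$$\nabla^2\Psi = -\frac{1}{r^2}L^2\Psi + \frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r}\frac{\partial\Psi}{\partial r}. \tag{13.12}$$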


It is important to note that Eq. (13.11) is a general relation that holds in 
all coordinate systems. Although all the manipulations leading to it were 
done in Cartesian coordinates, since it is written in vector notation, there is 
no indication in the final form that it was derived using specific coordinates. 


⁴ It is assumed that the reader is familiar with vector algebra using indices and such objects
as $\delta_{ij}$ and $\epsilon_{ijk}$. For an introductory treatment, sufficient for our present discussion, see
[Hass 08]. A more advanced treatment of these objects (tensors) can be found in Part VIII
of the present book.
Equation (13.12) is the spherical version of (13.11) and is the version we
shall use. We will first make the simplifying assumption that in Eq. (13.9),
the master equation, $f(\mathbf{r})$ is a function of $r$ only. Equation (13.9) then
becomes

$$-\frac{1}{r^2}L^2\Psi + \frac{1}{r}\frac{\partial}{\partial r}\left(r\frac{\partial\Psi}{\partial r}\right) + \frac{1}{r}\frac{\partial\Psi}{\partial r} + f(r)\Psi = 0.$$


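Assuming, for the time being, that $L^2$ depends only on $\theta$ and $\varphi$, and
separating $\Psi$ into a product of two functions, $\Psi(r, \theta, \varphi) = R(r)Y(\theta, \varphi)$, we can
rewrite this equation as

$$-\frac{1}{r^2}L^2(RY) + \frac{1}{r}\frac{\partial}{\partial r}\left[r\frac{\partial}{\partial r}(RY)\right] + \frac{1}{r}\frac{\partial}{\partial r}(RY) + f(r)RY = 0.$$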


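Dividing by $RY$ and multiplying by $r^2$ yields

$$\underbrace{-\frac{1}{Y}L^2(Y)}_{-\alpha} + \underbrace{\frac{r}{R}\frac{d}{dr}\left(r\frac{dR}{dr}\right) + \frac{r}{R}\frac{dR}{dr} + r^2 f(r)}_{+\alpha} = 0,$$

or

$$L^2 Y(\theta, \varphi) = \alpha Y(\theta, \varphi) \tag{13.13}$$

and

$$\frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} + \left[f(r) - \frac{\alpha}{r^2}\right]R = 0. \tag{13.14}$$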


We will concentrate on the angular part, Eq. (13.13), leaving the radial part
to the general discussion of ODEs. The rest of this section will focus on
showing that $L_1 \equiv L_x$, $L_2 \equiv L_y$, and $L_3 \equiv L_z$ are independent of $r$.

Since $L_i$ is an operator, we can study its action on an arbitrary function $f$.
Thus,

$$L_i f = -i\epsilon_{ijk}x_j\nabla_k f = -i\epsilon_{ijk}x_j\,\partial f/\partial x_k.$$

We can express the Cartesian $x_j$ in terms of $r$, $\theta$, and $\varphi$, and use the chain
rule to express $\partial f/\partial x_k$ in terms of spherical coordinates. This will give us
$L_i f$ expressed in terms of $r$, $\theta$, and $\varphi$. It will then emerge that $r$ is absent in
the final expression.

Let us start with 


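$$x = r\sin\theta\cos\varphi, \qquad y = r\sin\theta\sin\varphi, \qquad z = r\cos\theta,$$

and their inverse,

$$r = (x^2 + y^2 + z^2)^{1/2}, \qquad \cos\theta = \frac{z}{r}, \qquad \tan\varphi = \frac{y}{x},$$

and express the Cartesian derivatives in terms of spherical coordinates using
the chain rule. The first such derivative is

$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial r}\frac{\partial r}{\partial x} + \frac{\partial f}{\partial\theta}\frac{\partial\theta}{\partial x} + \frac{\partial f}{\partial\varphi}\frac{\partial\varphi}{\partial x}. \tag{13.15}$$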
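The derivatives of one set of coordinates with respect to the other can be
easily calculated. For example, $\partial r/\partial x = x/r = \sin\theta\cos\varphi$, and differentiating
both sides of the equation $\cos\theta = z/r$, we obtain

$$-\sin\theta\frac{\partial\theta}{\partial x} = -\frac{z\,\partial r/\partial x}{r^2} = -\frac{zx}{r^3} = -\frac{\cos\theta\sin\theta\cos\varphi}{r}
\quad\Rightarrow\quad \frac{\partial\theta}{\partial x} = \frac{\cos\theta\cos\varphi}{r}.$$

Finally, differentiating both sides of $\tan\varphi = y/x$ with respect to $x$ yields
$\partial\varphi/\partial x = -\sin\varphi/(r\sin\theta)$. Using these expressions in Eq. (13.15), we get

$$\frac{\partial f}{\partial x} = \sin\theta\cos\varphi\frac{\partial f}{\partial r} + \frac{\cos\theta\cos\varphi}{r}\frac{\partial f}{\partial\theta} - \frac{\sin\varphi}{r\sin\theta}\frac{\partial f}{\partial\varphi}.$$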


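In exactly the same way, we obtain

$$\frac{\partial f}{\partial y} = \sin\theta\sin\varphi\frac{\partial f}{\partial r} + \frac{\cos\theta\sin\varphi}{r}\frac{\partial f}{\partial\theta} + \frac{\cos\varphi}{r\sin\theta}\frac{\partial f}{\partial\varphi},$$

$$\frac{\partial f}{\partial z} = \cos\theta\frac{\partial f}{\partial r} - \frac{\sin\theta}{r}\frac{\partial f}{\partial\theta}.$$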


We can now calculate $L_x$ by letting it act on an arbitrary function and
expressing all Cartesian coordinates and derivatives in terms of spherical
coordinates. The result is

$$L_x f = -iy\frac{\partial f}{\partial z} + iz\frac{\partial f}{\partial y} = i\left(\sin\varphi\frac{\partial}{\partial\theta} + \cot\theta\cos\varphi\frac{\partial}{\partial\varphi}\right)f,$$

or

$$L_x = i\left(\sin\varphi\frac{\partial}{\partial\theta} + \cot\theta\cos\varphi\frac{\partial}{\partial\varphi}\right). \tag{13.16}$$

Analogous arguments yield

$$L_y = i\left(-\cos\varphi\frac{\partial}{\partial\theta} + \cot\theta\sin\varphi\frac{\partial}{\partial\varphi}\right), \qquad L_z = -i\frac{\partial}{\partial\varphi}. \tag{13.17}$$

It is left as a problem for the reader to show that by adding the squares of
the components of the angular momentum operator, one obtains

$$L^2 = -\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\right) - \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2}, \tag{13.18}$$


which is independent of r as promised. Substitution in Eq. (13.12) yields 
the familiar expression for the Laplacian in spherical coordinates. 
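
As a quick numerical sanity check of Eq. (13.18), the sketch below applies the operator to a trial function by central finite differences. The function names (`L2`, `trial`), the step size, and the sample point are illustrative assumptions and not part of the text; the trial function $\sin\theta\cos\theta\,e^{i\varphi}$ is proportional to $Y_{21}$, so the ratio $L^2 f/f$ should come out close to $l(l+1) = 6$.

```python
import cmath
import math

def L2(f, theta, phi, h=1e-3):
    """Apply the operator of Eq. (13.18) to f(theta, phi) by central differences."""
    def sin_df(t):
        # sin(theta) * df/dtheta, evaluated at t with a half-step stencil
        return math.sin(t) * (f(t + h / 2, phi) - f(t - h / 2, phi)) / h

    theta_part = (sin_df(theta + h / 2) - sin_df(theta - h / 2)) / (h * math.sin(theta))
    phi_part = (f(theta, phi + h) - 2 * f(theta, phi) + f(theta, phi - h)) / (
        h * h * math.sin(theta) ** 2
    )
    return -theta_part - phi_part

def trial(theta, phi):
    # Proportional to Y_21 (an assumption of this sketch), hence eigenvalue 2*3 = 6
    return math.sin(theta) * math.cos(theta) * cmath.exp(1j * phi)

theta0, phi0 = 0.7, 1.3  # arbitrary sample point away from the poles
ratio = L2(trial, theta0, phi0) / trial(theta0, phi0)
print(ratio)
```

The radial variable never enters the computation, which is the point of the section: the operator acts on the angles alone.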


13.3 Construction of Eigenvalues of L²

Now that we have $L^2$ in terms of $\theta$ and $\varphi$, we could substitute in Eq. (13.13),
separate the $\theta$ and $\varphi$ dependence, and solve the corresponding ODEs. However,
there is a much more elegant way of solving this problem algebraically,
because Eq. (13.13) is simply an eigenvalue equation for $L^2$. In this section,
we will find the eigenvalues of $L^2$. The next section will evaluate the
eigenvectors of $L^2$.

Let us consider $L^2$ as an abstract operator and write (13.13) as

$$L^2|Y\rangle = \alpha|Y\rangle,$$

where $|Y\rangle$ is an abstract vector whose $(\theta, \varphi)$th component can be calculated
later. Since $L^2$ is a differential operator, it does not have a (finite-
dimensional) matrix representation. Thus, the determinantal procedure for
calculating eigenvalues and eigenfunctions will not work here, and we have
to find another way.

The equation above specifies an eigenvalue, $\alpha$, and an eigenvector, $|Y\rangle$.
There may be more than one $|Y\rangle$ corresponding to the same $\alpha$. To distinguish
among these so-called degenerate eigenvectors, we choose a second
operator, say $L_3 \in \{L_i\}$, that commutes with $L^2$. This allows us to select a
basis in which both $L^2$ and $L_3$ are diagonal, or, equivalently, a basis whose
vectors are simultaneous eigenvectors of both $L^2$ and $L_3$. This is possible by
Theorem 6.4.18 and the fact that both $L^2$ and $L_3$ are hermitian operators in
the space of square-integrable functions. (The proof is left as a problem.)
In general, we would want to continue adding operators until we obtained a
maximum set of commuting operators which could label the eigenvectors.
In this case, $L^2$ and $L_3$ exhaust the set.⁵ Using the more common subscripts
x, y, and z instead of 1, 2, 3 and attaching labels to the eigenvectors, we
have

$$L^2|Y_{\alpha,\beta}\rangle = \alpha|Y_{\alpha,\beta}\rangle, \qquad L_z|Y_{\alpha,\beta}\rangle = \beta|Y_{\alpha,\beta}\rangle. \tag{13.19}$$

The hermiticity of $L^2$ and $L_z$ implies the reality of $\alpha$ and $\beta$. Next we need to
determine the possible values for $\alpha$ and $\beta$.

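Define two new operators $L_+ \equiv L_x + iL_y$ and $L_- \equiv L_x - iL_y$. It is then
easily verified that

$$[L^2, L_\pm] = 0, \qquad [L_z, L_\pm] = \pm L_\pm, \qquad [L_+, L_-] = 2L_z. \tag{13.20}$$

The first equation implies that $L_\pm$ are invariant operators when acting in the
subspace corresponding to the eigenvalue $\alpha$; that is, $L_\pm|Y_{\alpha,\beta}\rangle$ are eigenvectors
of $L^2$ with the same eigenvalue $\alpha$:

$$L^2\bigl(L_\pm|Y_{\alpha,\beta}\rangle\bigr) = L_\pm\bigl(L^2|Y_{\alpha,\beta}\rangle\bigr) = L_\pm\bigl(\alpha|Y_{\alpha,\beta}\rangle\bigr) = \alpha L_\pm|Y_{\alpha,\beta}\rangle.$$

The second equation in (13.20) yields

$$\begin{aligned}
L_z\bigl(L_+|Y_{\alpha,\beta}\rangle\bigr) &= (L_z L_+)|Y_{\alpha,\beta}\rangle = (L_+ L_z + L_+)|Y_{\alpha,\beta}\rangle \\
 &= L_+ L_z|Y_{\alpha,\beta}\rangle + L_+|Y_{\alpha,\beta}\rangle = \beta L_+|Y_{\alpha,\beta}\rangle + L_+|Y_{\alpha,\beta}\rangle \\
 &= (\beta + 1)L_+|Y_{\alpha,\beta}\rangle.
\end{aligned}$$

This indicates that $L_+|Y_{\alpha,\beta}\rangle$ has one more unit of the $L_z$ eigenvalue than
$|Y_{\alpha,\beta}\rangle$ does. In other words, $L_+$ raises the eigenvalue of $L_z$ by one unit. That
is why $L_+$ is called a raising operator. Similarly, $L_-$ is called a lowering
operator because $L_z\bigl(L_-|Y_{\alpha,\beta}\rangle\bigr) = (\beta - 1)L_-|Y_{\alpha,\beta}\rangle$.

⁵ We could just as well have chosen $L^2$ and any other component as our maximal set.
However, $L^2$ and $L_3$ is the universally accepted choice.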

We can summarize the above discussion as

$$L_\pm|Y_{\alpha,\beta}\rangle = C_\pm|Y_{\alpha,\beta\pm 1}\rangle,$$

where $C_\pm$ are constants to be determined by a suitable normalization.

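There are restrictions on (and relations between) $\alpha$ and $\beta$. First note that
as $L^2$ is a sum of squares of hermitian operators, it must be a positive
operator; that is, $\langle a|L^2|a\rangle \ge 0$ for all $|a\rangle$. In particular,

$$0 \le \langle Y_{\alpha,\beta}|L^2|Y_{\alpha,\beta}\rangle = \alpha\langle Y_{\alpha,\beta}|Y_{\alpha,\beta}\rangle = \alpha\|Y_{\alpha,\beta}\|^2.$$

Therefore, $\alpha \ge 0$. Next, one can readily show that

$$L^2 = L_+L_- + L_z^2 - L_z = L_-L_+ + L_z^2 + L_z. \tag{13.21}$$

Sandwiching both sides of the first equality between $|Y_{\alpha,\beta}\rangle$ and $\langle Y_{\alpha,\beta}|$ yields

$$\langle Y_{\alpha,\beta}|L^2|Y_{\alpha,\beta}\rangle = \langle Y_{\alpha,\beta}|L_+L_-|Y_{\alpha,\beta}\rangle + \langle Y_{\alpha,\beta}|L_z^2|Y_{\alpha,\beta}\rangle - \langle Y_{\alpha,\beta}|L_z|Y_{\alpha,\beta}\rangle,$$

with an analogous expression involving $L_-L_+$. Using the fact that $L_+ = (L_-)^\dagger$,
we get

$$\begin{aligned}
\alpha\|Y_{\alpha,\beta}\|^2 &= \langle Y_{\alpha,\beta}|L_+L_-|Y_{\alpha,\beta}\rangle + \beta^2\|Y_{\alpha,\beta}\|^2 - \beta\|Y_{\alpha,\beta}\|^2 \\
 &= \langle Y_{\alpha,\beta}|L_-L_+|Y_{\alpha,\beta}\rangle + \beta^2\|Y_{\alpha,\beta}\|^2 + \beta\|Y_{\alpha,\beta}\|^2 \\
 &= \bigl\|L_\mp|Y_{\alpha,\beta}\rangle\bigr\|^2 + \beta^2\|Y_{\alpha,\beta}\|^2 \mp \beta\|Y_{\alpha,\beta}\|^2.
\end{aligned} \tag{13.22}$$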


Because of the positivity of norms, this yields $\alpha \ge \beta^2 - \beta$ and $\alpha \ge \beta^2 + \beta$.
Adding these two inequalities gives $2\alpha \ge 2\beta^2 \Rightarrow -\sqrt{\alpha} \le \beta \le \sqrt{\alpha}$. It follows
that the values of $\beta$ are bounded. That is, there exist a maximum $\beta$,
denoted by $\beta_+$, and a minimum $\beta$, denoted by $\beta_-$, beyond which there are
no more values of $\beta$. This can happen only if

$$L_+|Y_{\alpha,\beta_+}\rangle = 0, \qquad L_-|Y_{\alpha,\beta_-}\rangle = 0,$$

because if $L_\pm|Y_{\alpha,\beta_\pm}\rangle$ are not zero, then they must have values of $\beta$
corresponding to $\beta_\pm \pm 1$, which are not allowed.

Using $\beta_+$ for $\beta$ in Eq. (13.22) yields

$$(\alpha - \beta_+^2 - \beta_+)\|Y_{\alpha,\beta_+}\|^2 = 0.$$

By definition $|Y_{\alpha,\beta_+}\rangle \ne 0$ (otherwise $\beta_+ - 1$ would be the maximum). Thus,
we obtain $\alpha = \beta_+^2 + \beta_+$. An analogous procedure using $\beta_-$ for $\beta$ yields
$\alpha = \beta_-^2 - \beta_-$. We solve these two equations for $\beta_+$ and $\beta_-$:

$$\beta_+ = \tfrac{1}{2}\bigl(-1 \pm \sqrt{1 + 4\alpha}\bigr), \qquad \beta_- = \tfrac{1}{2}\bigl(1 \pm \sqrt{1 + 4\alpha}\bigr).$$

Since $\beta_+ \ge \beta_-$ and $\sqrt{1 + 4\alpha} \ge 1$, we must choose

$$\beta_+ = \tfrac{1}{2}\bigl(-1 + \sqrt{1 + 4\alpha}\bigr) = -\beta_-.$$

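Starting with $|Y_{\alpha,\beta_+}\rangle$, we can apply $L_-$ to it repeatedly. In each step we
decrease the value of $\beta$ by one unit. There must be a limit to the number
of vectors obtained in this way, because $\beta$ has a minimum. Therefore, there
must exist a nonnegative integer $k$ such that

$$(L_-)^{k+1}|Y_{\alpha,\beta_+}\rangle = L_-\bigl(L_-^k|Y_{\alpha,\beta_+}\rangle\bigr) = 0.$$

Thus, $L_-^k|Y_{\alpha,\beta_+}\rangle$ must be proportional to $|Y_{\alpha,\beta_-}\rangle$. In particular, since
$L_-^k|Y_{\alpha,\beta_+}\rangle$ has a $\beta$ value equal to $\beta_+ - k$, we have $\beta_- = \beta_+ - k$. Now,
using $\beta_- = -\beta_+$ (derived above) yields the important result

$$\beta_+ = \frac{k}{2} \equiv j \quad\text{for } k \in \mathbb{N},$$

or $\alpha = j(j + 1)$, since $\alpha = \beta_+^2 + \beta_+$. This result is important enough to be
stated as a theorem.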


Theorem 13.3.1 The eigenvectors of $L^2$, denoted by $|Y_{jm}\rangle$, satisfy the
eigenvalue relations

$$L^2|Y_{jm}\rangle = j(j + 1)|Y_{jm}\rangle, \qquad L_z|Y_{jm}\rangle = m|Y_{jm}\rangle,$$

where $j$ is a positive integer or half-integer, and $m$ can take a value
in the set $\{-j, -j + 1, \ldots, j - 1, j\}$ of $2j + 1$ numbers.


Let us briefly consider the normalization of the eigenvectors. We already
know that the $|Y_{jm}\rangle$, being eigenvectors of the hermitian operators $L^2$ and $L_z$,
are orthogonal. We also demand that they be of unit norm; that is,

$$\langle Y_{jm}|Y_{j'm'}\rangle = \delta_{jj'}\delta_{mm'}. \tag{13.23}$$

This will determine the constants $C_\pm$ introduced earlier. Let us consider $C_+$
first, which is defined by $L_+|Y_{jm}\rangle = C_+|Y_{j,m+1}\rangle$. The hermitian conjugate
of this equation is $\langle Y_{jm}|L_- = C_+^*\langle Y_{j,m+1}|$. We contract these two equations
to get

$$\langle Y_{jm}|L_-L_+|Y_{jm}\rangle = |C_+|^2\langle Y_{j,m+1}|Y_{j,m+1}\rangle.$$

Then we use the second relation in Eq. (13.21), Theorem 13.3.1, and (13.23)
to obtain

$$j(j + 1) - m(m + 1) = |C_+|^2 \quad\Rightarrow\quad |C_+| = \sqrt{j(j + 1) - m(m + 1)}.$$

Adopting the convention that the argument (phase) of the complex number
$C_+$ is zero (and therefore that $C_+$ is real), we get

$$C_+ = \sqrt{j(j + 1) - m(m + 1)}.$$

Similarly, $C_- = \sqrt{j(j + 1) - m(m - 1)}$.


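Box 13.3.2 The raising and lowering operators act on $|Y_{jm}\rangle$ as follows:

$$\begin{aligned}
L_+|Y_{jm}\rangle &= \sqrt{j(j + 1) - m(m + 1)}\,|Y_{j,m+1}\rangle, \\
L_-|Y_{jm}\rangle &= \sqrt{j(j + 1) - m(m - 1)}\,|Y_{j,m-1}\rangle.
\end{aligned} \tag{13.24}$$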
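
Equation (13.24) can be turned into explicit finite matrices, which gives a concrete check of the algebra. The following is a minimal sketch, not part of the text: the helper names (`ladder_matrices`, `matmul`, `combine`, `check`) and the choice of basis ordering $m = j, j-1, \ldots, -j$ are assumptions made for illustration. It builds $L_z$ and $L_\pm$ for a given $j$ and measures how far $[L_+, L_-]$ is from $2L_z$ and how far $L_+L_- + L_z^2 - L_z$ is from $j(j+1)$ times the identity.

```python
import math

def ladder_matrices(two_j):
    """Matrices of L_z, L_+, L_- in the basis |j,m>, m = j, j-1, ..., -j (two_j = 2j)."""
    j = two_j / 2
    ms = [j - k for k in range(two_j + 1)]
    n = len(ms)
    Lz = [[ms[r] if r == c else 0.0 for c in range(n)] for r in range(n)]
    Lp = [[0.0] * n for _ in range(n)]
    Lm = [[0.0] * n for _ in range(n)]
    for c, m in enumerate(ms):
        if c > 0:       # L_+|j,m> = C_+ |j,m+1>, and m+1 sits at index c-1
            Lp[c - 1][c] = math.sqrt(j * (j + 1) - m * (m + 1))
        if c < n - 1:   # L_-|j,m> = C_- |j,m-1>, and m-1 sits at index c+1
            Lm[c + 1][c] = math.sqrt(j * (j + 1) - m * (m - 1))
    return Lz, Lp, Lm

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)] for r in range(n)]

def combine(A, B, sign):
    """Entrywise A + sign * B."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def max_abs(A):
    return max(abs(x) for row in A for x in row)

def check(two_j):
    """Return the largest deviations from [L+, L-] = 2 L_z and from L^2 = j(j+1) I."""
    j = two_j / 2
    Lz, Lp, Lm = ladder_matrices(two_j)
    n = len(Lz)
    commutator = combine(matmul(Lp, Lm), matmul(Lm, Lp), -1)
    err_comm = max_abs(combine(commutator, [[2 * x for x in row] for row in Lz], -1))
    L2 = combine(combine(matmul(Lp, Lm), matmul(Lz, Lz), +1), Lz, -1)
    target = [[j * (j + 1) if r == c else 0.0 for c in range(n)] for r in range(n)]
    err_L2 = max_abs(combine(L2, target, -1))
    return err_comm, err_L2

print(check(3))  # j = 3/2
print(check(4))  # j = 2
```

Both deviations should vanish up to floating-point rounding for integer and half-integer $j$ alike, in line with Eqs. (13.20) and (13.21).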


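Example 13.3.3 Assume that $j = l$, a positive integer. Let us find an
expression for $|Y_{lm}\rangle$ by repeatedly applying $L_-$ to $|Y_{ll}\rangle$. The action of $L_-$ is
completely described by Eq. (13.24). For the first power of $L_-$, we obtain

$$L_-|Y_{ll}\rangle = \sqrt{l(l + 1) - l(l - 1)}\,|Y_{l,l-1}\rangle = \sqrt{2l}\,|Y_{l,l-1}\rangle.$$

We apply $L_-$ once more:

$$\begin{aligned}
(L_-)^2|Y_{ll}\rangle &= \sqrt{2l}\,L_-|Y_{l,l-1}\rangle = \sqrt{2l}\,\sqrt{l(l + 1) - (l - 1)(l - 2)}\,|Y_{l,l-2}\rangle \\
 &= \sqrt{2l}\,\sqrt{2(2l - 1)}\,|Y_{l,l-2}\rangle = \sqrt{2(2l)(2l - 1)}\,|Y_{l,l-2}\rangle.
\end{aligned}$$

Applying $L_-$ a third time yields

$$\begin{aligned}
(L_-)^3|Y_{ll}\rangle &= \sqrt{2(2l)(2l - 1)}\,L_-|Y_{l,l-2}\rangle = \sqrt{2(2l)(2l - 1)}\,\sqrt{6(l - 1)}\,|Y_{l,l-3}\rangle \\
 &= \sqrt{3!\,(2l)(2l - 1)(2l - 2)}\,|Y_{l,l-3}\rangle.
\end{aligned}$$

The pattern suggests the following formula for a general power $k$:

$$L_-^k|Y_{ll}\rangle = \sqrt{k!\,(2l)(2l - 1)\cdots(2l - k + 1)}\,|Y_{l,l-k}\rangle,$$

or $L_-^k|Y_{ll}\rangle = \sqrt{k!\,(2l)!/(2l - k)!}\,|Y_{l,l-k}\rangle$. If we set $l - k = m$ and solve for
$|Y_{lm}\rangle$, we get

$$|Y_{lm}\rangle = \sqrt{\frac{(l + m)!}{(l - m)!\,(2l)!}}\;L_-^{l-m}|Y_{ll}\rangle.$$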
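
The pattern guessed in the example can be checked with exact integer arithmetic. The sketch below is illustrative only (the function names are assumptions, not the book's notation): it multiplies the squared lowering coefficients $l(l+1) - m(m-1)$ from Eq. (13.24) for $m = l, l-1, \ldots, l-k+1$ and compares the product with $k!\,(2l)!/(2l-k)!$.

```python
from math import factorial

def lowering_norm_squared(l, k):
    """Product of C_-^2 = l(l+1) - m(m-1) over the k steps m = l, l-1, ..., l-k+1."""
    prod = 1
    for m in range(l, l - k, -1):
        prod *= l * (l + 1) - m * (m - 1)
    return prod

def closed_form(l, k):
    """The conjectured value k! (2l)! / (2l - k)! from Example 13.3.3."""
    return factorial(k) * factorial(2 * l) // factorial(2 * l - k)

agree = all(
    lowering_norm_squared(l, k) == closed_form(l, k)
    for l in range(1, 8)
    for k in range(0, 2 * l + 1)
)
print(agree)
```

The agreement follows from the factorization $l(l+1) - m(m-1) = (l+m)(l-m+1)$, which makes the product telescope into the two factorial ratios.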


The discussion in this section is the standard treatment of angular mo- 
mentum in quantum mechanics. In the context of quantum mechanics, The- 
orem 13.3.1 states the far-reaching physical result that particles can have 
integer or half-integer spin. Such a conclusion is tied to the rotation group 
in three dimensions, which, in turn, is an example of a Lie group, or a con- 
tinuous group of transformations. We shall come back to a study of groups 
later. It is worth noting that it was the study of differential equations that 
led the Norwegian mathematician Sophus Lie to the investigation of their 
symmetries and the development of the beautiful branch of mathematics 
and theoretical physics that bears his name. Thus, the existence of a connec- 
tion between group theory (rotation, angular momentum) and the differential 
equation we are trying to solve should not come as a surprise. 
13.4 Eigenvectors of L²: Spherical Harmonics


The treatment in the preceding section took place in an abstract vector space.
Let us go back to the function space and represent the operators and vectors
in terms of $\theta$ and $\varphi$.

First, let us consider $L_z$ in the form of a differential operator, as given in
Eq. (13.17). The eigenvalue equation for $L_z$ becomes

$$-i\frac{\partial}{\partial\varphi}Y_{jm}(\theta, \varphi) = mY_{jm}(\theta, \varphi).$$

We write $Y_{jm}(\theta, \varphi) = P_{jm}(\theta)Q_{jm}(\varphi)$ and substitute in the above equation
to obtain the ODE in $\varphi$, $dQ_{jm}/d\varphi = imQ_{jm}$, which has a solution of the
form $Q_{jm}(\varphi) = C_{jm}e^{im\varphi}$, where $C_{jm}$ is a constant. Absorbing this constant
into $P_{jm}$, we can write

$$Y_{jm}(\theta, \varphi) = P_{jm}(\theta)e^{im\varphi}.$$


In classical physics the value of functions must be the same at $\varphi$ as at
$\varphi + 2\pi$. This condition restricts the values of $m$ to integers. In quantum
mechanics, on the other hand, it is the absolute values of functions that are
physically measurable quantities, and therefore $m$ can also be a half-integer.


Box 13.4.1 From now on, we shall assume that $m$ is an integer and
denote the eigenvectors of $L^2$ by $Y_{lm}(\theta, \varphi)$, in which $l$ is a nonnegative
integer.


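Our task is to find an analytic expression for $Y_{lm}(\theta, \varphi)$. We need
differential expressions for $L_\pm$. These can easily be obtained from the expressions
for $L_x$ and $L_y$ given in Eqs. (13.16) and (13.17). (The straightforward
manipulations are left as a problem.) We thus have

$$L_\pm = e^{\pm i\varphi}\left(\pm\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\varphi}\right). \tag{13.25}$$

Since $l$ is the highest value of $m$, when $L_+$ acts on $Y_{ll}(\theta, \varphi) = P_{ll}(\theta)e^{il\varphi}$ the
result must be zero. This leads to the differential equation

$$\left(\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\varphi}\right)\bigl[P_{ll}(\theta)e^{il\varphi}\bigr] = 0 \quad\Rightarrow\quad \left(\frac{d}{d\theta} - l\cot\theta\right)P_{ll}(\theta) = 0.$$

The solution to this differential equation is readily found to be

$$P_{ll}(\theta) = C_l(\sin\theta)^l. \tag{13.26}$$

The constant is subscripted because each $P_{ll}$ may lead to a different constant
of integration. We can now write

$$Y_{ll}(\theta, \varphi) = C_l(\sin\theta)^l e^{il\varphi}.$$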


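With $Y_{ll}(\theta, \varphi)$ at our disposal, we can obtain any $Y_{lm}(\theta, \varphi)$ by repeated
application of $L_-$. In principle, the result of Example 13.3.3 gives all the
(abstract) eigenvectors. In practice, however, it is helpful to have a closed
form (in terms of derivatives) for just the $\theta$ part of $Y_{lm}(\theta, \varphi)$. So, let us
apply $L_-$, as given in Eq. (13.25), to $Y_{ll}(\theta, \varphi)$:

$$\begin{aligned}
L_-Y_{ll} &= e^{-i\varphi}\left(-\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\varphi}\right)\bigl[P_{ll}(\theta)e^{il\varphi}\bigr] \\
 &= e^{-i\varphi}\left[-\frac{\partial}{\partial\theta} + i\cot\theta\,(il)\right]\bigl[P_{ll}(\theta)e^{il\varphi}\bigr] \\
 &= (-1)e^{i(l-1)\varphi}\left(\frac{d}{d\theta} + l\cot\theta\right)P_{ll}(\theta).
\end{aligned}$$

It can be shown that for a positive integer $n$,

$$\left(\frac{d}{d\theta} + n\cot\theta\right)f(\theta) = \frac{1}{\sin^n\theta}\frac{d}{d\theta}\bigl[\sin^n\theta\,f(\theta)\bigr]. \tag{13.27}$$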


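Using this result and (13.26) yields

$$\begin{aligned}
L_-Y_{ll} &= (-1)e^{i(l-1)\varphi}\frac{1}{\sin^l\theta}\frac{d}{d\theta}\bigl[\sin^l\theta\,(C_l\sin^l\theta)\bigr] \\
 &= (-1)C_l\frac{e^{i(l-1)\varphi}}{\sin^l\theta}\frac{d}{d\theta}\bigl(\sin^{2l}\theta\bigr).
\end{aligned} \tag{13.28}$$

We apply $L_-$ to (13.28), and use Eq. (13.27) with $n = l - 1$ to obtain

$$\begin{aligned}
L_-^2Y_{ll} &= (-1)^2C_l e^{i(l-2)\varphi}\frac{1}{\sin^{l-1}\theta}\frac{d}{d\theta}\left[\sin^{l-1}\theta\,\frac{1}{\sin^l\theta}\frac{d}{d\theta}\bigl(\sin^{2l}\theta\bigr)\right] \\
 &= (-1)^2C_l\frac{e^{i(l-2)\varphi}}{\sin^{l-1}\theta}\frac{d}{d\theta}\left[\frac{1}{\sin\theta}\frac{d}{d\theta}\bigl(\sin^{2l}\theta\bigr)\right].
\end{aligned}$$

Making the substitution $u = \cos\theta$ yields

$$L_-^2Y_{ll} = C_l\frac{e^{i(l-2)\varphi}}{(1 - u^2)^{(l-2)/2}}\frac{d^2}{du^2}\bigl[(1 - u^2)^l\bigr].$$

With a little more effort one can detect a pattern and obtain

$$L_-^kY_{ll} = C_l\frac{e^{i(l-k)\varphi}}{(1 - u^2)^{(l-k)/2}}\frac{d^k}{du^k}\bigl[(1 - u^2)^l\bigr].$$

If we let $k = l - m$ and make use of the result obtained in Example 13.3.3,
we obtain

$$Y_{lm}(\theta, \varphi) = \sqrt{\frac{(l + m)!}{(l - m)!\,(2l)!}}\;C_l\frac{e^{im\varphi}}{(1 - u^2)^{m/2}}\frac{d^{l-m}}{du^{l-m}}\bigl[(1 - u^2)^l\bigr].$$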
To specify $Y_{lm}(\theta, \varphi)$ completely, we need to evaluate $C_l$. Since $C_l$ does not
depend on $m$, we set $m = 0$ in the above expression, obtaining

$$Y_{l0}(u, \varphi) = \frac{C_l}{\sqrt{(2l)!}}\frac{d^l}{du^l}\bigl[(1 - u^2)^l\bigr].$$

The RHS looks very much like the Legendre polynomial of Chap. 8. In fact,

$$Y_{l0}(u, \varphi) = \frac{C_l}{\sqrt{(2l)!}}(-1)^l 2^l\,l!\,P_l(u) \equiv A_lP_l(u). \tag{13.29}$$

Therefore, the normalization of $Y_{l0}$ and the Legendre polynomials $P_l$
determines $C_l$.
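We now use Eq. (7.25) to obtain the integral form of the orthonormality
relation for $Y_{lm}$:

$$\begin{aligned}
\delta_{ll'}\delta_{mm'} = \langle Y_{l'm'}|Y_{lm}\rangle
 &= \langle Y_{l'm'}|\left(\int_0^{2\pi}d\varphi\int_0^{\pi}\sin\theta\,d\theta\,|\theta, \varphi\rangle\langle\theta, \varphi|\right)|Y_{lm}\rangle \\
 &= \int_0^{2\pi}d\varphi\int_0^{\pi}Y^*_{l'm'}(\theta, \varphi)Y_{lm}(\theta, \varphi)\sin\theta\,d\theta,
\end{aligned} \tag{13.30}$$

which in terms of $u = \cos\theta$ becomes

$$\int_0^{2\pi}d\varphi\int_{-1}^{1}Y^*_{l'm'}(u, \varphi)Y_{lm}(u, \varphi)\,du = \delta_{ll'}\delta_{mm'}. \tag{13.31}$$

Problem 13.15 shows that using (13.30) one gets $A_l = \sqrt{(2l + 1)/(4\pi)}$.
Therefore, Eq. (13.29) yields not only the value of $C_l$, but also the useful
relation

$$Y_{l0}(u, \varphi) = \sqrt{\frac{2l + 1}{4\pi}}\,P_l(u). \tag{13.32}$$

Substituting the value of $C_l$ thus obtained, we finally get

$$Y_{lm}(\theta, \varphi) = (-1)^l\sqrt{\frac{2l + 1}{4\pi}}\,\frac{e^{im\varphi}}{2^l\,l!}\sqrt{\frac{(l + m)!}{(l - m)!}}\,(1 - u^2)^{-m/2}\frac{d^{l-m}}{du^{l-m}}\bigl[(1 - u^2)^l\bigr], \tag{13.33}$$

where $u = \cos\theta$. These functions, the eigenfunctions of $L^2$ and $L_z$, are called
spherical harmonics. They occur frequently in those physical applications
for which the Laplacian is expressed in terms of spherical coordinates.

One can immediately read off the $\theta$ part of the spherical harmonics:

$$P_{lm}(u) = (-1)^l\sqrt{\frac{2l + 1}{4\pi}}\,\frac{1}{2^l\,l!}\sqrt{\frac{(l + m)!}{(l - m)!}}\,(1 - u^2)^{-m/2}\frac{d^{l-m}}{du^{l-m}}\bigl[(1 - u^2)^l\bigr].$$

However, this is not the version used in the literature. For historical reasons
the associated Legendre functions $P_l^m(u)$ are used. These are defined by

$$\begin{aligned}
P_l^m(u) &= (-1)^m\sqrt{\frac{(l + m)!}{(l - m)!}}\sqrt{\frac{4\pi}{2l + 1}}\,P_{lm}(u) \\
 &= (-1)^{l+m}\frac{(l + m)!}{(l - m)!}\,\frac{(1 - u^2)^{-m/2}}{2^l\,l!}\frac{d^{l-m}}{du^{l-m}}\bigl[(1 - u^2)^l\bigr].
\end{aligned}$$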
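Thus,


Box 13.4.2 The solutions of the angular part of the Laplacian are

$$Y_{lm}(\theta, \varphi) = (-1)^m\left[\frac{2l + 1}{4\pi}\frac{(l - m)!}{(l + m)!}\right]^{1/2}P_l^m(\cos\theta)\,e^{im\varphi}, \tag{13.34}$$

where, with $u = \cos\theta$,

$$P_l^m(u) = (-1)^{l+m}\frac{(l + m)!}{(l - m)!}\,\frac{(1 - u^2)^{-m/2}}{2^l\,l!}\frac{d^{l-m}}{du^{l-m}}\bigl[(1 - u^2)^l\bigr]. \tag{13.35}$$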


We generated the spherical harmonics starting with $Y_{ll}(\theta, \varphi)$ and applying
the lowering operator $L_-$. We could have started with $Y_{l,-l}(\theta, \varphi)$ instead,
and applied the raising operator $L_+$. The latter procedure is identical to the
former; nevertheless, we outline it below because of some important
relations that emerge along the way. We first note that

$$|Y_{l,-m}\rangle = \sqrt{\frac{(l + m)!}{(l - m)!\,(2l)!}}\;L_+^{l-m}|Y_{l,-l}\rangle. \tag{13.36}$$

(This can be obtained following the steps of Example 13.3.3.) Next, we use
$L_-|Y_{l,-l}\rangle = 0$ in differential form to obtain

$$\left(\frac{d}{d\theta} - l\cot\theta\right)P_{l,-l}(\theta) = 0,$$

which has the same form as the differential equation for $P_{ll}$. Thus, the
solution is $P_{l,-l}(\theta) = C'_l(\sin\theta)^l$, and

$$Y_{l,-l}(\theta, \varphi) = P_{l,-l}(\theta)e^{-il\varphi} = C'_l(\sin\theta)^l e^{-il\varphi}.$$

Applying $L_+$ repeatedly yields

$$L_+^kY_{l,-l}(u, \varphi) = C'_l\frac{(-1)^k e^{-i(l-k)\varphi}}{(1 - u^2)^{(l-k)/2}}\frac{d^k}{du^k}\bigl[(1 - u^2)^l\bigr],$$

where $u = \cos\theta$. Substituting $k = l - m$ and using Eq. (13.36) gives

$$Y_{l,-m}(u, \varphi) = \sqrt{\frac{(l + m)!}{(l - m)!\,(2l)!}}\;C'_l\frac{(-1)^{l-m}e^{-im\varphi}}{(1 - u^2)^{m/2}}\frac{d^{l-m}}{du^{l-m}}\bigl[(1 - u^2)^l\bigr].$$
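The constant $C'_l$ can be determined as before. In fact, for $m = 0$ this
expression must reproduce the $Y_{l0}$ found earlier, so $C'_l$ agrees with $C_l$
up to the sign $(-1)^l$ carried by the factor $(-1)^{l-m}$ above.
Thus,

$$Y_{l,-m}(u, \varphi) = (-1)^{l+m}\sqrt{\frac{2l + 1}{4\pi}}\,\frac{e^{-im\varphi}}{2^l\,l!}\sqrt{\frac{(l + m)!}{(l - m)!}}\,(1 - u^2)^{-m/2}\frac{d^{l-m}}{du^{l-m}}\bigl[(1 - u^2)^l\bigr].$$

Comparison with Eq. (13.33) yields

$$Y_{l,-m}(\theta, \varphi) = (-1)^mY^*_{lm}(\theta, \varphi), \tag{13.37}$$

and using the definition $Y_{l,-m}(\theta, \varphi) = P_{l,-m}(\theta)e^{-im\varphi}$ and the first part of
Eq. (13.35), we obtain

$$P_l^{-m}(\theta) = (-1)^m\frac{(l - m)!}{(l + m)!}P_l^m(\theta). \tag{13.38}$$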


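The first few spherical harmonics with positive $m$ are given below. Those
with negative $m$ can be obtained using Eq. (13.37).

$$\begin{aligned}
Y_{31} &= -\sqrt{\frac{21}{64\pi}}\,e^{i\varphi}\sin\theta\,(5\cos^2\theta - 1), \\
Y_{32} &= \sqrt{\frac{105}{32\pi}}\,e^{2i\varphi}\sin^2\theta\cos\theta, \\
Y_{33} &= -\sqrt{\frac{35}{64\pi}}\,e^{3i\varphi}\sin^3\theta.
\end{aligned}$$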
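
The closed forms above can be cross-checked against Eqs. (13.34) and (13.35) numerically. The sketch below is an illustration under stated assumptions, not the book's procedure: the function names are hypothetical, polynomials are stored as coefficient lists (lowest degree first), and only $0 \le m \le l$ with $|u| < 1$ is handled.

```python
import cmath
import math

def poly_one_minus_u2_pow(l):
    """Coefficients (lowest degree first) of (1 - u^2)^l via the binomial theorem."""
    coeffs = [0] * (2 * l + 1)
    for k in range(l + 1):
        coeffs[2 * k] = (-1) ** k * math.comb(l, k)
    return coeffs

def poly_derivative(coeffs, times):
    """Differentiate a coefficient list the given number of times."""
    for _ in range(times):
        coeffs = [n * c for n, c in enumerate(coeffs)][1:]
    return coeffs

def assoc_legendre(l, m, u):
    """P_l^m(u) as written in Eq. (13.35); assumes 0 <= m <= l and |u| < 1."""
    d = poly_derivative(poly_one_minus_u2_pow(l), l - m)
    deriv_value = sum(c * u ** n for n, c in enumerate(d))
    pref = (-1) ** (l + m) * math.factorial(l + m) / math.factorial(l - m)
    return pref * (1 - u * u) ** (-m / 2) / (2 ** l * math.factorial(l)) * deriv_value

def Y(l, m, theta, phi):
    """Y_lm(theta, phi) as written in Eq. (13.34)."""
    norm = math.sqrt(
        (2 * l + 1) / (4 * math.pi) * math.factorial(l - m) / math.factorial(l + m)
    )
    return (-1) ** m * norm * assoc_legendre(l, m, math.cos(theta)) * cmath.exp(1j * m * phi)

theta0, phi0 = 0.9, 0.4  # arbitrary sample point
s, c = math.sin(theta0), math.cos(theta0)
explicit = {
    1: -math.sqrt(21 / (64 * math.pi)) * cmath.exp(1j * phi0) * s * (5 * c * c - 1),
    2: math.sqrt(105 / (32 * math.pi)) * cmath.exp(2j * phi0) * s * s * c,
    3: -math.sqrt(35 / (64 * math.pi)) * cmath.exp(3j * phi0) * s ** 3,
}
for m, value in explicit.items():
    print(m, abs(Y(3, m, theta0, phi0) - value))
```

The printed differences should be at the level of rounding error. For negative $m$ one would use Eq. (13.37) rather than extend `assoc_legendre`.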


From Eqs. (13.13), (13.18), and (13.34) and the fact that $\alpha = l(l + 1)$ for
some nonnegative integer $l$, we obtain

$$\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial}{\partial\theta}\bigl[P_l^m e^{im\varphi}\bigr]\right) + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2}\bigl[P_l^m e^{im\varphi}\bigr] + l(l + 1)P_l^m e^{im\varphi} = 0,$$

which gives

$$\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{dP_l^m}{d\theta}\right) - \frac{m^2}{\sin^2\theta}P_l^m + l(l + 1)P_l^m = 0.$$

As before, we let $u = \cos\theta$ to obtain

$$\frac{d}{du}\left[(1 - u^2)\frac{dP_l^m}{du}\right] + \left[l(l + 1) - \frac{m^2}{1 - u^2}\right]P_l^m = 0. \tag{13.39}$$

This is called the associated Legendre differential equation. Its solutions,
the associated Legendre functions, are given in closed form in Eq. (13.35).
For $m = 0$, Eq. (13.39) reduces to the Legendre differential equation whose
solutions, again given by Eq. (13.35) with $m = 0$, are the Legendre
polynomials encountered in Chap. 8. When $m = 0$, the spherical harmonics
become $\varphi$-independent. This corresponds to a physical situation in which there
is an explicit azimuthal symmetry. In such cases (when it is obvious that the
physical property in question does not depend on $\varphi$) a Legendre polynomial,
depending only on $\cos\theta$, will multiply the radial function.


13.4.1 Expansion of Angular Functions 


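The orthonormality of spherical harmonics can be utilized to expand
functions of $\theta$ and $\varphi$ in terms of them. The fact that these functions are complete
will be discussed in a general way in the context of Sturm-Liouville systems.
Assuming completeness for now, we write

$$f(\theta, \varphi) = \begin{cases}
\displaystyle\sum_{l=0}^{\infty}\sum_{m=-l}^{l}a_{lm}Y_{lm}(\theta, \varphi) & \text{if } l \text{ is not fixed}, \\[2ex]
\displaystyle\sum_{m=-l}^{l}a_{lm}Y_{lm}(\theta, \varphi) & \text{if } l \text{ is fixed},
\end{cases} \tag{13.40}$$

where we have included the case where it is known a priori that $f(\theta, \varphi)$ has
a given fixed $l$ value. To find $a_{lm}$, we multiply both sides by $Y^*_{lm}(\theta, \varphi)$ and
integrate over the solid angle. The result, obtained by using the
orthonormality relation, is

$$a_{lm} = \iint d\Omega\,f(\theta, \varphi)Y^*_{lm}(\theta, \varphi), \tag{13.41}$$

where $d\Omega \equiv \sin\theta\,d\theta\,d\varphi$ is the element of solid angle. A useful special case
of this formula is

$$a_{l0}^{(f)} = \iint d\Omega\,f(\theta, \varphi)Y^*_{l0}(\theta, \varphi) = \sqrt{\frac{2l + 1}{4\pi}}\iint d\Omega\,f(\theta, \varphi)P_l(\cos\theta), \tag{13.42}$$

where we have introduced an extra superscript to emphasize the relation of
the expansion coefficients with the function being expanded. Another useful
relation is obtained when we let $\theta = 0$ in Eq. (13.40):

$$f(\theta, \varphi)\big|_{\theta=0} = \begin{cases}
\displaystyle\sum_{l=0}^{\infty}\sum_{m=-l}^{l}a_{lm}Y_{lm}(\theta, \varphi)\big|_{\theta=0} & \text{if } l \text{ is not fixed}, \\[2ex]
\displaystyle\sum_{m=-l}^{l}a_{lm}Y_{lm}(\theta, \varphi)\big|_{\theta=0} & \text{if } l \text{ is fixed}.
\end{cases}$$

From Eqs. (13.35) and (13.34) one can show that

$$Y_{lm}(\theta, \varphi)\big|_{\theta=0} = \delta_{m0}Y_{l0}(0, \varphi) = \delta_{m0}\sqrt{\frac{2l + 1}{4\pi}}.$$

Therefore,

$$f(\theta, \varphi)\big|_{\theta=0} = \begin{cases}
\displaystyle\sum_{l=0}^{\infty}a_{l0}^{(f)}\sqrt{\frac{2l + 1}{4\pi}} & \text{if } l \text{ is not fixed}, \\[2ex]
\displaystyle a_{l0}^{(f)}\sqrt{\frac{2l + 1}{4\pi}} & \text{if } l \text{ is fixed}.
\end{cases} \tag{13.43}$$

Fig. 13.1 The unit vectors $\hat{\mathbf{e}}_r$ and $\hat{\mathbf{e}}_{r'}$ with their spherical angles and the angle $\gamma$ between
them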


13.4.2 Addition Theorem for Spherical Harmonics 


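An important consequence of the expansion in terms of $Y_{lm}$ is called the
addition theorem for spherical harmonics. Consider two unit vectors $\hat{\mathbf{e}}_r$
and $\hat{\mathbf{e}}_{r'}$ making spherical angles $(\theta, \varphi)$ and $(\theta', \varphi')$, respectively, as shown in
Fig. 13.1. Let $\gamma$ be the angle between the two vectors. The addition theorem
states that

$$P_l(\cos\gamma) = \frac{4\pi}{2l + 1}\sum_{m=-l}^{l}Y^*_{lm}(\theta', \varphi')Y_{lm}(\theta, \varphi). \tag{13.44}$$

We shall not give a proof of this theorem here and refer the reader to an
elegant proof on page 974 which uses the representation theory of groups. The
addition theorem is particularly useful in the expansion of the frequently
occurring expression $1/|\mathbf{r} - \mathbf{r}'|$. For definiteness we assume $|\mathbf{r}'| \equiv r' < |\mathbf{r}| \equiv r$.
Then, introducing $t \equiv r'/r$, we have

$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = \frac{1}{(r^2 + r'^2 - 2rr'\cos\gamma)^{1/2}} = \frac{1}{r}\bigl(1 + t^2 - 2t\cos\gamma\bigr)^{-1/2}.$$

Recalling the generating function for Legendre polynomials from Chap. 8
and using the addition theorem, we get

$$\begin{aligned}
\frac{1}{|\mathbf{r} - \mathbf{r}'|} &= \frac{1}{r}\sum_{l=0}^{\infty}t^lP_l(\cos\gamma) = \frac{1}{r}\sum_{l=0}^{\infty}\frac{r'^l}{r^l}\,\frac{4\pi}{2l + 1}\sum_{m=-l}^{l}Y^*_{lm}(\theta', \varphi')Y_{lm}(\theta, \varphi) \\
 &= 4\pi\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\frac{1}{2l + 1}\,\frac{r'^l}{r^{l+1}}\,Y^*_{lm}(\theta', \varphi')Y_{lm}(\theta, \varphi).
\end{aligned}$$

It is clear that if $r < r'$, we should expand in terms of the ratio $r/r'$. It is
therefore customary to use $r_<$ to denote the smaller and $r_>$ to denote the
larger of the two radii $r$ and $r'$. Then the above equation is written as

$$\frac{1}{|\mathbf{r} - \mathbf{r}'|} = 4\pi\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\frac{1}{2l + 1}\,\frac{r_<^l}{r_>^{l+1}}\,Y^*_{lm}(\theta', \varphi')Y_{lm}(\theta, \varphi). \tag{13.45}$$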


This equation is used frequently in the study of Coulomb-like potentials. 
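
A short numerical sketch can make the convergence of this expansion tangible. It works with the intermediate form $\sum_l (r_<^l/r_>^{l+1})P_l(\cos\gamma)$, that is, Eq. (13.45) after the sum over $m$ has been collapsed with the addition theorem (13.44). The function names, the truncation order `l_max`, and the sample radii are assumptions made for this illustration; the Legendre polynomials are generated with the standard three-term recurrence $(l+1)P_{l+1} = (2l+1)xP_l - lP_{l-1}$.

```python
import math

def legendre(l_max, x):
    """Return [P_0(x), ..., P_lmax(x)] using the three-term recurrence."""
    P = [1.0, x]
    for l in range(1, l_max):
        P.append(((2 * l + 1) * x * P[l] - l * P[l - 1]) / (l + 1))
    return P[: l_max + 1]

def inverse_distance_series(r, r_prime, cos_gamma, l_max=60):
    """Truncated sum of r_<^l / r_>^(l+1) * P_l(cos gamma)."""
    r_less, r_greater = min(r, r_prime), max(r, r_prime)
    P = legendre(l_max, cos_gamma)
    return sum(r_less ** l / r_greater ** (l + 1) * P[l] for l in range(l_max + 1))

def inverse_distance_exact(r, r_prime, cos_gamma):
    """1/|r - r'| from the law of cosines."""
    return 1.0 / math.sqrt(r * r + r_prime * r_prime - 2 * r * r_prime * cos_gamma)

series = inverse_distance_series(2.0, 1.0, 0.3)
exact = inverse_distance_exact(2.0, 1.0, 0.3)
print(series, exact)
```

Because the terms fall off like $(r_</r_>)^l$, the truncated series converges quickly when the two radii are well separated and slowly when they are nearly equal.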


13.5 Problems 


13.1 By applying the operator $[x_j, p_k]$ to an arbitrary function $f(\mathbf{r})$, show
that $[x_j, p_k] = i\delta_{jk}$.


13.2 Use the defining relation $L_i = \epsilon_{ijk}x_jp_k$ to show that $x_jp_k - x_kp_j = \epsilon_{ijk}L_i$.
In both of these expressions a sum over the repeated indices is understood.


13.3 For the angular momentum operator $L_i = \epsilon_{ijk}x_jp_k$, show that the
commutation relation $[L_j, L_k] = i\epsilon_{jkl}L_l$ holds.


13.4 Evaluate $\partial f/\partial y$ and $\partial f/\partial z$ in spherical coordinates and find $L_y$ and $L_z$
in terms of spherical coordinates.


13.5 Obtain an expression for $L^2$ in terms of $\theta$ and $\varphi$, and substitute the
result in Eq. (13.12) to get the Laplacian in spherical coordinates.


13.6 Show that $L^2 = L_+L_- + L_z^2 - L_z$ and $L^2 = L_-L_+ + L_z^2 + L_z$.


13.7 Show that $L^2$, $L_x$, $L_y$, and $L_z$ are hermitian operators in the space of
square-integrable functions.


13.8 Verify the following commutation relations:

$$[L^2, L_\pm] = 0, \qquad [L_z, L_\pm] = \pm L_\pm, \qquad [L_+, L_-] = 2L_z.$$


13.9 Show that $L_-|Y_{\alpha\beta}\rangle$ has $\beta - 1$ as its eigenvalue for $L_z$, and that $|Y_{\alpha,\beta_\pm}\rangle$
cannot be zero.


13.10 Show that if the $|Y_{jm}\rangle$ are normalized to unity, then with proper
choice of phase, $L_-|Y_{jm}\rangle = \sqrt{j(j + 1) - m(m - 1)}\,|Y_{j,m-1}\rangle$.


13.11 Derive Eq. (13.36).


13.12 Starting with $L_x$ and $L_y$, derive the following expression for $L_\pm$:

$$L_\pm = e^{\pm i\varphi}\left(\pm\frac{\partial}{\partial\theta} + i\cot\theta\frac{\partial}{\partial\varphi}\right).$$


13.13 Integrate $dP/d\theta - l\cot\theta\,P = 0$ to find $P(\theta)$.


13.14 Verify the following differential identity:

$$\left(\frac{d}{d\theta} + n\cot\theta\right)f(\theta) = \frac{1}{\sin^n\theta}\frac{d}{d\theta}\bigl[\sin^n\theta\,f(\theta)\bigr].$$


13.15 Let $l = l'$ and $m = m' = 0$ in Eq. (13.31), and substitute for $Y_{l0}$ from
Eq. (13.29) to obtain $A_l = \sqrt{(2l + 1)/4\pi}$.


13.16 Show that

$$L_+^kY_{l,-l}(u, \varphi) = C'_l\frac{(-1)^k e^{-i(l-k)\varphi}}{(1 - u^2)^{(l-k)/2}}\frac{d^k}{du^k}\bigl[(1 - u^2)^l\bigr].$$


13.17 Derive the relations $Y_{l,-m}(\theta, \varphi) = (-1)^mY^*_{lm}(\theta, \varphi)$ and

$$P_l^{-m}(\theta) = (-1)^m\frac{(l - m)!}{(l + m)!}P_l^m(\theta).$$


13.18 Show that

$$\sum_{m=-l}^{l}|Y_{lm}(\theta, \varphi)|^2 = \frac{2l + 1}{4\pi}.$$

Verify this explicitly for $l = 1$ and $l = 2$.


13.19 Show that the addition theorem for spherical harmonics can be written as

$$P_l(\cos\gamma) = P_l(\cos\theta)P_l(\cos\theta') + 2\sum_{m=1}^{l}\frac{(l - m)!}{(l + m)!}P_l^m(\cos\theta)P_l^m(\cos\theta')\cos[m(\varphi - \varphi')].$$
14 Second-Order Linear Differential Equations


The discussion of Chap. 13 has clearly singled out ODEs, especially those 
of second order, as objects requiring special attention because most com- 
mon PDEs of mathematical physics can be separated into ODEs (of second 
order). This is really an oversimplification of the situation. Many PDEs of 
physics, both at the fundamental theoretical level (as in the general theory 
of relativity) and from a practical standpoint (weather forecast) are nonlin- 
ear, and the method of the separation of variables does not work. Since no 
general analytic solutions for such nonlinear systems have been found, we 
shall confine ourselves to the linear systems, especially those that admit a 
separated solution. 

With the exception of the infinite power series, no systematic method 
of solving DEs existed during the first half of the nineteenth century. The 
majority of solutions were completely ad hoc and obtained by trial and error, 
causing frustration and anxiety among mathematicians. It was to overcome 
this frustration that Sophus Lie, motivated by the newly developed concept 
of group, took up the systematic study of DEs in the second half of the 
nineteenth century. This study not only gave a handle on the disarrayed area 
of DEs, but also gave birth to one of the most beautiful and fundamental 
branches of mathematical physics, Lie group theory. We shall come back to 
a thorough treatment of this theory in Parts VII and IX. 

Our main task in this chapter is to study the second-order linear differen- 
tial equations (SOLDEs). However, to understand SOLDEs, we need some 
basic understanding of differential equations in general. The next section 
outlines some essential properties of general DEs. Section 2 is a very brief 
introduction to first-order DEs, and the remainder of the chapter deals with 
SOLDEs. 


14.1 General Properties of ODEs


The most general ODE can be expressed as

$$G\left(x, y, \frac{dy}{dx}, \frac{d^2y}{dx^2}, \ldots, \frac{d^ny}{dx^n}\right) = 0, \tag{14.1}$$

in which $G : \mathbb{R}^{n+2} \to \mathbb{R}$ is a real-valued function of $n + 2$ real variables.
When $G$ depends explicitly and nontrivially on $d^ny/dx^n$, Eq. (14.1) is
called an $n$th-order ODE. An ODE is said to be linear if the part of the
function $G$ that includes $y$ and all its derivatives is linear in $y$. The most
general $n$th-order linear ODE is

$$p_0(x)y + p_1(x)\frac{dy}{dx} + \cdots + p_n(x)\frac{d^ny}{dx^n} = q(x) \quad\text{for } p_n(x) \ne 0, \tag{14.2}$$

where $\{p_i\}_{i=0}^{n}$ and $q$ are functions of the independent variable $x$.
Equation (14.2) is said to be homogeneous if $q = 0$; otherwise, it is said to be
inhomogeneous and $q(x)$ is called the inhomogeneous term. It is customary,
and convenient, to define a linear differential operator $\mathbf{L}$ by¹

$$\mathbf{L} \equiv p_0(x) + p_1(x)\frac{d}{dx} + \cdots + p_n(x)\frac{d^n}{dx^n}, \qquad p_n(x) \ne 0, \tag{14.3}$$

and write Eq. (14.2) as

$$\mathbf{L}[y] = q(x). \tag{14.4}$$


A solution of Eq. (14.1) or (14.4) is a single-variable function $f : \mathbb{R} \to \mathbb{R}$
such that $G(x, f(x), f'(x), \ldots, f^{(n)}(x)) = 0$, or $\mathbf{L}[f] = q(x)$, for all $x$ in
the domain of definition of $f$. The solution of a differential equation may
not exist if we put too many restrictions on it. For instance, if we demand
that $f : \mathbb{R} \to \mathbb{R}$ be differentiable too many times, we may not be able to find
a solution, as the following example shows.


Example 14.1.1 The most general solution of $dy/dx = |x|$ that vanishes at
$x = 0$ is

$$f(x) = \begin{cases} \tfrac{1}{2}x^2 & \text{if } x \ge 0, \\ -\tfrac{1}{2}x^2 & \text{if } x \le 0. \end{cases}$$

This function is continuous and has first derivative $f'(x) = |x|$, which is
also continuous at $x = 0$. However, if we demand that its second derivative
also be continuous at $x = 0$, we cannot find a solution, because

$$f''(x) = \begin{cases} +1 & \text{if } x > 0, \\ -1 & \text{if } x < 0. \end{cases}$$

If we want $f'''(x)$ to exist at $x = 0$, then we have to expand the notion of a
function to include distributions, or generalized functions.


Overrestricting a solution for a differential equation results in its absence,
but underrestricting it allows multiple solutions. To strike a balance between
these two extremes, we agree to make a solution as many times
differentiable as plausible and to satisfy certain initial conditions. For an $n$th-order
DE such initial conditions are commonly equivalent (but not restricted) to
a specification of the function and of its first $n - 1$ derivatives. This sort of
specification is made feasible by the following theorem.

¹ Do not confuse this linear differential operator with the angular momentum (vector)
operator $\mathbf{L}$.


Theorem 14.1.2 (Implicit function theorem) Let $G : \mathbb{R}^{n+1} \to \mathbb{R}$ have
continuous partial derivatives up to the kth order in some neighborhood of a
point $P_0 = (r_1, r_2, \ldots, r_{n+1})$ in $\mathbb{R}^{n+1}$. Let $(\partial G/\partial x_{n+1})|_{P_0} \neq 0$. Then there
exists a unique function $F : \mathbb{R}^n \to \mathbb{R}$ that is continuously differentiable k
times at (some smaller) neighborhood of $P_0$ such that

$$x_{n+1} = F(x_1, x_2, \ldots, x_n)$$

for all points $P = (x_1, x_2, \ldots, x_{n+1})$ in a neighborhood of $P_0$ and

$$G\bigl(x_1, x_2, \ldots, x_n, F(x_1, x_2, \ldots, x_n)\bigr) = 0.$$


Theorem 14.1.2 simply asserts that under certain (mild) conditions we
can “solve” for one of the independent variables in $G(x_1, x_2, \ldots, x_{n+1}) = 0$
in terms of the others. A proof of this theorem can be found in advanced
calculus books.

Application of this theorem to Eq. (14.1) leads to

$$\frac{d^n y}{dx^n} = F\left(x, y, \frac{dy}{dx}, \frac{d^2 y}{dx^2}, \ldots, \frac{d^{n-1} y}{dx^{n-1}}\right),$$


provided that $G$ satisfies the conditions of the theorem. If we know the
solution $y = f(x)$ and its derivatives up to order $n - 1$, we can evaluate its nth
derivative using this equation. In addition, we can calculate the derivatives
of all orders (assuming they exist) by differentiating this equation. This
allows us to expand the solution in a Taylor series. Thus, for solutions that
have derivatives of all orders, knowledge of the value of a solution and its
first $n - 1$ derivatives at a point $x_0$ determines that solution at a neighboring
point $x$.

We shall not study the general ODE of Eq. (14.1) or even its simpler 
linear version (14.2). We will only briefly study ODEs of the first order in 
the next section, and then concentrate on linear ODEs of the second order 
for the rest of this chapter. 


14.2 Existence/Uniqueness for First-Order DEs 
A general first-order DE (FODE) is of the form $G(x, y, y') = 0$. We can
find $y'$ (the derivative of $y$) in terms of a function of $x$ and $y$ if the
function $G(x_1, x_2, x_3)$ is differentiable with respect to its third argument and
$\partial G/\partial x_3 \neq 0$. In that case we have

$$y' \equiv \frac{dy}{dx} = F(x, y), \tag{14.5}$$

which is said to be a normal FODE. If $F(x, y)$ is a linear function of $y$, then
Eq. (14.5) becomes a first-order linear DE (FOLDE), which can generally
be written as

$$p_1(x)\frac{dy}{dx} + p_0(x)y = q(x). \tag{14.6}$$


It can be shown that the general FOLDE has an explicit solution (see
[Hass 08]):

Theorem 14.2.1 Any first-order linear DE of the form $p_1(x)y' + p_0(x)y = q(x)$,
in which $p_0$, $p_1$, and $q$ are continuous functions in some interval
$(a, b)$, has a general solution

$$y = f(x) = \frac{1}{\mu(x)p_1(x)}\left[C + \int_{x_1}^{x} \mu(t)q(t)\,dt\right], \tag{14.7}$$

where $C$ is an arbitrary constant and

$$\mu(x) = \frac{1}{p_1(x)}\exp\left[\int_{x_0}^{x} \frac{p_0(t)}{p_1(t)}\,dt\right], \tag{14.8}$$

where $x_0$ and $x_1$ are arbitrary points in the interval $(a, b)$.
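The explicit formula of Theorem 14.2.1 can be checked numerically. The following is a minimal sketch, not part of the text: the helper names `simpson` and `folde_solution` and the test equation $y' + y = x$ are assumptions made here for illustration, and the integrals in Eqs. (14.7) and (14.8) are approximated by Simpson's rule.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule for the integral of f from a to b (n even)."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def folde_solution(p1, p0, q, C, x0=0.0, x1=0.0):
    """Return y(x) built from Eqs. (14.7) and (14.8) by numerical quadrature."""
    def mu(x):
        # Eq. (14.8): mu(x) = exp(int_{x0}^{x} p0/p1 dt) / p1(x)
        return math.exp(simpson(lambda t: p0(t) / p1(t), x0, x)) / p1(x)
    def y(x):
        # Eq. (14.7): y(x) = [C + int_{x1}^{x} mu*q dt] / (mu(x) p1(x))
        return (C + simpson(lambda t: mu(t) * q(t), x1, x)) / (mu(x) * p1(x))
    return y

# Illustrative test case (chosen here, not from the text): y' + y = x.
# With x0 = x1 = 0 the formula gives y = x - 1 + (C + 1) exp(-x).
y = folde_solution(lambda x: 1.0, lambda x: 1.0, lambda x: x, C=2.0)
exact = lambda x: x - 1.0 + 3.0 * math.exp(-x)
print(abs(y(1.5) - exact(1.5)))   # small quadrature error
```

Changing $x_0$ or $x_1$ only rescales $\mu$ or shifts $C$, so the family of solutions produced is the same.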


No such explicit solution exists for nonlinear first-order DEs. Neverthe- 
less, it is reassuring to know that a solution of such a DE always exists and 
under some mild conditions, this solution is unique. We summarize some 
of the ideas involved in the proof of the existence and uniqueness of the 
solutions to FODEs. (For proofs, see the excellent book by Birkhoff and 
Rota [Birk 78].) We first state an existence theorem due to Peano: 


Theorem 14.2.2 (Peano existence theorem) If the function $F(x, y)$ is
continuous for the points on and within the rectangle defined by $|y - c| \le K$
and $|x - a| \le N$, and if $|F(x, y)| \le M$ there, then the differential
equation $y' = F(x, y)$ has at least one solution, $y = f(x)$, defined for
$|x - a| \le \min(N, K/M)$ and satisfying the initial condition $f(a) = c$.


This theorem guarantees only the existence of solutions. To ensure 
uniqueness, the function F needs to have some additional properties. An 
important property is stated in the following definition. 


Definition 14.2.3 A function $F(x, y)$ satisfies a Lipschitz condition in a
domain $D \subset \mathbb{R}^2$ if for some finite constant $L$ (Lipschitz constant), it satisfies
the inequality

$$|F(x, y_1) - F(x, y_2)| \le L|y_1 - y_2|$$

for all points $(x, y_1)$ and $(x, y_2)$ in $D$.


Theorem 14.2.4 (Uniqueness) Let $f(x)$ and $g(x)$ be any two solutions of
the FODE $y' = F(x, y)$ in a domain $D$, where $F$ satisfies a Lipschitz
condition with Lipschitz constant $L$. Then

$$|f(x) - g(x)| \le e^{L|x - a|}\,|f(a) - g(a)|.$$

In particular, the FODE has at most one solution curve passing through the
point $(a, c) \in D$.


The final conclusion of this theorem is an easy consequence of the as- 
sumed differentiability of F and the requirement f(a) = g(a) =c. The 
theorem says that if there is a solution y = f(x) to the DE y’ = F(x, y) 
satisfying f(a) =c, then it is the solution. 

The requirements of the Peano existence theorem are too broad to yield 
solutions that have some nice properties. For instance, the interval of def- 
inition of the solutions may depend on their initial values. The following 
example illustrates this point. 


Example 14.2.5 Consider the DE $dy/dx = e^y$. The general solution of this
DE can be obtained by direct integration:

$$e^{-y}\,dy = dx \quad\Rightarrow\quad -e^{-y} = x + C.$$

If $y = b$ when $x = 0$, then $C = -e^{-b}$, and

$$e^{-y} = -x + e^{-b} \quad\Rightarrow\quad y = -\ln(e^{-b} - x).$$

Thus, the solution is defined for $-\infty < x < e^{-b}$, i.e., the interval of
definition of a solution changes with its initial value.
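The dependence of the interval of definition on the initial value can be made concrete with a short script. This is a sketch under the assumptions of Example 14.2.5 only; the function name `solution` is introduced here for illustration. It evaluates $y = -\ln(e^{-b} - x)$, reports the right endpoint $e^{-b}$ of its domain, and checks $y' = e^y$ by a central difference.

```python
import math

def solution(b):
    """Solution of dy/dx = e^y with y(0) = b, and the right end of its domain."""
    x_max = math.exp(-b)          # the solution blows up as x -> e^{-b}
    def y(x):
        if x >= x_max:
            raise ValueError("solution is not defined for x >= exp(-b)")
        return -math.log(x_max - x)
    return y, x_max

for b in (-1.0, 0.0, 1.0):
    y, x_max = solution(b)
    x, h = 0.5 * x_max, 1e-6
    slope = (y(x + h) - y(x - h)) / (2 * h)   # central-difference estimate of y'
    print(b, x_max, slope - math.exp(y(x)))   # last column should be near zero
```

Larger initial values $b$ give a smaller $e^{-b}$, so the solution reaches its singularity sooner.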


To avoid situations illustrated in the example above, one demands not
just the continuity of $F$, as does the Peano existence theorem, but a
Lipschitz condition for it. Then one ensures not only the existence, but also the
uniqueness:


Theorem 14.2.6 (Local existence and uniqueness) Suppose that the
function $F(x, y)$ is defined and continuous in the rectangle

$$|y - c| \le K, \qquad |x - a| \le N$$

and satisfies a Lipschitz condition there. Let $M = \max|F(x, y)|$ in this
rectangle. Then the differential equation $y' = F(x, y)$ has a unique
solution $y = f(x)$ satisfying $f(a) = c$ and defined on the interval
$|x - a| \le \min(N, K/M)$.


14.3 General Properties of SOLDEs 


The most general SOLDE is 


$$p_2(x)\frac{d^2y}{dx^2} + p_1(x)\frac{dy}{dx} + p_0(x)y = p_3(x). \tag{14.9}$$

Dividing by $p_2(x)$ and writing $p$ for $p_1/p_2$, $q$ for $p_0/p_2$, and $r$ for $p_3/p_2$
reduces this to the normal form

$$\frac{d^2y}{dx^2} + p(x)\frac{dy}{dx} + q(x)y = r(x). \tag{14.10}$$

Equation (14.10) is equivalent to (14.9) if $p_2(x) \neq 0$. The points at which
$p_2(x)$ vanishes are called the singular points of the differential equation.
There is a crucial difference between the singular points of linear
differential equations and those of nonlinear differential equations. For a
nonlinear differential equation such as $(x^2 - y)y' = x^2 + y^2$, the curve $y = x^2$
is the collection of singular points. This makes it impossible to construct
solutions $y = f(x)$ that are defined on an interval $I = [a, b]$ of the x-axis
because for any $x \in I$, there is a $y$ for which the differential equation is
undefined. Linear differential equations do not have this problem, because
the coefficients of the derivatives are functions of $x$ only. Therefore, all the
singular “curves” are vertical. Thus, we have the following:


Definition 14.3.1 The normal form of a SOLDE, Eq. (14.10), is regular
on an interval $[a, b]$ of the x-axis if $p(x)$, $q(x)$, and $r(x)$ are continuous
on $[a, b]$. A solution of a normal SOLDE is a twice-differentiable function
$y = f(x)$ that satisfies the SOLDE at every point of $[a, b]$.


Any function that satisfies (14.10) or (14.9) must necessarily be twice 
differentiable, and that is all that is demanded of the solutions. Any higher- 
order differentiability requirement may be too restrictive, as was pointed out 
in Example 14.1.1. Most solutions to a normal SOLDE, however, automati- 
cally have derivatives of order higher than two. 

We write Eq. (14.9) in the operator form as

$$\mathbf{L}[y] = p_3, \quad\text{where}\quad \mathbf{L} \equiv p_2\frac{d^2}{dx^2} + p_1\frac{d}{dx} + p_0. \tag{14.11}$$

It is clear that $\mathbf{L}$ is a linear operator because $d/dx$ is linear, as are all powers
of it. Thus, for constants $\alpha$ and $\beta$,

$$\mathbf{L}[\alpha y_1 + \beta y_2] = \alpha\mathbf{L}[y_1] + \beta\mathbf{L}[y_2].$$

In particular, if $y_1$ and $y_2$ are two solutions of Eq. (14.11), then
$\mathbf{L}[y_1 - y_2] = 0$. That is, the difference between any two solutions of a SOLDE is a
solution of the homogeneous equation obtained by setting $p_3 = 0$.²

An immediate consequence of the linearity of L is the following: 


Lemma 14.3.2 If $\mathbf{L}[u] = r(x)$, $\mathbf{L}[v] = s(x)$, $\alpha$ and $\beta$ are constants, and
$w = \alpha u + \beta v$, then $\mathbf{L}[w] = \alpha r(x) + \beta s(x)$.


The proof of this lemma is trivial, but the result describes the fundamental
property of linear operators: When $r = s = 0$, that is, in dealing with
homogeneous equations, the lemma says that any linear combination of
solutions of the homogeneous SOLDE (HSOLDE) is also a solution. This is
called the superposition principle.

²This conclusion is, of course, not limited to the SOLDE; it holds for all linear DEs.

Based on physical intuition, we expect to be able to predict the behav- 
ior of a physical system if we know the differential equation obeyed by that 
system, and, equally importantly, the initial data. Physical intuition also tells 
us that if the initial conditions are changed by an infinitesimal amount, then 
the solutions will be changed infinitesimally. Thus, the solutions of linear 
differential equations are said to be continuous functions of the initial con- 
ditions. 


Remark 14.3.1 Nonlinear differential equations can have completely dif- 
ferent solutions for two initial conditions that are infinitesimally close. 
Since initial conditions cannot be specified with mathematical precision in 
practice, nonlinear differential equations lead to unpredictable solutions, or 
chaos. Chaos was a hot topic in the late 1980s and early 1990s. Some en- 
thusiasts called it the third pillar of modern physics on a par with relativity 
and quantum physics. The enthusiasm has waned, however, because chaos, 
driven entirely by the availability of computers and their superb graphic ca- 
pabilities, has produced absolutely no fundamental results comparable with 
relativity and quantum theory. 


A prediction is not a prediction unless it is unique. This expectation for 
linear equations is borne out in the language of mathematics in the form of 
an existence theorem and a uniqueness theorem. We consider the latter next. 
But first, we need a lemma. 


Lemma 14.3.3 The only solution g(x) of the homogeneous differential 
equation y" + py’ + qy = 0 defined on the interval [a,b] that satisfies 
g(a) =0= g’(a) is the trivial solution g = 0. 


Proof Introduce the nonnegative function $u(x) \equiv [g(x)]^2 + [g'(x)]^2$ and
differentiate it to get

$$u'(x) = 2g'g + 2g'g'' = 2g'(g + g'') = 2g'(g - pg' - qg) = -2p(g')^2 + 2(1 - q)gg'.$$

Since $(g \pm g')^2 \ge 0$, it follows that $2|gg'| \le g^2 + g'^2$. Thus,

$$2(1 - q)gg' \le 2|(1 - q)gg'| = 2|1 - q|\,|gg'| \le |1 - q|(g^2 + g'^2) \le (1 + |q|)(g^2 + g'^2),$$

and therefore,

$$u'(x) \le |u'(x)| = |-2pg'^2 + 2(1 - q)gg'| \le 2|p|g'^2 + (1 + |q|)(g^2 + g'^2) = [1 + |q(x)|]g^2 + [1 + |q(x)| + 2|p(x)|]g'^2.$$

Now let $K = 1 + \max[|q(x)| + 2|p(x)|]$, where the maximum is taken over
$[a, b]$. Then we obtain

$$u'(x) \le K(g^2 + g'^2) = Ku(x) \quad \forall x \in [a, b].$$

Using the result of Problem 14.1 yields $u(x) \le u(a)e^{K(x - a)}$ for all $x \in [a, b]$.
This inequality, plus $u(a) = 0$, as well as the fact that $u(x) \ge 0$, imply
that $u(x) = g^2(x) + g'^2(x) = 0$. It follows that $g(x) = 0 = g'(x)$ for all
$x \in [a, b]$.


Theorem 14.3.4 (Uniqueness) If $p$ and $q$ are continuous on $[a, b]$,
then at most one solution $y = f(x)$ of the DE $y'' + p(x)y' + q(x)y = 0$
can satisfy the initial conditions $f(a) = c_1$ and $f'(a) = c_2$, where
$c_1$ and $c_2$ are arbitrary constants.


Proof Let $f_1$ and $f_2$ be two solutions satisfying the given initial conditions.
Then their difference, $g \equiv f_1 - f_2$, satisfies the homogeneous equation [with
$r(x) = 0$]. The initial condition that $g(x)$ satisfies is clearly $g(a) = 0 = g'(a)$.
By Lemma 14.3.3, $g = 0$ or $f_1 = f_2$.


Theorem 14.3.4 can be applied to any homogeneous SOLDE to find the
latter’s most general solution. In particular, let $f_1(x)$ and $f_2(x)$ be any two
solutions of

$$y'' + p(x)y' + q(x)y = 0 \tag{14.12}$$

defined on the interval $[a, b]$. Assume that the two vectors $v_1 = (f_1(a), f_1'(a))$
and $v_2 = (f_2(a), f_2'(a))$ in $\mathbb{R}^2$ are linearly independent.³ Let $g(x)$ be
another solution. The vector $(g(a), g'(a))$ can be written as a linear
combination of $v_1$ and $v_2$, giving the two equations

$$g(a) = c_1 f_1(a) + c_2 f_2(a),$$
$$g'(a) = c_1 f_1'(a) + c_2 f_2'(a).$$

Now consider the function $u(x) \equiv g(x) - c_1 f_1(x) - c_2 f_2(x)$, which satisfies
Eq. (14.12) and the initial conditions $u(a) = u'(a) = 0$. By Lemma 14.3.3,
we must have $u(x) = 0$ or $g(x) = c_1 f_1(x) + c_2 f_2(x)$. We have proved the
following:


Theorem 14.3.5 Let $f_1$ and $f_2$ be two solutions of the HSOLDE

$$y'' + py' + qy = 0,$$

where $p$ and $q$ are continuous functions defined on the interval $[a, b]$. If

$$(f_1(a), f_1'(a)) \quad\text{and}\quad (f_2(a), f_2'(a))$$

are linearly independent vectors in $\mathbb{R}^2$, then every solution $g(x)$ of this
HSOLDE is equal to some linear combination $g(x) = c_1 f_1(x) + c_2 f_2(x)$
of $f_1$ and $f_2$ with constant coefficients $c_1$ and $c_2$.

³If they are not, then one must choose a different initial point for the interval.


14.4 The Wronskian 


The two solutions $f_1(x)$ and $f_2(x)$ in Theorem 14.3.5 have the property that
any other solution $g(x)$ can be expressed as a linear combination of them.
We call $f_1$ and $f_2$ a basis of solutions of the HSOLDE. To form a basis of
solutions, $f_1$ and $f_2$ must be linearly independent.⁴


Definition 14.4.1 The Wronskian of any two differentiable functions
$f_1(x)$ and $f_2(x)$ is

$$W(f_1, f_2; x) = f_1(x)f_2'(x) - f_2(x)f_1'(x) = \det\begin{pmatrix} f_1(x) & f_1'(x) \\ f_2(x) & f_2'(x) \end{pmatrix}.$$


Proposition 14.4.2 The Wronskian of any two solutions of Eq. (14.12)
satisfies

$$W(f_1, f_2; x) = W(f_1, f_2; c)\,e^{-\int_c^x p(t)\,dt},$$

where $c$ is any number in the interval $[a, b]$.


Proof Differentiating both sides of the definition of Wronskian and
substituting from Eq. (14.12) yields a FOLDE for $W(f_1, f_2; x)$, which can be
easily solved. The details are left as a problem.


An important consequence of Proposition 14.4.2 is that the Wronskian of 
any two solutions of Eq. (14.12) does not change sign in [a, b]. In particular, 
if the Wronskian vanishes at one point in [a, b], it vanishes at all points in 
[a, b]. 
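Proposition 14.4.2 and the constancy of the sign of the Wronskian can be checked numerically. The sketch below is an illustration only: the equation $y'' + y'/x - y/x^2 = 0$ on $x > 0$, with solutions $x$ and $1/x$, and the helper names are assumptions made here, not taken from the text.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule for the integral of f from a to b (n even)."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

# Illustrative equation: y'' + (1/x) y' - (1/x^2) y = 0, solved by x and 1/x.
p = lambda x: 1.0 / x
f1, df1 = (lambda x: x), (lambda x: 1.0)
f2, df2 = (lambda x: 1.0 / x), (lambda x: -1.0 / x ** 2)

def wronskian(x):
    """W(f1, f2; x) = f1 f2' - f2 f1' (Definition 14.4.1)."""
    return f1(x) * df2(x) - f2(x) * df1(x)

def abel(x, c=1.0):
    """Right-hand side of Proposition 14.4.2: W(c) exp(-int_c^x p dt)."""
    return wronskian(c) * math.exp(-simpson(p, c, x))

for x in (0.5, 2.0, 3.0):
    print(x, wronskian(x), abel(x))   # the two columns agree; the sign is fixed
```

For this pair the Wronskian is $-2/x$, which never changes sign on $x > 0$, in line with the remark above.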

The real importance of the Wronskian is contained in the following the- 
orem, whose straightforward proof is left as an exercise for the reader. 


Theorem 14.4.3 Two differentiable functions $f_1$ and $f_2$, which are nonzero
in the interval $[a, b]$, are linearly dependent if and only if their Wronskian
vanishes.


⁴The linear dependence or independence of a number of functions $\{f_i\}_{i=1}^{n} : [a, b] \to \mathbb{R}$ is
a concept that must hold for all $x \in [a, b]$.


Historical Notes
Josef Hoëné de Wronski (1778-1853) was born Josef Hoëné, but he adopted the name
Wronski around 1810 just after he married. He had moved to France and become a French
citizen in 1800 and moved to Paris in 1810, the same year he published his first memoir on
the foundations of mathematics, which received less than favorable reviews from Lacroix
and Lagrange. His other interests included the design of caterpillar vehicles to compete
with the railways. However, they were never manufactured.

Wronski was interested mainly in applying philosophy to mathematics, the philosophy 
taking precedence over rigorous mathematical proofs. He criticised Lagrange’s use of 
infinite series and introduced his own ideas for series expansions of a function. The coef- 
ficients in this series are determinants now known as Wronskians [so named by Thomas 
Muir (1844-1934), a Glasgow High School science master who became an authority on 
determinants by devoting most of his life to writing a five-volume treatise on the history 
of determinants]. 

For many years Wronski’s work was dismissed as rubbish. However, a closer examination 
of the work in more recent times shows that although some is wrong and he has an in- 
credibly high opinion of himself and his ideas, there are also some mathematical insights 
of great depth and brilliance hidden within the papers. 


14.4.1 A Second Solution to the HSOLDE 


If we know one solution to Eq. (14.12), say $f_1$, then by dividing both
sides of

$$f_1(x)f_2'(x) - f_2(x)f_1'(x) = W(x) = W(c)\,e^{-\int_c^x p(t)\,dt}$$

by $f_1^2$ and noting that the LHS then becomes the derivative of
$f_2/f_1$, we can solve for $f_2$ in terms of $f_1$. The result is

$$f_2(x) = f_1(x)\left\{C + K\int_\alpha^x \frac{1}{f_1^2(s)}\exp\left[-\int_c^s p(t)\,dt\right]ds\right\},$$

where $K \equiv W(c)$ is another arbitrary (nonzero) constant; we do not have
to know $W(x)$ (this would require knowledge of $f_2$, which we are trying
to calculate!) to obtain $W(c)$. In fact, the reader is urged to check directly
that $f_2(x)$ satisfies the DE of (14.12) for arbitrary $C$ and $K$. Whenever
possible, and convenient, it is customary to set $C = 0$, because its
presence simply gives a term that is proportional to the known solution $f_1(x)$.


Theorem 14.4.4 Let $f_1$ be a solution of $y'' + p(x)y' + q(x)y = 0$.
Then

$$f_2(x) = f_1(x)\int_\alpha^x \frac{1}{f_1^2(s)}\exp\left[-\int_c^s p(t)\,dt\right]ds$$

is another solution and $\{f_1, f_2\}$ forms a basis of solutions of the DE.


Example 14.4.5 Here are some examples of finding the second solution
from the first:

(a) A solution to the SOLDE $y'' - k^2y = 0$ is $e^{kx}$. To find a second
solution, we let $C = 0$ and $K = 1$ in Theorem 14.4.4. Since $p(x) = 0$, we
have

$$f_2(x) = e^{kx}\left(0 + \int_\alpha^x \frac{ds}{e^{2ks}}\right) = -\frac{1}{2k}e^{-kx} + \frac{e^{-2k\alpha}}{2k}e^{kx},$$

which, ignoring the second term (which is proportional to the first
solution), leads directly to the choice of $e^{-kx}$ as a second solution.

(b) The differential equation $y'' + k^2y = 0$ has $\sin kx$ as a solution. With
$C = 0$, $\alpha = \pi/(2k)$, and $K = 1$, we get

$$f_2(x) = \sin kx\left(0 + \int_{\pi/2k}^{x} \frac{ds}{\sin^2 ks}\right) = -\frac{1}{k}\sin kx\,\cot ks\,\Big|_{\pi/2k}^{x} = -\frac{1}{k}\cos kx,$$

which is proportional to $\cos kx$, so $\cos kx$ can be taken as the second solution.

(c) For the solutions in part (a),

$$W(x) = \det\begin{pmatrix} e^{kx} & ke^{kx} \\ e^{-kx} & -ke^{-kx} \end{pmatrix} = -2k,$$

and for those in part (b),

$$W(x) = \det\begin{pmatrix} \sin kx & k\cos kx \\ \cos kx & -k\sin kx \end{pmatrix} = -k.$$

Both Wronskians are constant. In general, the Wronskian of any two
linearly independent solutions of $y'' + q(x)y = 0$ is constant.
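The construction of Theorem 14.4.4 can also be carried out by numerical quadrature. The sketch below is illustrative only: the helper names and the choice $k = 2$ are assumptions made here. For $f_1 = \sin kx$ and $\alpha = \pi/(2k)$ the integral evaluates to $-\cos(kx)/k$, a constant multiple of $\cos kx$.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule for the integral of f from a to b (n even)."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def second_solution(f1, p, x, alpha, c=0.0, n=400):
    """f2(x) = f1(x) * int_alpha^x exp(-int_c^s p dt) / f1(s)^2 ds (Theorem 14.4.4)."""
    # The inner integral uses a coarse rule; it is exact here because p = 0.
    integrand = lambda s: math.exp(-simpson(p, c, s, 20)) / f1(s) ** 2
    return f1(x) * simpson(integrand, alpha, x, n)

k = 2.0
f1 = lambda x: math.sin(k * x)
p = lambda x: 0.0                       # y'' + k^2 y = 0 has no y' term
alpha = math.pi / (2 * k)
for x in (0.5, 1.0, 1.2):               # stay between the zeros of sin(kx)
    print(x, second_solution(f1, p, x, alpha), -math.cos(k * x) / k)
```

The integration range must not contain a zero of $f_1$, since the integrand $1/f_1^2$ is singular there.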


Most special functions used in mathematical physics are solutions of 
SOLDEs. The behavior of these functions at certain special points is deter- 
mined by the physics of the particular problem. In most situations physical 
expectation leads to a preference for one particular solution over the other. 
For example, although there are two linearly independent solutions to the
Legendre DE

$$\frac{d}{dx}\left[(1 - x^2)\frac{dy}{dx}\right] + n(n + 1)y = 0,$$

the solution that is most frequently encountered is the Legendre polynomial
$P_n(x)$ discussed in Chap. 8. The other solution can be obtained by solving
the Legendre equation or by using Theorem 14.4.4, as done in the following
example.


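Example 14.4.6 The Legendre equation can be reexpressed as

$$\frac{d^2y}{dx^2} - \frac{2x}{1 - x^2}\frac{dy}{dx} + \frac{n(n + 1)}{1 - x^2}y = 0.$$

This is an HSOLDE with

$$p(x) = -\frac{2x}{1 - x^2} \quad\text{and}\quad q(x) = \frac{n(n + 1)}{1 - x^2}.$$

One solution of this HSOLDE is the well-known Legendre polynomial
$P_n(x)$. Using this as our input and employing Theorem 14.4.4, we can
generate another set of solutions.

Let $Q_n(x)$ stand for the linearly independent “partner” of $P_n(x)$. Then

$$Q_n(x) = P_n(x)\int_\alpha^x \frac{1}{P_n^2(s)}\exp\left[\int^s \frac{2t}{1 - t^2}\,dt\right]ds
= P_n(x)\int_\alpha^x \frac{1}{P_n^2(s)}\left[\frac{A_n}{1 - s^2}\right]ds
= A_n P_n(x)\int_\alpha^x \frac{ds}{(1 - s^2)P_n^2(s)},$$

where $A_n$ is an arbitrary constant determined by standardization, and $\alpha$ is
an arbitrary point in the interval $[-1, +1]$. For instance, for $n = 0$, we have
$P_0 = 1$, and we obtain

$$Q_0(x) = A_0\int_\alpha^x \frac{ds}{1 - s^2} = A_0\left[\frac{1}{2}\ln\left|\frac{1 + x}{1 - x}\right| - \frac{1}{2}\ln\left|\frac{1 + \alpha}{1 - \alpha}\right|\right].$$

The standard form of $Q_0(x)$ is obtained by setting $A_0 = 1$ and $\alpha = 0$:

$$Q_0(x) = \frac{1}{2}\ln\frac{1 + x}{1 - x} \quad\text{for } |x| < 1.$$

Similarly, since $P_1(x) = x$,

$$Q_1(x) = A_1 x\int_\alpha^x \frac{ds}{s^2(1 - s^2)} = Ax + Bx\ln\left|\frac{1 + x}{1 - x}\right| + C \quad\text{for } |x| < 1.$$

Here standardization is $A = 0$, $B = \frac{1}{2}$, and $C = -1$. Thus,

$$Q_1(x) = \frac{1}{2}x\ln\frac{1 + x}{1 - x} - 1.$$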

14.4.2 The General Solution to an ISOLDE 


Inhomogeneous SOLDEs (ISOLDEs) can be most elegantly discussed in 

terms of Green’s functions, the subject of Chap. 20, which automatically 

incorporate the boundary conditions. However, the most general solution of 

an ISOLDE, with no boundary specification, can be discussed at this point. 
Let g(x) be a particular solution of 


L[y]=y"+ py’ +qy =r(x) (14.13) 


and let $h(x)$ be any other solution of this equation. Then $h(x) - g(x)$ satisfies
Eq. (14.12) and can be written as a linear combination of a basis of solutions
$f_1(x)$ and $f_2(x)$, leading to the following equation:

$$h(x) = c_1 f_1(x) + c_2 f_2(x) + g(x). \tag{14.14}$$


Thus, if we have a particular solution of the ISOLDE of Eq. (14.13) and two 
basis solutions of the HSOLDE, then the most general solution of (14.13) 
can be expressed as the sum of a linear combination of the two basis solu- 
tions and the particular solution. 

We know how to find a second solution to the HSOLDE once we know 
one solution. We now show that knowing one such solution will also allow 
us to find a particular solution to the ISOLDE. The method we use is called 
the variation of constants. This method can also be used to find a second 
solution to the HSOLDE. 

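Let $f_1$ and $f_2$ be the two (known) solutions of the HSOLDE and $g(x)$
the sought-after solution to Eq. (14.13). Write $g$ as $g(x) = f_1(x)v(x)$ and
substitute it in (14.13) to get a SOLDE for $v(x)$:

$$v'' + \left(p + \frac{2f_1'}{f_1}\right)v' = \frac{r}{f_1}.$$

This is a first order linear DE in $v'$, which has a solution of the form

$$v' = \frac{W(x)}{f_1^2(x)}\left[C + \int_\alpha^x \frac{f_1(t)r(t)}{W(t)}\,dt\right],$$

where $W(x)$ is the (known) Wronskian of Eq. (14.13). Substituting

$$\frac{W(x)}{f_1^2(x)} = \frac{f_1(x)f_2'(x) - f_2(x)f_1'(x)}{f_1^2(x)} = \frac{d}{dx}\left(\frac{f_2}{f_1}\right)$$

in the above expression for $v'$ and setting $C = 0$ (we are interested in a
particular solution), we get

$$\frac{dv}{dx} = \frac{d}{dx}\left(\frac{f_2}{f_1}\right)\int_\alpha^x \frac{f_1(t)r(t)}{W(t)}\,dt
= \frac{d}{dx}\left[\frac{f_2(x)}{f_1(x)}\int_\alpha^x \frac{f_1(t)r(t)}{W(t)}\,dt\right] - \frac{f_2(x)}{f_1(x)}\underbrace{\frac{d}{dx}\int_\alpha^x \frac{f_1(t)r(t)}{W(t)}\,dt}_{=f_1(x)r(x)/W(x)}$$

and

$$v(x) = \frac{f_2(x)}{f_1(x)}\int_\alpha^x \frac{f_1(t)r(t)}{W(t)}\,dt - \int_\alpha^x \frac{f_2(t)r(t)}{W(t)}\,dt.$$

This leads to the particular solution

$$g(x) = f_1(x)v(x) = f_2(x)\int_\alpha^x \frac{f_1(t)r(t)}{W(t)}\,dt - f_1(x)\int_\alpha^x \frac{f_2(t)r(t)}{W(t)}\,dt. \tag{14.15}$$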


Fig. 14.1 If $f_2'(x_1) > 0 > f_2'(x_2)$, then (assuming that the Wronskian is positive)
$f_1(x_1) > 0 > f_1(x_2)$

We just proved the following result:


Proposition 14.4.7 Given a single solution $f_1(x)$ of the HSOLDE
corresponding to an ISOLDE, one can use Theorem 14.4.4 to find
a second solution $f_2(x)$ of the HSOLDE and Eq. (14.15) to find a
particular solution $g(x)$ of the ISOLDE. The most general solution $h$
will then be

$$h(x) = c_1 f_1(x) + c_2 f_2(x) + g(x).$$
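Equation (14.15) lends itself to a direct numerical check. The sketch below is an illustration under assumptions made here: the test equation $y'' + y = x$, with $f_1 = \cos x$, $f_2 = \sin x$, $W = 1$, and $\alpha = 0$, and the helper names are not from the text. For this case Eq. (14.15) gives $g(x) = x - \sin x$.

```python
import math

def simpson(f, a, b, n=400):
    """Composite Simpson rule for the integral of f from a to b (n even)."""
    if a == b:
        return 0.0
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def particular_solution(f1, f2, W, r, x, alpha=0.0):
    """g(x) = f2(x) int f1 r / W dt - f1(x) int f2 r / W dt, as in Eq. (14.15)."""
    I1 = simpson(lambda t: f1(t) * r(t) / W(t), alpha, x)
    I2 = simpson(lambda t: f2(t) * r(t) / W(t), alpha, x)
    return f2(x) * I1 - f1(x) * I2

# Illustrative case: y'' + y = x with f1 = cos, f2 = sin, W = f1 f2' - f2 f1' = 1.
g = lambda x: particular_solution(math.cos, math.sin, lambda t: 1.0, lambda t: t, x)
for x in (-0.7, 0.4, 1.3):
    print(x, g(x), x - math.sin(x))     # the two columns agree
```

The particular solution found this way satisfies $g(\alpha) = 0 = g'(\alpha)$; any other particular solution differs from it by a combination of $f_1$ and $f_2$.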


14.4.3 Separation and Comparison Theorems 


The Wronskian can be used to derive some properties of the graphs of solu- 
tions of HSOLDEs. One such property concerns the relative position of the 
zeros of two linearly independent solutions of an HSOLDE. 


Theorem 14.4.8 (Separation) The zeros of two linearly independent solu- 
tions of an HSOLDE occur alternately. 


Proof Let $f_1(x)$ and $f_2(x)$ be two independent solutions of Eq. (14.12). We
have to show that a zero of $f_1$ exists between any two zeros of $f_2$. The linear
independence of $f_1$ and $f_2$ implies that $W(f_1, f_2; x) \neq 0$ for any $x \in [a, b]$.
Let $x_i \in [a, b]$ be a zero of $f_2$. Then

$$0 \neq W(f_1, f_2; x_i) = f_1(x_i)f_2'(x_i) - f_2(x_i)f_1'(x_i) = f_1(x_i)f_2'(x_i).$$

Thus, $f_1(x_i) \neq 0$ and $f_2'(x_i) \neq 0$. Suppose that $x_1$ and $x_2$, where $x_2 > x_1$,
are two successive zeros of $f_2$. Since $f_2$ is continuous in $[a, b]$ and $f_2'(x_1) \neq 0$,
$f_2$ has to be either increasing [$f_2'(x_1) > 0$] or decreasing [$f_2'(x_1) < 0$]
at $x_1$. For $f_2$ to be zero at $x_2$, the next point, $f_2'(x_2)$ must have the
opposite sign from $f_2'(x_1)$ (see Fig. 14.1). We proved earlier that the sign of the
Wronskian does not change in $[a, b]$ (see Proposition 14.4.2 and comments
after it). The above equation then says that $f_1(x_1)$ and $f_1(x_2)$ also have
opposite signs. The continuity of $f_1$ then implies that $f_1$ must cross the x-axis
somewhere between $x_1$ and $x_2$. A similar argument shows that there exists
one zero of $f_2$ between any two zeros of $f_1$.


Example 14.4.9 Two linearly independent solutions of $y'' + y = 0$ are $\sin x$
and $\cos x$. The separation theorem suggests that the zeros of $\sin x$ and $\cos x$
must alternate, a fact known from elementary trigonometry: The zeros of
$\cos x$ occur at odd multiples of $\pi/2$, and those of $\sin x$ occur at even
multiples of $\pi/2$.


A second useful result is known as the comparison theorem (for a proof, 
see [Birk 78, p. 38]). 


Theorem 14.4.10 (Comparison) Let $f$ and $g$ be nontrivial solutions of
$u'' + p(x)u = 0$ and $v'' + q(x)v = 0$, respectively, where $p(x) \ge q(x)$ for
all $x \in [a, b]$. Then $f$ vanishes at least once between any two zeros of $g$,
unless $p = q$ and $f$ is a constant multiple of $g$.


The form of the differential equations used in the comparison theorem is 
not restrictive because any HSOLDE can be cast in this form, as the follow- 
ing proposition shows. 


Proposition 14.4.11 If $y'' + p(x)y' + q(x)y = 0$, then

$$u = y\exp\left[\frac{1}{2}\int_\alpha^x p(t)\,dt\right]$$

satisfies $u'' + S(x)u = 0$, where $S(x) = q - \frac{1}{4}p^2 - \frac{1}{2}p'$.

Proof Define $w(x)$ by $y = wu$, and substitute in the HSOLDE to obtain

$$(u'w + w'u)' + p(u'w + w'u) + quw = 0,$$

or

$$wu'' + (2w' + pw)u' + (qw + pw' + w'')u = 0. \tag{14.16}$$

If we demand that the coefficient of $u'$ be zero, we obtain the DE
$2w' + pw = 0$, whose solution is

$$w(x) = C\exp\left[-\frac{1}{2}\int_\alpha^x p(t)\,dt\right].$$

Dividing (14.16) by this $w$ and substituting for $w$ yields

$$u'' + S(x)u = 0, \quad\text{where}\quad S(x) = q + p\frac{w'}{w} + \frac{w''}{w} = q - \frac{1}{4}p^2 - \frac{1}{2}p'.$$


A useful special case of the comparison theorem is given as the following
corollary whose straightforward but instructive proof is left as a problem.

Corollary 14.4.12 If $q(x) \le 0$ for all $x \in [a, b]$, then no nontrivial solution
of the differential equation $v'' + q(x)v = 0$ can have more than one zero.


Example 14.4.13 It should be clear from the preceding discussion that the
oscillations of the solutions of $v'' + q(x)v = 0$ are mostly determined by
the sign and magnitude of $q(x)$. For $q(x) \le 0$ there is no oscillation; that
is, there is no solution that changes sign more than once. Now suppose that
$q(x) \ge k^2 > 0$ for some real $k$. Then, by Theorem 14.4.10, any solution of
$v'' + q(x)v = 0$ must have at least one zero between any two successive
zeros of the solution $\sin kx$ of $u'' + k^2u = 0$. This means that any solution
of $v'' + q(x)v = 0$ has a zero in any interval of length $\pi/k$ if $q(x) \ge k^2 > 0$.

Let us apply this to the Bessel DE,

$$y'' + \frac{1}{x}y' + \left(1 - \frac{n^2}{x^2}\right)y = 0.$$

By Proposition 14.4.11, we can eliminate the $y'$ term by substituting $v/\sqrt{x}$
for $y$.⁵ This transforms the Bessel DE into

$$v'' + \left(1 - \frac{4n^2 - 1}{4x^2}\right)v = 0.$$

We compare this, for $n = 0$, with $u'' + u = 0$, which has a solution $u = \sin x$,
and conclude that each interval of length $\pi$ of the positive x-axis contains at
least one zero of any solution of order zero ($n = 0$) of the Bessel equation.
Thus, in particular, the zeroth Bessel function, denoted by $J_0(x)$, has a zero
in each interval of length $\pi$ of the x-axis.

On the other hand, for $4n^2 - 1 > 0$, or $n > \frac{1}{2}$, we have
$1 > [1 - (4n^2 - 1)/4x^2]$. This implies that $\sin x$ has at least one zero between any two
successive zeros of the Bessel functions of order greater than $\frac{1}{2}$. It follows that
such a Bessel function can have at most one zero between any two
successive zeros of $\sin x$ (or in each interval of length $\pi$ on the positive x-axis).
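The statement about $J_0$ can be probed numerically. The sketch below is illustrative only: it assumes the standard power series $J_0(x) = \sum_m (-1)^m (x^2/4)^m/(m!)^2$, which is adequate in double precision for $0 \le x \le 20$, and the helper names and grid are choices made here. It locates the zeros of $J_0$ below 20 and compares their spacing with $\pi$.

```python
import math

def j0(x, terms=60):
    """Bessel function J_0(x) from its power series (adequate for 0 <= x <= 20)."""
    term, total = 1.0, 1.0
    q = -(x * x) / 4.0
    for m in range(1, terms):
        term *= q / (m * m)       # ratio of consecutive series terms
        total += term
    return total

def bisect(f, a, b, tol=1e-12):
    """Refine a sign change of f in [a, b] by bisection."""
    fa = f(a)
    while b - a > tol:
        mid = 0.5 * (a + b)
        fm = f(mid)
        if fa * fm <= 0:
            b = mid
        else:
            a, fa = mid, fm
    return 0.5 * (a + b)

grid = [0.01 * i for i in range(1, 2001)]            # 0.01, 0.02, ..., 20.00
zeros = [bisect(j0, a, b) for a, b in zip(grid, grid[1:]) if j0(a) * j0(b) < 0]
gaps = [z2 - z1 for z1, z2 in zip(zeros, zeros[1:])]
print(zeros)
print(max(gaps), math.pi)   # every gap is smaller than pi
```

Since the first zero lies below $\pi$ and consecutive zeros are less than $\pi$ apart, every interval of length $\pi$ in the scanned range contains a zero, as the comparison theorem predicts.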


Example 14.4.14 Let us apply Corollary 14.4.12 to $v'' - v = 0$ in which
$q(x) = -1 < 0$. According to the corollary, the most general solution,
$c_1e^x + c_2e^{-x}$, can have at most one zero. Indeed,

$$c_1e^x + c_2e^{-x} = 0 \quad\Rightarrow\quad x = \frac{1}{2}\ln\left(-\frac{c_2}{c_1}\right),$$

and this (real) $x$ (if it exists) is the only possible solution, as predicted by
the corollary.


⁵Because of the square root in the denominator, the range of x will have to be restricted
to positive values.


14.5 Adjoint Differential Operators 


We discussed adjoint operators in detail in the context of finite-dimensional 
vector spaces in Chap. 4. In particular, the importance of self-adjoint, or 
hermitian, operators was clearly spelled out by the spectral decomposition 
theorem of Chap. 6. A consequence of that theorem is the completeness of 
the eigenvectors of a hermitian operator, the fact that an arbitrary vector can 
be expressed as a linear combination of the (orthonormal) eigenvectors of a 
hermitian operator. 

Self-adjoint differential operators are equally important because their 
“eigenfunctions” also form complete orthogonal sets, as we shall see later. 
This section generalizes the concept of the adjoint to the case of a differen- 
tial operator (of second degree). 


Definition 14.5.1 The HSOLDE

$$\mathbf{L}[y] \equiv p_2(x)y'' + p_1(x)y' + p_0(x)y = 0 \tag{14.17}$$

is said to be exact if

$$\mathbf{L}[f] \equiv p_2(x)f'' + p_1(x)f' + p_0(x)f = \frac{d}{dx}\bigl[A(x)f' + B(x)f\bigr] \tag{14.18}$$

for all $f \in C^2[a, b]$ and for some $A, B \in C^1[a, b]$. An integrating factor
for $\mathbf{L}[y]$ is a function $\mu(x)$ such that $\mu(x)\mathbf{L}[y]$ is exact.


If the HSOLDE (14.17) is exact, then Eq. (14.18) gives

$$\frac{d}{dx}\bigl[A(x)y' + B(x)y\bigr] = 0 \quad\Rightarrow\quad A(x)y' + B(x)y = C,$$

a FOLDE with a constant inhomogeneous term.
If (14.17) has an integrating factor, then even the ISOLDE corresponding
to it can be solved, because

$$\mu(x)\mathbf{L}[y] = \mu(x)r(x) \quad\Rightarrow\quad \frac{d}{dx}\bigl[A(x)y' + B(x)y\bigr] = \mu(x)r(x)
\quad\Rightarrow\quad A(x)y' + B(x)y = \int_\alpha^x \mu(t)r(t)\,dt,$$

which is a general FOLDE. Thus, the existence of an integrating factor
completely solves a SOLDE. It is therefore important to know whether or not a
SOLDE admits an integrating factor. First let us give a criterion for the
exactness of a SOLDE.


Proposition 14.5.2 The HSOLDE of Eq. (14.17) is exact if and only if
$p_2'' - p_1' + p_0 = 0$.


Proof If the HSOLDE is exact, then Eq. (14.18) holds for all $f$, implying
that $p_2 = A$, $p_1 = A' + B$, and $p_0 = B'$. It follows that $p_2'' = A''$,
$p_1' = A'' + B'$, and $p_0 = B'$, which in turn give $p_2'' - p_1' + p_0 = 0$.




Conversely, if $p_2'' - p_1' + p_0 = 0$, then, substituting $p_0 = -p_2'' + p_1'$ in the
LHS of Eq. (14.17), we obtain

$$p_2y'' + p_1y' + p_0y = p_2y'' + p_1y' + (-p_2'' + p_1')y
= p_2y'' - p_2''y + (p_1y)' = (p_2y' - p_2'y)' + (p_1y)'
= \frac{d}{dx}\bigl(p_2y' - p_2'y + p_1y\bigr),$$

and the DE is exact.
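For polynomial coefficients the exactness criterion $p_2'' - p_1' + p_0 = 0$ of Proposition 14.5.2 reduces to arithmetic on coefficient lists. The sketch below is illustrative: the helper names and the two sample equations are choices made here, with polynomials stored as ascending coefficient lists `[c0, c1, c2, ...]`.

```python
def deriv(c):
    """Derivative of a polynomial stored as ascending coefficients [c0, c1, ...]."""
    return [k * c[k] for k in range(1, len(c))] or [0]

def add(a, b, sign=1):
    """Coefficientwise a + sign * b, padding the shorter list with zeros."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [u + sign * v for u, v in zip(a, b)]

def exactness_defect(p2, p1, p0):
    """Coefficients of p2'' - p1' + p0; all zero iff the HSOLDE is exact."""
    return add(add(deriv(deriv(p2)), deriv(p1), sign=-1), p0)

def is_exact(p2, p1, p0):
    return all(c == 0 for c in exactness_defect(p2, p1, p0))

# Illustrative equations (chosen here, not taken from the text):
print(is_exact([0, 0, 1], [0, 3], [1]))        # x^2 y'' + 3x y' + y = 0
print(is_exact([1, 0, -1], [0, -2], [6]))      # (1 - x^2) y'' - 2x y' + 6y = 0
```

For the first equation the proof above gives $A = p_2 = x^2$ and $B = p_1 - p_2' = x$, so the equation is $\frac{d}{dx}[x^2y' + xy] = 0$; the second has defect $6$ and is not exact.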


A general HSOLDE is clearly not exact. Can we make it exact by
multiplying it by an integrating factor? The following proposition contains the
answer.


Proposition 14.5.3 A function $\mu$ is an integrating factor of the HSOLDE of
Eq. (14.17) if and only if it is a solution of the HSOLDE

$$\mathbf{M}[\mu] \equiv (p_2\mu)'' - (p_1\mu)' + p_0\mu = 0. \tag{14.19}$$


Proof This is an immediate consequence of Proposition 14.5.2. 


We can expand Eq. (14.19) to obtain the equivalent equation

$$p_2\mu'' + (2p_2' - p_1)\mu' + (p_2'' - p_1' + p_0)\mu = 0. \tag{14.20}$$

The operator $\mathbf{M}$ given by

$$\mathbf{M} \equiv p_2\frac{d^2}{dx^2} + (2p_2' - p_1)\frac{d}{dx} + (p_2'' - p_1' + p_0) \tag{14.21}$$

is called the adjoint of the operator $\mathbf{L}$ and denoted by $\mathbf{M} \equiv \mathbf{L}^\dagger$. The reason
for the use of the word “adjoint” will be made clear below.

Proposition 14.5.3 confirms the existence of an integrating factor.
However, the latter can be obtained only by solving Eq. (14.20), which is at least
as difficult as solving the original differential equation! In contrast, the
integrating factor for a FOLDE can be obtained by a mere integration [$\mu(x)$
given in Eq. (14.8) is an integrating factor of the FOLDE (14.6), as the reader
can verify].

Although integrating factors for SOLDEs are not as useful as their
counterparts for FOLDEs, they can facilitate the study of SOLDEs. Let us first
note that the adjoint of the adjoint of a differential operator is the original
operator: $(\mathbf{L}^\dagger)^\dagger = \mathbf{L}$ (see Problem 14.13). This suggests that if $v$ is an
integrating factor of $\mathbf{L}[u]$, then $u$ will be an integrating factor of $\mathbf{M}[v] \equiv \mathbf{L}^\dagger[v]$.
In particular, multiplying the first one by $v$ and the second one by $u$
and subtracting the results, we obtain [see Equations (14.17) and (14.19)]
$v\mathbf{L}[u] - u\mathbf{M}[v] = (vp_2)u'' - u(p_2v)'' + (vp_1)u' + u(p_1v)'$, which can be
simplified to

$$v\mathbf{L}[u] - u\mathbf{M}[v] = \frac{d}{dx}\bigl[p_2vu' - (p_2v)'u + p_1uv\bigr]. \tag{14.22}$$




Integrating this from $a$ to $b$ yields

$$ \int_a^b \bigl(vL[u] - uM[v]\bigr)\,dx = \bigl[p_2 v u' - (p_2 v)' u + p_1 u v\bigr]_a^b. \tag{14.23} $$

Equations (14.22) and (14.23) are called the Lagrange identities. Equation
(14.23) embodies the reason for calling $M$ the adjoint of $L$: If we consider
$u$ and $v$ as abstract vectors $|u\rangle$ and $|v\rangle$, and $L$ and $M$ as operators in a Hilbert
space with the inner product $\langle u|v\rangle = \int_a^b u^*(x)v(x)\,dx$, then Eq. (14.23) can
be written as

$$ \langle v|L|u\rangle - \langle u|M|v\rangle = \langle u|L^\dagger|v\rangle^* - \langle u|M|v\rangle = \bigl[p_2 v u' - (p_2 v)' u + p_1 u v\bigr]_a^b. $$

If the RHS is zero, then $\langle u|L^\dagger|v\rangle^* = \langle u|M|v\rangle$ for all $|u\rangle$, $|v\rangle$, and since all
these operators and functions are real, $L^\dagger = M$.
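
Since the identity (14.22) is purely algebraic, it can be checked exactly when the coefficient functions and $u$, $v$ are polynomials. The following Python sketch does this with polynomials stored as integer coefficient lists (lowest power first). The helper names (`poly_mul`, `L_op`, `M_op`, `lagrange_sides`) and the sample polynomials are illustrative assumptions, not part of the text.

```python
# Exact check of the Lagrange identity (14.22) for polynomial data.
# A polynomial is a list of integer coefficients, lowest power first.

def poly_add(p, q):
    n = max(len(p), len(q))
    return [(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
            for i in range(n)]

def poly_scale(c, p):
    return [c * a for a in p]

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_deriv(p):
    # Derivative of a constant is returned as [0].
    return [i * a for i, a in enumerate(p)][1:] or [0]

def poly_trim(p):
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p

def L_op(u, p2, p1, p0):
    """L[u] = p2 u'' + p1 u' + p0 u, as in Eq. (14.17)."""
    du = poly_deriv(u)
    return poly_add(poly_add(poly_mul(p2, poly_deriv(du)), poly_mul(p1, du)),
                    poly_mul(p0, u))

def M_op(v, p2, p1, p0):
    """M[v] = (p2 v)'' - (p1 v)' + p0 v, as in Eq. (14.19)."""
    first = poly_deriv(poly_deriv(poly_mul(p2, v)))
    second = poly_scale(-1, poly_deriv(poly_mul(p1, v)))
    return poly_add(poly_add(first, second), poly_mul(p0, v))

def lagrange_sides(u, v, p2, p1, p0):
    """Return (v L[u] - u M[v], d/dx [p2 v u' - (p2 v)' u + p1 u v])."""
    lhs = poly_add(poly_mul(v, L_op(u, p2, p1, p0)),
                   poly_scale(-1, poly_mul(u, M_op(v, p2, p1, p0))))
    p2v = poly_mul(p2, v)
    bracket = poly_add(
        poly_add(poly_mul(p2v, poly_deriv(u)),
                 poly_scale(-1, poly_mul(poly_deriv(p2v), u))),
        poly_mul(poly_mul(p1, u), v))
    return poly_trim(lhs), poly_trim(poly_deriv(bracket))

# Arbitrary sample data: p2 = 1 - x^2, p1 = 3x, p0 = 2 + x.
lhs, rhs = lagrange_sides(u=[1, 2, 0, 1], v=[0, 1, 1],
                          p2=[1, 0, -1], p1=[0, 3], p0=[2, 1])
print(lhs == rhs)
```

Integer arithmetic is used so that the comparison of the two sides is exact rather than approximate.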

As in the case of finite-dimensional vector spaces, a self-adjoint
differential operator merits special consideration. For $M[v] \equiv L^\dagger[v]$ to be equal
to $L[v]$, we must have [see Eqs. (14.17) and (14.20)] $2p_2' - p_1 = p_1$ and
$p_2'' - p_1' + p_0 = p_0$. The first equation gives $p_2' = p_1$, which also solves
the second equation. If this condition holds, then we can write Eq. (14.17)
as $L[y] = p_2 y'' + p_2' y' + p_0 y$, or

$$ L[y] = \frac{d}{dx}\left[p_2(x)\frac{dy}{dx}\right] + p_0(x)y = 0. $$


Can we make all SOLDEs self-adjoint? Let us multiply both sides of
Eq. (14.17) by a function $h(x)$, to be determined later. We get the new DE

$$ h(x)p_2(x)y'' + h(x)p_1(x)y' + h(x)p_0(x)y = 0, $$

which we desire to be self-adjoint. This will be accomplished if we choose
$h(x)$ such that $hp_1 = (hp_2)'$, or $p_2 h' + h(p_2' - p_1) = 0$, which can be readily
integrated to give

$$ h(x) = \frac{1}{p_2}\exp\left[\int^x \frac{p_1(t)}{p_2(t)}\,dt\right]. $$


We have just proved the following:

Theorem 14.5.4 The SOLDE of Eq. (14.17) is self-adjoint if and only if
$p_2' = p_1$, in which case the DE has the form

$$ \frac{d}{dx}\left[p_2(x)\frac{dy}{dx}\right] + p_0(x)y = 0. $$

If it is not self-adjoint, it can be made so by multiplying it through by

$$ h(x) = \frac{1}{p_2}\exp\left[\int^x \frac{p_1(t)}{p_2(t)}\,dt\right]. $$


Example 14.5.5 (a) The Legendre equation in normal form,

$$ y'' - \frac{2x}{1 - x^2}y' + \frac{\lambda}{1 - x^2}y = 0, $$

is not self-adjoint. However, if we multiply through by $h(x) = 1 - x^2$, we
get

$$ (1 - x^2)y'' - 2xy' + \lambda y = 0, $$

which can be rewritten as $[(1 - x^2)y']' + \lambda y = 0$, which is self-adjoint.
(b) Similarly, the normal form of the Bessel equation

$$ y'' + \frac{1}{x}y' + \left(1 - \frac{n^2}{x^2}\right)y = 0 $$

is not self-adjoint, but multiplying through by $h(x) = x$ yields

$$ \frac{d}{dx}\left(x\frac{dy}{dx}\right) + \left(x - \frac{n^2}{x}\right)y = 0, $$

which is clearly self-adjoint.
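
The factor $h(x) = (1/p_2)\exp[\int^x p_1/p_2\,dt]$ can also be evaluated numerically. The Python sketch below does so with a composite Simpson rule and compares the result with the closed forms $1 - x^2$ (Legendre) and $x$ (Bessel) used in the example. The function names and the reference points `x_ref` (which fix the arbitrary multiplicative constant in $h$) are assumptions made for illustration.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    if n % 2:
        n += 1
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3

def self_adjoint_factor(p1, p2, x, x_ref):
    """h(x) = exp(int_{x_ref}^{x} p1/p2 dt) / p2(x).

    The lower limit x_ref only rescales h by a constant factor.
    """
    integral = simpson(lambda t: p1(t) / p2(t), x_ref, x)
    return math.exp(integral) / p2(x)

def legendre_h(x):
    # Normal form: p2 = 1, p1 = -2x / (1 - x^2); valid for |x| < 1.
    return self_adjoint_factor(lambda t: -2 * t / (1 - t * t),
                               lambda t: 1.0, x, 0.0)

def bessel_h(x):
    # Normal form: p2 = 1, p1 = 1/x; valid for x > 0.
    return self_adjoint_factor(lambda t: 1 / t, lambda t: 1.0, x, 1.0)

print(legendre_h(0.5), 1 - 0.5 ** 2)
print(bessel_h(2.5), 2.5)
```

With these reference points the numerical values agree with $1 - x^2$ and $x$ to within the quadrature error.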


14.6 Power-Series Solutions of SOLDEs 


Analysis is one of the richest branches of mathematics, focusing on the end- 
less variety of objects we call functions. The simplest kind of function is 
a polynomial, which is obtained by performing the simple algebraic opera- 
tions of addition and multiplication on the independent variable x. The next 
in complexity are the trigonometric functions, which are obtained by taking 
ratios of geometric objects. If we demand a simplistic, intuitive approach 
to functions, the list ends there. It was only with the advent of derivatives, 
integrals, and differential equations that a vastly rich variety of functions 
exploded into existence in the eighteenth and nineteenth centuries. For
instance, $e^x$, nonexistent before the invention of calculus, can be thought of as
the function that solves $dy/dx = y$.

Although the definition of a function in terms of DEs and integrals seems
a bit artificial, for most applications it is the only way to define a function.
For instance, the error function, used in statistics, is defined as

$$ \operatorname{erf}(x) \equiv \frac{1}{\sqrt{\pi}}\int_{-\infty}^{x} e^{-t^2}\,dt. $$

Such a function cannot be expressed in terms of elementary functions.
Similarly, functions (of $x$) such as

$$ \int_x^\infty \frac{\sin t}{t}\,dt, \qquad \int_0^{\pi/2}\sqrt{1 - x^2\sin^2 t}\,dt, \qquad \int_0^{\pi/2}\frac{dt}{\sqrt{1 - x^2\sin^2 t}}, $$

and so on are encountered frequently in applications. None of these
functions can be expressed in terms of other well-known functions.




An effective way of studying such functions is to study the differential
equations they satisfy. In fact, the majority of functions encountered
in mathematical physics obey the HSOLDE of Eq. (14.17) in which the
$p_i(x)$ are elementary functions, mostly ratios of polynomials (of degree at
most 2). Of course, to specify functions completely, appropriate boundary
conditions are necessary. For instance, the error function mentioned above
satisfies the HSOLDE $y'' + 2xy' = 0$ with the boundary conditions $y(0) = \frac{1}{2}$
and $y'(0) = 1/\sqrt{\pi}$.
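
As a quick numerical illustration, the Python sketch below checks that a function with the boundary values $y(0) = 1/2$ and $y'(0) = 1/\sqrt{\pi}$, namely $(1/\sqrt{\pi})\int_{-\infty}^x e^{-t^2}dt$, satisfies $y'' + 2xy' = 0$. It assumes this normalization of the error function, which differs from Python's `math.erf` by a shift and a factor of 2. The name `erf_book` and the finite-difference step are illustrative choices.

```python
import math

def erf_book(x):
    """(1/sqrt(pi)) * integral of exp(-t^2) from -infinity to x.

    Written in terms of math.erf, which is (2/sqrt(pi)) * integral from 0 to x.
    """
    return 0.5 * (1.0 + math.erf(x))

def residual(x, h=1e-4):
    """Central-difference estimate of y'' + 2 x y' for y = erf_book."""
    d1 = (erf_book(x + h) - erf_book(x - h)) / (2 * h)
    d2 = (erf_book(x + h) - 2 * erf_book(x) + erf_book(x - h)) / h ** 2
    return d2 + 2 * x * d1

def slope_at_zero(h=1e-5):
    return (erf_book(h) - erf_book(-h)) / (2 * h)

print(erf_book(0.0), slope_at_zero(), 1 / math.sqrt(math.pi))
print([residual(x) for x in (-1.0, 0.0, 0.5, 2.0)])
```

The residuals are zero only up to the truncation and rounding error of the finite differences.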

The natural tendency to resist the idea of a function as a solution of a 
SOLDE is mostly due to the abstract nature of differential equations. After 
all, it is easier to imagine constructing functions by simple multiplications 
or with simple geometric figures that have been around for centuries. The 
following beautiful example (see [Birk 78, pp. 85-87]) should overcome 
this resistance and convince the skeptic that differential equations contain 
all the information about a function. 


Example 14.6.1 (Trigonometric functions as solutions of DEs) We can
show that the solutions to $y'' + y = 0$ have all the properties we expect
of $\sin x$ and $\cos x$. Let us denote the two linearly independent solutions of
this equation by $C(x)$ and $S(x)$. To specify these functions completely, we
set $C(0) = S'(0) = 1$ and $C'(0) = S(0) = 0$. We claim that this information
is enough to identify $C(x)$ and $S(x)$ as $\cos x$ and $\sin x$, respectively.

First, let us show that the solutions exist and are well-behaved
functions. With $C(0)$ and $C'(0)$ given, the equation $y'' + y = 0$ can generate all
derivatives of $C(x)$ at zero: $C''(0) = -C(0) = -1$, $C'''(0) = -C'(0) = 0$,
$C^{(4)}(0) = -C''(0) = +1$, and, in general,

$$ C^{(n)}(0) = \begin{cases} 0 & \text{if } n \text{ is odd}, \\ (-1)^k & \text{if } n = 2k, \text{ where } k = 0, 1, 2, \ldots. \end{cases} $$

Thus, the Taylor expansion of $C(x)$ is

$$ C(x) = \sum_{k=0}^\infty (-1)^k\frac{x^{2k}}{(2k)!}. \tag{14.24} $$

Similarly,

$$ S(x) = \sum_{k=0}^\infty (-1)^k\frac{x^{2k+1}}{(2k+1)!}. \tag{14.25} $$

A simple ratio test on the series representation of $C(x)$ yields

$$ \lim_{k\to\infty}\frac{a_{k+1}}{a_k} = \lim_{k\to\infty}\frac{(-1)^{k+1}x^{2k+2}/(2k+2)!}{(-1)^k x^{2k}/(2k)!} = \lim_{k\to\infty}\frac{-x^2}{(2k+2)(2k+1)} = 0, $$

which shows that the series for $C(x)$ converges for all values of $x$. Similarly,
the series for $S(x)$ is also convergent. Thus, we are dealing with well-defined
finite-valued functions.

Let us now enumerate and prove some properties of $C(x)$ and $S(x)$.

(a) $C'(x) = -S(x)$.

We prove this relation by differentiating $C''(x) + C(x) = 0$ and
writing the result as $[C'(x)]'' + C'(x) = 0$ to make evident the fact that
$C'(x)$ is also a solution. Since $C'(0) = 0$ and $[C'(0)]' = C''(0) = -1$,
and since $-S(x)$ satisfies the same initial conditions, the uniqueness
theorem implies that $C'(x) = -S(x)$. Similarly, $S'(x) = C(x)$.

(b) $C^2(x) + S^2(x) = 1$.

Since the $p(x)$ term is absent from the SOLDE, Proposition 14.4.2
implies that the Wronskian of $C(x)$ and $S(x)$ is constant. On the other
hand,

$$ W(C, S; x) = C(x)S'(x) - C'(x)S(x) = C^2(x) + S^2(x) = W(C, S; 0) = C^2(0) + S^2(0) = 1. $$
(c) $S(a + x) = S(a)C(x) + C(a)S(x)$.

The use of the chain rule easily shows that $S(a + x)$ is a solution of
the equation $y'' + y = 0$. Thus, it can be written as a linear combination
of $C(x)$ and $S(x)$ [which are linearly independent because their
Wronskian is nonzero by (b)]:

$$ S(a + x) = AS(x) + BC(x). \tag{14.26} $$

This is a functional identity, which for $x = 0$ gives $S(a) = BC(0) = B$.
If we differentiate both sides of Eq. (14.26), we get

$$ C(a + x) = AS'(x) + BC'(x) = AC(x) - BS(x), $$

which for $x = 0$ gives $C(a) = A$. Substituting the values of $A$ and $B$
in Eq. (14.26) yields the desired identity. A similar argument leads to

$$ C(a + x) = C(a)C(x) - S(a)S(x). $$


(d) Periodicity of $C(x)$ and $S(x)$.

Let $x_0$ be the smallest positive real number such that $S(x_0) =
C(x_0)$. Then property (b) implies that $C(x_0) = S(x_0) = 1/\sqrt{2}$. On the
other hand,

$$
\begin{aligned}
S(x_0 + x) &= S(x_0)C(x) + C(x_0)S(x) = C(x_0)C(x) + S(x_0)S(x) \\
&= C(x_0)C(x) - S(x_0)S(-x) = C(x_0 - x).
\end{aligned}
$$

The third equality follows because by Eq. (14.25), $S(x)$ is an odd
function of $x$. This is true for all $x$; in particular, for $x = x_0$ it yields
$S(2x_0) = C(0) = 1$, and by property (b), $C(2x_0) = 0$. Using property
(c) once more, we get

$$
\begin{aligned}
S(2x_0 + x) &= S(2x_0)C(x) + C(2x_0)S(x) = C(x), \\
C(2x_0 + x) &= C(2x_0)C(x) - S(2x_0)S(x) = -S(x).
\end{aligned}
$$

Substituting $x = 2x_0$ yields $S(4x_0) = C(2x_0) = 0$ and $C(4x_0) =
-S(2x_0) = -1$. Continuing in this manner, we can easily obtain

$$ S(8x_0 + x) = S(x), \qquad C(8x_0 + x) = C(x), $$

which prove the periodicity of $S(x)$ and $C(x)$ and show that their
period is $8x_0$. It is even possible to determine $x_0$. This determination is
left as a problem, but the result is

$$ x_0 = \int_0^{1/\sqrt{2}}\frac{dt}{\sqrt{1 - t^2}}. $$

A numerical calculation will show that this is $\pi/4$.
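
The properties derived in this example can be tested numerically using nothing but the Taylor series (14.24) and (14.25). In the Python sketch below, the function names, the number of retained terms, and the bracketing interval $[0, 1]$ for the bisection are assumptions made for illustration; `math.pi` is used only for the final comparison.

```python
import math

def C(x, terms=40):
    """Partial sum of Eq. (14.24): sum of (-1)^k x^(2k) / (2k)!."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 1) * (2 * k + 2))
    return total

def S(x, terms=40):
    """Partial sum of Eq. (14.25): sum of (-1)^k x^(2k+1) / (2k+1)!."""
    total, term = 0.0, x
    for k in range(terms):
        total += term
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

def first_crossing(lo=0.0, hi=1.0, tol=1e-12):
    """Bisection for the root of S(x) - C(x) in [lo, hi].

    S - C is negative at 0 and positive at 1, so a sign change is bracketed.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if S(mid) - C(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

x0 = first_crossing()
print(x0, math.pi / 4)                      # x0 versus pi/4
print(C(0.7) ** 2 + S(0.7) ** 2)            # property (b)
print(S(8 * x0 + 0.3) - S(0.3))             # periodicity with period 8 x0
```

Forty terms are more than enough for the moderate arguments used here; for large $|x|$ the alternating series suffers from cancellation and a different evaluation strategy would be needed.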


14.6.1 Frobenius Method of Undetermined Coefficients 


A proper treatment of SOLDEs requires the medium of complex analysis
and will be undertaken in the next chapter. At this point, however, we are
seeking a formal infinite series solution to the SOLDE

$$ y'' + p(x)y' + q(x)y = 0, $$

where $p(x)$ and $q(x)$ are real and analytic. This means that $p(x)$ and $q(x)$
can be represented by convergent power series in some interval $(a, b)$. [The
interesting case where $p(x)$ and $q(x)$ may have singularities will be treated
in the context of complex solutions.]


The general procedure is to write the expansions⁶

$$ p(x) = \sum_{k=0}^\infty a_k x^k, \qquad q(x) = \sum_{k=0}^\infty b_k x^k, \qquad y = \sum_{k=0}^\infty c_k x^k \tag{14.27} $$

for the coefficient functions $p$ and $q$ and the solution $y$, substitute them in
the SOLDE, and equate the coefficient of each power of $x$ to zero. For this
purpose, we need expansions for derivatives of $y$:

$$ y' = \sum_{k=1}^\infty kc_k x^{k-1} = \sum_{k=0}^\infty (k+1)c_{k+1}x^k, $$

$$ y'' = \sum_{k=1}^\infty (k+1)kc_{k+1}x^{k-1} = \sum_{k=0}^\infty (k+2)(k+1)c_{k+2}x^k. $$

⁶Here we are expanding about the origin. If such an expansion is impossible or
inconvenient, one can expand about another point, say $x_0$. One would then replace all
powers of $x$ in all expressions below with powers of $x - x_0$. These expansions assume that
$p$, $q$, and $y$ have no singularity at $x = 0$. In general, this assumption is not valid, and a
different approach, in which the whole series is multiplied by a (not necessarily positive
integer) power of $x$, ought to be taken. Details are provided in Chap. 15.

Thus


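$$ p(x)y' = \sum_{m=0}^\infty a_m x^m\sum_{k=0}^\infty (k+1)c_{k+1}x^k = \sum_{k,m}(k+1)a_m c_{k+1}x^{k+m}. $$

Let $k + m \equiv n$ and sum over $n$. Then the other sum, say $m$, cannot exceed $n$.
Thus,

$$ p(x)y' = \sum_{n=0}^\infty\sum_{m=0}^n (n - m + 1)a_m c_{n-m+1}x^n. $$

Similarly, $q(x)y = \sum_{n=0}^\infty\sum_{m=0}^n b_m c_{n-m}x^n$. Substituting these sums and
the series for $y''$ in the SOLDE, we obtain

$$ \sum_{n=0}^\infty\left\{(n+1)(n+2)c_{n+2} + \sum_{m=0}^n\bigl[(n - m + 1)a_m c_{n-m+1} + b_m c_{n-m}\bigr]\right\}x^n = 0. $$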
n=0 m=0 


For this to be true for all $x$, the coefficient of each power of $x$ must vanish:

$$ (n+1)(n+2)c_{n+2} = -\sum_{m=0}^n\bigl[(n - m + 1)a_m c_{n-m+1} + b_m c_{n-m}\bigr] \quad\text{for } n \ge 0, $$

or

$$ n(n+1)c_{n+1} = -\sum_{m=0}^{n-1}\bigl[(n - m)a_m c_{n-m} + b_m c_{n-m-1}\bigr] \quad\text{for } n \ge 1. \tag{14.28} $$

If we know $c_0$ and $c_1$ (for instance from boundary conditions), we can
uniquely determine $c_n$ for $n \ge 2$ from Eq. (14.28). This, in turn, gives a
unique power-series expansion for $y$, and we have the following theorem.


Theorem 14.6.2 (Existence) For any SOLDE of the form $y'' + p(x)y' +
q(x)y = 0$ with analytic coefficient functions given by the first two equations
of (14.27), there exists a unique power series, given by the third equation of
(14.27), that formally satisfies the SOLDE for each choice of $c_0$ and $c_1$.
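
The recursion (14.28) translates directly into a short routine. The Python sketch below is one possible rendering: the function name `series_coefficients` and the convention that `a` and `b` are finite coefficient lists (padded with zeros beyond their length) are assumptions made here, not notation from the text.

```python
def series_coefficients(a, b, c0, c1, n_max):
    """Return [c_0, ..., c_{n_max}] generated by the recursion (14.28):

        n (n+1) c_{n+1} = - sum_{m=0}^{n-1} [ (n-m) a_m c_{n-m} + b_m c_{n-m-1} ].

    a, b are the Taylor coefficients of p(x) and q(x) about x = 0;
    entries beyond the end of a list are treated as zero.
    """
    def coef(seq, m):
        return seq[m] if m < len(seq) else 0

    c = [c0, c1]
    for n in range(1, n_max):
        total = sum((n - m) * coef(a, m) * c[n - m] + coef(b, m) * c[n - m - 1]
                    for m in range(n))
        c.append(-total / (n * (n + 1)))
    return c[:n_max + 1]

# y'' + y = 0 (p = 0, q = 1) with c_0 = 1, c_1 = 0 gives the cosine series.
print(series_coefficients([], [1], 1.0, 0.0, 6))
# y'' - y' = 0 (p = -1, q = 0) with c_0 = c_1 = 1 gives the series of exp(x).
print(series_coefficients([-1], [], 1.0, 1.0, 6))
```

The routine only produces the formal coefficients; as the text goes on to stress, it says nothing about whether the resulting series converges.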


This theorem merely states the existence of a formal power series and 
says nothing about its convergence. The following example will demonstrate 
that convergence is not necessarily guaranteed. 


Example 14.6.3 The formal power-series solution for $x^2 y' - y + x = 0$ can
be obtained by letting $y = \sum_{n=0}^\infty c_n x^n$. Then $y' = \sum_{n=0}^\infty (n+1)c_{n+1}x^n$, and
substitution in the DE gives

$$ \sum_{n=0}^\infty (n+1)c_{n+1}x^{n+2} - \sum_{n=0}^\infty c_n x^n + x = 0 $$

or

$$ \sum_{n=0}^\infty (n+1)c_{n+1}x^{n+2} - c_0 - c_1 x - \sum_{n=2}^\infty c_n x^n + x = 0. $$

We see that $c_0 = 0$, $c_1 = 1$, and $(n+1)c_{n+1} = c_{n+2}$ for $n \ge 0$. Thus, we
have the recursion relation $nc_n = c_{n+1}$ for $n \ge 1$ whose unique solution is
$c_n = (n-1)!$, which generates the following solution for the DE:

$$ y = x + x^2 + (2!)x^3 + (3!)x^4 + \cdots + (n-1)!\,x^n + \cdots. $$

This series is not convergent for any nonzero $x$.
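
The divergence is easy to see numerically: however small $x$ is, the terms $(n-1)!\,x^n$ eventually grow without bound. The following Python sketch (function names are illustrative assumptions) generates the coefficients from the recursion $c_{n+1} = nc_n$ and lists the size of the terms at $x = 0.1$.

```python
def formal_coefficients(n_max):
    """c_0..c_{n_max} for x^2 y' - y + x = 0: c_0 = 0, c_1 = 1, c_{n+1} = n c_n."""
    c = [0, 1]
    for n in range(1, n_max):
        c.append(n * c[n])
    return c[:n_max + 1]

def term_sizes(x, n_max):
    """Absolute values |c_n x^n| of the terms of the formal series."""
    return [abs(cn) * abs(x) ** n
            for n, cn in enumerate(formal_coefficients(n_max))]

print(formal_coefficients(6))
sizes = term_sizes(0.1, 60)
print(sizes[10], sizes[30], sizes[60])
```

The terms first decrease and then increase, so the partial sums cannot settle to a limit; this is the ratio test failing for every nonzero $x$.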


The SOLDE solved in the preceding example is not normal. However,
for normal SOLDEs, the power series of $y$ in Eq. (14.27) converges to an
analytic function, as the following theorem shows (for a proof, see [Birk 78,
p. 95]):

Theorem 14.6.4 For any choice of $c_0$ and $c_1$, the radius of convergence of
any power series solution $y = \sum_{k=0}^\infty c_k x^k$ for the normal HSOLDE

$$ y'' + p(x)y' + q(x)y = 0 $$

whose coefficients satisfy the recursion relation of (14.28) is at least as large
as the smaller of the two radii of convergence of the two series for $p(x)$ and
$q(x)$.

In particular, if $p(x)$ and $q(x)$ are analytic in an interval around $x = 0$,
then the solution of the normal HSOLDE is also analytic in a neighborhood
of $x = 0$.


Example 14.6.5 As an application of Theorem 14.6.2, let us consider the
Legendre equation in its normal form

$$ y'' - \frac{2x}{1 - x^2}y' + \frac{\lambda}{1 - x^2}y = 0. $$

For $|x| < 1$ both $p$ and $q$ are analytic, and

$$ p(x) = -2x\sum_{m=0}^\infty (x^2)^m = \sum_{m=0}^\infty (-2)x^{2m+1}, \qquad q(x) = \lambda\sum_{m=0}^\infty (x^2)^m = \sum_{m=0}^\infty \lambda x^{2m}. $$

Thus, the coefficients of Eq. (14.27) are

$$ a_m = \begin{cases} 0 & \text{if } m \text{ is even}, \\ -2 & \text{if } m \text{ is odd} \end{cases} \qquad\text{and}\qquad b_m = \begin{cases} \lambda & \text{if } m \text{ is even}, \\ 0 & \text{if } m \text{ is odd}. \end{cases} $$

We want to substitute for $a_m$ and $b_m$ in Eq. (14.28) to find $c_{n+1}$. It is
convenient to consider two cases: when $n$ is odd and when $n$ is even. For
$n = 2r + 1$, Eq. (14.28), after some algebra, yields

$$ (2r+1)(2r+2)c_{2r+2} = \sum_{m=0}^r (4r - 4m - \lambda)c_{2(r-m)}. \tag{14.29} $$

With $r \to r + 1$, this becomes


$$
\begin{aligned}
(2r+3)(2r+4)c_{2r+4} &= \sum_{m=0}^{r+1}(4r + 4 - 4m - \lambda)c_{2(r+1-m)} \\
&= (4r + 4 - \lambda)c_{2(r+1)} + \sum_{m=1}^{r+1}(4r + 4 - 4m - \lambda)c_{2(r+1-m)} \\
&= (4r + 4 - \lambda)c_{2r+2} + \sum_{m=0}^{r}(4r - 4m - \lambda)c_{2(r-m)} \\
&= (4r + 4 - \lambda)c_{2r+2} + (2r+1)(2r+2)c_{2r+2} \\
&= \bigl[-\lambda + (2r+3)(2r+2)\bigr]c_{2r+2},
\end{aligned}
$$

where in going from the second equality to the third we changed the dummy
index, and in going from the third equality to the fourth we used Eq. (14.29).
Now we let $2r + 2 \equiv k$ to obtain $(k+1)(k+2)c_{k+2} = [k(k+1) - \lambda]c_k$, or

$$ c_{k+2} = \frac{k(k+1) - \lambda}{(k+1)(k+2)}c_k \quad\text{for even } k. $$
It is not difficult to show that starting with $n = 2r$, the case of even $n$, we
obtain this same equation for odd $k$. Thus, we can write

$$ c_{n+2} = \frac{n(n+1) - \lambda}{(n+1)(n+2)}c_n. \tag{14.30} $$

For arbitrary $c_0$ and $c_1$, we obtain two independent solutions, one of
which has only even powers of $x$ and the other only odd powers. The
generalized ratio test (see [Hass 08, Chap. 5]) shows that the series is divergent
for $x = \pm 1$ unless $\lambda = l(l+1)$ for some positive integer $l$. In that case the
infinite series becomes a polynomial, the Legendre polynomial encountered
in Chap. 8.
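
The truncation of the series for $\lambda = l(l+1)$ can be seen directly from the recursion (14.30). The Python sketch below uses exact rational arithmetic; the function name `legendre_series` and the restriction to integer $\lambda$ (needed by `Fraction`) are assumptions made for illustration.

```python
from fractions import Fraction

def legendre_series(lam, c0, c1, n_max):
    """Coefficients c_0..c_{n_max} from Eq. (14.30):

        c_{n+2} = [n (n+1) - lam] / [(n+1)(n+2)] * c_n.

    lam must be an integer here so that exact fractions can be used.
    """
    c = [Fraction(c0), Fraction(c1)]
    for n in range(n_max - 1):
        c.append(Fraction(n * (n + 1) - lam, (n + 1) * (n + 2)) * c[n])
    return c[:n_max + 1]

# l = 2, lam = 6: the even series stops after x^2, giving 1 - 3 x^2,
# which is proportional to the Legendre polynomial P_2(x) = (3 x^2 - 1)/2.
print(legendre_series(6, 1, 0, 8))

# l = 3, lam = 12: the odd series stops after x^3, giving x - (5/3) x^3,
# which is proportional to P_3(x) = (5 x^3 - 3 x)/2.
print(legendre_series(12, 0, 1, 7))

# lam = 1 is not of the form l (l + 1): the even series never terminates.
print(legendre_series(1, 1, 0, 10))
```

The polynomials come out with the normalization $c_0 = 1$ or $c_1 = 1$ rather than the conventional $P_l(1) = 1$.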


Equation (14.30) could have been obtained by substituting Eq. (14.27) 
directly into the Legendre equation. The roundabout way to (14.30) taken 
here shows the generality of Eq. (14.28). With specific differential equations 
it is generally better to substitute (14.27) directly. 


Example 14.6.6 We studied Hermite polynomials in Chap. 8 in the context
of classical orthogonal polynomials. Let us see how they arise in physics.
The one-dimensional time-independent Schrödinger equation for a
particle of mass $m$ in a potential $V(x)$ is

$$ -\frac{\hbar^2}{2m}\frac{d^2\psi}{dx^2} + V(x)\psi = E\psi, $$

where $E$ is the total energy of the particle.

For a harmonic oscillator, $V(x) = \frac{1}{2}kx^2 \equiv \frac{1}{2}m\omega^2 x^2$ and

$$ \psi'' - \frac{m^2\omega^2}{\hbar^2}x^2\psi + \frac{2m}{\hbar^2}E\psi = 0. $$

Substituting $\psi(x) = H(x)\exp(-m\omega x^2/2\hbar)$ and then making the change of
variables $x = (1/\sqrt{m\omega/\hbar})\,y$ yields

$$ H'' - 2yH' + \lambda H = 0, \quad\text{where } \lambda = \frac{2E}{\hbar\omega} - 1. \tag{14.31} $$

This is the Hermite differential equation in normal form. We assume the
expansion $H(y) = \sum_{n=0}^\infty c_n y^n$, which yields

$$ H'(y) = \sum_{n=1}^\infty nc_n y^{n-1} = \sum_{n=0}^\infty (n+1)c_{n+1}y^n, $$

$$ H''(y) = \sum_{n=1}^\infty n(n+1)c_{n+1}y^{n-1} = \sum_{n=0}^\infty (n+1)(n+2)c_{n+2}y^n. $$

Substituting in Eq. (14.31) gives

$$ \sum_{n=0}^\infty\bigl[(n+1)(n+2)c_{n+2} + \lambda c_n\bigr]y^n - 2\sum_{n=0}^\infty (n+1)c_{n+1}y^{n+1} = 0, $$

or

$$ 2c_2 + \lambda c_0 + \sum_{n=0}^\infty\bigl[(n+2)(n+3)c_{n+3} + \lambda c_{n+1} - 2(n+1)c_{n+1}\bigr]y^{n+1} = 0. $$

Setting the coefficients of powers of $y$ equal to zero, we obtain

$$ c_2 = -\frac{\lambda}{2}c_0, \qquad c_{n+3} = \frac{2(n+1) - \lambda}{(n+2)(n+3)}c_{n+1} \quad\text{for } n \ge 0, $$

or, replacing $n$ with $n - 1$,

$$ c_{n+2} = \frac{2n - \lambda}{(n+1)(n+2)}c_n, \quad n \ge 1. \tag{14.32} $$

The ratio test yields easily that the series is convergent for all values of $y$.

Thus, the infinite series whose coefficients obey the recursive
relation in Eq. (14.32) converges for all $y$. However, if we demand that
$\lim_{x\to\infty}\psi(x) = 0$, which is necessary on physical grounds, the series must
be truncated. This happens only if $\lambda = 2l$ for some integer $l$ (see
Problem 14.22 and [Hass 08, Chap. 13]), and in that case we obtain a polynomial,
the Hermite polynomial of order $l$. A consequence of such a truncation is
the quantization of harmonic oscillator energy:

$$ 2l = \lambda = \frac{2E}{\hbar\omega} - 1 \quad\Longrightarrow\quad E = \left(l + \tfrac{1}{2}\right)\hbar\omega. $$


Two solutions are generated from Eq. (14.32), one including only even
powers and the other only odd powers. These are clearly linearly
independent. Thus, knowledge of $c_0$ and $c_1$ determines the general solution of the
HSOLDE of (14.31).
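
As with the Legendre equation, the truncation condition can be read off by iterating the recursion. The Python sketch below applies Eq. (14.32) (together with $c_2 = -\lambda c_0/2$, which is its $n = 0$ case) in exact arithmetic and converts $\lambda$ into an energy through $\lambda = 2E/\hbar\omega - 1$. The function names and the integer-$\lambda$ restriction are illustrative assumptions.

```python
from fractions import Fraction

def hermite_series(lam, c0, c1, n_max):
    """Coefficients c_0..c_{n_max} from c_{n+2} = (2n - lam) c_n / ((n+1)(n+2)).

    The n = 0 case reproduces c_2 = -(lam / 2) c_0; lam must be an integer.
    """
    c = [Fraction(c0), Fraction(c1)]
    for n in range(n_max - 1):
        c.append(Fraction(2 * n - lam, (n + 1) * (n + 2)) * c[n])
    return c[:n_max + 1]

def energy_over_hbar_omega(lam):
    """E / (hbar omega) from lam = 2E / (hbar omega) - 1."""
    return Fraction(lam + 1, 2)

# lam = 4 (l = 2): even series truncates to 1 - 2 y^2, proportional to
# the Hermite polynomial 4 y^2 - 2.
print(hermite_series(4, 1, 0, 6), energy_over_hbar_omega(4))

# lam = 6 (l = 3): odd series truncates to y - (2/3) y^3, proportional to
# the Hermite polynomial 8 y^3 - 12 y.
print(hermite_series(6, 0, 1, 7), energy_over_hbar_omega(6))
```

For an even $l$ only the even series terminates, so the odd coefficient $c_1$ must be set to zero to obtain a normalizable solution, and vice versa for odd $l$.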


14.6.2 Quantum Harmonic Oscillator 


The preceding two examples show how certain special functions used in 
mathematical physics are obtained in an analytic way, by solving a differ- 
ential equation. We saw in Chap. 13 how to obtain spherical harmonics and 
Legendre polynomials by algebraic methods. It is instructive to solve the 
harmonic oscillator problem using algebraic methods. 

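The Hamiltonian of a one-dimensional harmonic oscillator is

$$ H = \frac{p^2}{2m} + \frac{1}{2}m\omega^2 x^2, $$

where $p = -i\hbar\,d/dx$ is the momentum operator. Let us find the eigenvectors
and eigenvalues of $H$.

We define the operators

$$ a \equiv \sqrt{\frac{m\omega}{2\hbar}}\,x + i\frac{p}{\sqrt{2m\hbar\omega}} \quad\text{and}\quad a^\dagger \equiv \sqrt{\frac{m\omega}{2\hbar}}\,x - i\frac{p}{\sqrt{2m\hbar\omega}}. $$

Using the commutation relation $[x, p] = i\hbar\mathbf{1}$, we can show that

$$ [a, a^\dagger] = \mathbf{1} \quad\text{and}\quad H = \hbar\omega\,a^\dagger a + \tfrac{1}{2}\hbar\omega\,\mathbf{1}. \tag{14.33} $$

Furthermore, one can readily show that

$$ [H, a] = -\hbar\omega\,a, \qquad [H, a^\dagger] = \hbar\omega\,a^\dagger. \tag{14.34} $$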


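Let $|\psi_E\rangle$ be the eigenvector corresponding to the eigenvalue $E$: $H|\psi_E\rangle =
E|\psi_E\rangle$, and note that Eq. (14.34) gives

$$ Ha|\psi_E\rangle = (aH - \hbar\omega\,a)|\psi_E\rangle = (E - \hbar\omega)\,a|\psi_E\rangle $$

and

$$ Ha^\dagger|\psi_E\rangle = (E + \hbar\omega)\,a^\dagger|\psi_E\rangle. $$

Thus, $a|\psi_E\rangle$ is an eigenvector of $H$, with eigenvalue $E - \hbar\omega$, and $a^\dagger|\psi_E\rangle$ is
an eigenvector with eigenvalue $E + \hbar\omega$. That is why $a^\dagger$ and $a$ are called the
raising and lowering (or creation and annihilation) operators, respectively.
We can write

$$ a|\psi_E\rangle = c_E|\psi_{E-\hbar\omega}\rangle. $$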


By applying $a$ repeatedly, we obtain states of lower and lower energies.
But there is a limit to this because $H$ is a positive operator: It cannot have a
negative eigenvalue. Thus, there must exist a ground state, $|\psi_0\rangle$, such that
$a|\psi_0\rangle = 0$. The energy of this ground state (or the eigenvalue corresponding
to $|\psi_0\rangle$) can be obtained:⁷

$$ H|\psi_0\rangle = \left(\hbar\omega\,a^\dagger a + \tfrac{1}{2}\hbar\omega\right)|\psi_0\rangle = \tfrac{1}{2}\hbar\omega|\psi_0\rangle. $$

Repeated application of the raising operator yields both higher-level states
and eigenvalues. We thus define $|\psi_n\rangle$ by

$$ (a^\dagger)^n|\psi_0\rangle = c_n|\psi_n\rangle, \tag{14.35} $$

where $c_n$ is a normalizing constant. The energy of $|\psi_n\rangle$ is $n$ units higher
than the ground state's, or

$$ E_n = \left(n + \tfrac{1}{2}\right)\hbar\omega, $$

which is what we obtained in the preceding example.
To find $c_n$, we demand orthonormality for the $|\psi_n\rangle$. Taking the inner
product of (14.35) with itself, we can show (see Problem 14.23) that

$$ |c_n|^2 = n|c_{n-1}|^2 \quad\Longrightarrow\quad |c_n|^2 = n!\,|c_0|^2, $$

which for $|c_0| = 1$ and real $c_n$ yields $c_n = \sqrt{n!}$. It follows, then, that

$$ |\psi_n\rangle = \frac{1}{\sqrt{n!}}(a^\dagger)^n|\psi_0\rangle. \tag{14.36} $$
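
Equation (14.36) implies $a^\dagger|\psi_n\rangle = \sqrt{n+1}\,|\psi_{n+1}\rangle$, and hence $\langle\psi_{n-1}|a|\psi_n\rangle = \sqrt{n}$. With these matrix elements one can represent $a$ on the first few states and check Eq. (14.33) numerically. The Python sketch below does this; the truncation to a finite matrix is an assumption of the sketch (it necessarily spoils the commutator in the last diagonal entry), and all function names are illustrative.

```python
import math

def annihilation_matrix(size):
    """Matrix of a in the basis psi_0..psi_{size-1}: <psi_{n-1}|a|psi_n> = sqrt(n)."""
    a = [[0.0] * size for _ in range(size)]
    for n in range(1, size):
        a[n - 1][n] = math.sqrt(n)
    return a

def transpose(m):
    # The matrix of a is real, so its adjoint is just the transpose.
    return [list(row) for row in zip(*m)]

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def hamiltonian_over_hbar_omega(size):
    """H / (hbar omega) = a^dagger a + 1/2, from Eq. (14.33)."""
    a = annihilation_matrix(size)
    number = matmul(transpose(a), a)
    return [[number[i][j] + (0.5 if i == j else 0.0) for j in range(size)]
            for i in range(size)]

def commutator(size):
    """[a, a^dagger] in the truncated basis."""
    a = annihilation_matrix(size)
    ad = transpose(a)
    left, right = matmul(a, ad), matmul(ad, a)
    return [[left[i][j] - right[i][j] for j in range(size)]
            for i in range(size)]

H = hamiltonian_over_hbar_omega(6)
print([H[n][n] for n in range(6)])           # diagonal entries n + 1/2
comm = commutator(6)
print([comm[n][n] for n in range(6)])        # 1 except in the last entry
```

The diagonal of $H/\hbar\omega$ reproduces $n + \frac{1}{2}$, while the last diagonal entry of the commutator deviates from 1 purely because the basis was cut off.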
In terms of functions and derivative operators, $a|\psi_0\rangle = 0$ gives

$$ \left(\sqrt{\frac{m\omega}{2\hbar}}\,x + \sqrt{\frac{\hbar}{2m\omega}}\frac{d}{dx}\right)\psi_0(x) = 0 $$

with the solution $\psi_0(x) = c\exp(-m\omega x^2/2\hbar)$. Normalizing $\psi_0(x)$ gives

$$ 1 = \langle\psi_0|\psi_0\rangle = c^2\int_{-\infty}^\infty\exp\left(-\frac{m\omega x^2}{\hbar}\right)dx = c^2\left(\frac{\hbar\pi}{m\omega}\right)^{1/2}. $$

Thus,

$$ \psi_0(x) = \left(\frac{m\omega}{\hbar\pi}\right)^{1/4}e^{-m\omega x^2/(2\hbar)}. $$

We can now write Eq. (14.36) in terms of differential operators:

$$ \psi_n(x) = \frac{1}{\sqrt{n!}}\left(\frac{m\omega}{\hbar\pi}\right)^{1/4}\left(\sqrt{\frac{m\omega}{2\hbar}}\,x - \sqrt{\frac{\hbar}{2m\omega}}\frac{d}{dx}\right)^n e^{-m\omega x^2/(2\hbar)}. $$

Defining a new variable $y = \sqrt{m\omega/\hbar}\,x$ transforms this equation into

$$ \psi_n = \left(\frac{m\omega}{\hbar\pi}\right)^{1/4}\frac{1}{\sqrt{2^n n!}}\left(y - \frac{d}{dy}\right)^n e^{-y^2/2}. $$

⁷From here on, the unit operator 1 will not be shown explicitly.

From this and the relation between Hermite polynomials and the solutions of
the one-dimensional harmonic oscillator given in the previous example,
we can obtain a general formula for $H_n(x)$. In particular, if we note that (see
Problem 14.23)

$$ e^{y^2/2}\left(y - \frac{d}{dy}\right)e^{-y^2/2} = -e^{y^2}\frac{d}{dy}e^{-y^2} $$

and, in general,

$$ e^{y^2/2}\left(y - \frac{d}{dy}\right)^n e^{-y^2/2} = (-1)^n e^{y^2}\frac{d^n}{dy^n}e^{-y^2}, $$

we recover the generalized Rodriguez formula of Chap. 8 for Hermite
polynomials.
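
The operator form lends itself to a simple computation. Acting with $(y - d/dy)$ on $P(y)e^{-y^2/2}$ gives $(2yP - P')e^{-y^2/2}$, so the polynomial factor obeys $P_{n+1} = 2yP_n - P_n'$ with $P_0 = 1$. The Python sketch below iterates this rule on coefficient lists (lowest power first); the function name is an illustrative assumption.

```python
def hermite_from_ladder(n):
    """Polynomial P_n with (y - d/dy)^n e^{-y^2/2} = P_n(y) e^{-y^2/2}.

    Uses P_{k+1} = 2 y P_k - P_k', starting from P_0 = 1.
    Coefficients are returned lowest power first.
    """
    poly = [1]
    for _ in range(n):
        two_y_poly = [0] + [2 * c for c in poly]          # multiply by 2y
        deriv = [k * c for k, c in enumerate(poly)][1:]   # differentiate
        poly = [two_y_poly[k] - (deriv[k] if k < len(deriv) else 0)
                for k in range(len(two_y_poly))]
    return poly

for n in range(5):
    print(n, hermite_from_ladder(n))
```

The first few results, $1$, $2y$, $4y^2 - 2$, $8y^3 - 12y$, are the familiar Hermite polynomials in the physicists' normalization.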


14.7. SOLDEs with Constant Coefficients 


The solution to a SOLDE with constant coefficients can always be found in
closed form. In fact, we can treat an $n$th-order linear differential equation
(NOLDE) with constant coefficients with no extra effort. This section
outlines the procedure for solving such an equation. For details, the reader is
referred to any elementary book on differential equations (see also [Hass 08]).
The most general NOLDE with constant coefficients can be written as

$$ L[y] \equiv y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_1 y' + a_0 y = r(x). \tag{14.37} $$

The corresponding homogeneous NOLDE (HNOLDE) has $r(x) = 0$. The
solution to the HNOLDE

$$ L[y] \equiv y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_1 y' + a_0 y = 0 \tag{14.38} $$

can be found by making the exponential substitution $y = e^{\lambda x}$, which results
in the equation

$$ L[e^{\lambda x}] = (\lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0)e^{\lambda x} = 0. $$

This equation will hold only if $\lambda$ is a root of the characteristic polynomial

$$ p(\lambda) \equiv \lambda^n + a_{n-1}\lambda^{n-1} + \cdots + a_1\lambda + a_0, $$

which, by the fundamental theorem of algebra (Theorem 10.5.6), can be
written as

$$ p(\lambda) = (\lambda - \lambda_1)^{k_1}(\lambda - \lambda_2)^{k_2}\cdots(\lambda - \lambda_m)^{k_m}, \tag{14.39} $$

where $\lambda_j$ is a distinct (complex) root of $p(\lambda)$ with multiplicity $k_j$.




Theorem 14.7.1 Let $\{\lambda_j\}_{j=1}^m$ be the distinct roots of the characteristic
polynomial of the real HNOLDE of Eq. (14.38) with multiplicities $\{k_j\}_{j=1}^m$. Then
the functions

$$ \{x^{r_j}e^{\lambda_j x}\}_{r_j=0}^{k_j-1} \equiv \{e^{\lambda_j x}, xe^{\lambda_j x}, \ldots, x^{k_j-1}e^{\lambda_j x}\}, \qquad j = 1, 2, \ldots, m, $$

are a basis of solutions of Eq. (14.38).

When $\lambda_j$ is complex, one can write its corresponding solution in terms of
trigonometric functions.


Example 14.7.2 An equation that is used in both mechanics and circuit
theory is

$$ \frac{d^2y}{dt^2} + a\frac{dy}{dt} + by = 0 \quad\text{for } a, b > 0. \tag{14.40} $$

Its characteristic polynomial is $p(\lambda) = \lambda^2 + a\lambda + b$, which has the roots

$$ \lambda_1 = \tfrac{1}{2}\bigl(-a + \sqrt{a^2 - 4b}\bigr) \quad\text{and}\quad \lambda_2 = \tfrac{1}{2}\bigl(-a - \sqrt{a^2 - 4b}\bigr). $$

We can distinguish three different possible motions depending on the
relative sizes of $a$ and $b$.

(a) $a^2 > 4b$ (overdamped): Here we have two distinct simple roots. The
multiplicities are both one ($k_1 = k_2 = 1$); therefore, the power of $x$ for
both solutions is zero ($r_1 = r_2 = 0$). Let $\gamma \equiv \frac{1}{2}\sqrt{a^2 - 4b}$. Then the
most general solution is

$$ y(t) = e^{-at/2}\bigl(c_1 e^{\gamma t} + c_2 e^{-\gamma t}\bigr). $$

Since $a > 2\gamma$, this solution starts at $y = c_1 + c_2$ at $t = 0$ and
continuously decreases; so, as $t \to \infty$, $y(t) \to 0$.

(b) $a^2 = 4b$ (critically damped): In this case we have one multiple root of
order 2 ($k_1 = 2$); therefore, the power of $x$ can be zero or 1 ($r_1 = 0, 1$).
Thus, the general solution is

$$ y(t) = c_1 te^{-at/2} + c_0 e^{-at/2}. $$

This solution starts at $y(0) = c_0$ at $t = 0$, reaches a maximum (or
minimum) at $t = 2/a - c_0/c_1$, and subsequently decays (grows)
exponentially to zero.

(c) $a^2 < 4b$ (underdamped): Once more, we have two distinct simple
roots. The multiplicities are both one ($k_1 = k_2 = 1$); therefore,
the power of $x$ for both solutions is zero ($r_1 = r_2 = 0$). Let $\omega \equiv
\frac{1}{2}\sqrt{4b - a^2}$. Then $\lambda_1 = -a/2 + i\omega$ and $\lambda_2 = \lambda_1^*$. The roots are
complex, and the most general solution is thus of the form

$$ y(t) = e^{-at/2}(c_1\cos\omega t + c_2\sin\omega t) = Ae^{-at/2}\cos(\omega t + \alpha). $$

The solution is a harmonic variation with a decaying amplitude
$A\exp(-at/2)$. Note that if $a = 0$, the amplitude does not decay. That
is why $a$ is called the damping factor (or the damping constant).

These equations describe either a mechanical system oscillating (with
no external force) in a viscous (dissipative) fluid, or an electrical circuit
consisting of a resistance $R$, an inductance $L$, and a capacitance $C$. For
RLC circuits, $a = R/L$ and $b = 1/(LC)$. Thus, the damping factor depends
on the relative magnitudes of $R$ and $L$. On the other hand, the frequency

$$ \omega \equiv \sqrt{b - \left(\frac{a}{2}\right)^2} = \sqrt{\frac{1}{LC} - \frac{R^2}{4L^2}} $$

depends on all three elements. In particular, for $R > 2\sqrt{L/C}$ the circuit does
not oscillate.
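
The three regimes can be told apart from the sign of $a^2 - 4b$ alone. The Python sketch below computes the characteristic roots and the regime, and applies the result to an RLC circuit with $a = R/L$ and $b = 1/(LC)$. The function names and the sample component values are illustrative assumptions; note that the exact comparison `disc == 0` is meaningful only for inputs for which the floating-point arithmetic is exact.

```python
import cmath

def characteristic_roots(a, b):
    """Roots of lambda^2 + a lambda + b = 0 (possibly complex)."""
    root = cmath.sqrt(a * a - 4 * b)
    return (-a + root) / 2, (-a - root) / 2

def damping_regime(a, b):
    """Classify the motion by the sign of the discriminant a^2 - 4b."""
    disc = a * a - 4 * b
    if disc > 0:
        return "overdamped"
    if disc == 0:
        return "critically damped"
    return "underdamped"

def rlc_regime(R, L, C):
    """Regime of a series RLC circuit, using a = R/L and b = 1/(LC)."""
    return damping_regime(R / L, 1 / (L * C))

print(damping_regime(3, 2), characteristic_roots(3, 2))
print(damping_regime(2, 1), characteristic_roots(2, 1))
print(damping_regime(2, 5), characteristic_roots(2, 5))
print(rlc_regime(1.0, 1.0, 1.0), rlc_regime(3.0, 1.0, 1.0))
```

In the underdamped case the imaginary part of the roots is the angular frequency $\omega$ and the real part is $-a/2$, in agreement with case (c) above.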


A physical system whose behavior in the absence of a driving force is 
described by a HNOLDE will obey an inhomogeneous NOLDE in the pres- 
ence of the driving force. This driving force is simply the inhomogeneous 
term of the NOLDE. The best way to solve such an inhomogeneous NOLDE 
in its most general form is by using Fourier transforms and Green’s func- 
tions, as we will do in Chap. 20. For the particular, but important, case in 
which the inhomogeneous term is a product of polynomials and exponen- 
tials, the solution can be found in closed form. 


Theorem 14.7.3 The INOLDE $L[y] = e^{\lambda x}S(x)$, where $S(x)$ is a polynomial,
has the particular solution $e^{\lambda x}q(x)$, where $q(x)$ is also a polynomial.
The degree of $q(x)$ equals that of $S(x)$ unless $\lambda = \lambda_j$, a root of the
characteristic polynomial of $L$, in which case the degree of $q(x)$ exceeds that of
$S(x)$ by $k_j$, the multiplicity of $\lambda_j$.


Once we know the form of the particular solution of the NOLDE, we can 
find the coefficients in the polynomial of the solution by substituting in the 
NOLDE and matching the powers on both sides. 


Example 14.7.4 Let us find the most general solutions for the following
two differential equations subject to the boundary conditions $y(0) = 0$ and
$y'(0) = 1$.

(a) The first DE we want to consider is

$$ y'' + y = xe^x. \tag{14.41} $$

The characteristic polynomial is $\lambda^2 + 1$, whose roots are $\lambda_1 = i$ and
$\lambda_2 = -i$. Thus, a basis of solutions is $\{\cos x, \sin x\}$. To find the
particular solution we note that $\lambda$ (the coefficient of $x$ in the exponential
part of the inhomogeneous term) is 1, which is neither of the roots
$\lambda_1$ and $\lambda_2$. Thus, the particular solution is of the form $q(x)e^x$, where
$q(x) = Ax + B$ is of degree 1 [same degree as that of $S(x) = x$]. We
now substitute $y = (Ax + B)e^x$ in Eq. (14.41) to obtain the relation

$$ 2Axe^x + (2A + 2B)e^x = xe^x. $$

Matching the coefficients, we have

$$ 2A = 1 \quad\text{and}\quad 2A + 2B = 0 \quad\Longrightarrow\quad A = \tfrac{1}{2} = -B. $$

Thus, the most general solution is

$$ y = c_1\cos x + c_2\sin x + \tfrac{1}{2}(x - 1)e^x. $$

Imposing the given boundary conditions yields $0 = y(0) = c_1 - \frac{1}{2}$ and
$1 = y'(0) = c_2$. Thus,

$$ y = \tfrac{1}{2}\cos x + \sin x + \tfrac{1}{2}(x - 1)e^x $$

is the unique solution.

(b) The next DE we want to consider is

$$ y'' - y = xe^x. \tag{14.42} $$

Here $p(\lambda) = \lambda^2 - 1$, and the roots are $\lambda_1 = 1$ and $\lambda_2 = -1$. A basis
of solutions is $\{e^x, e^{-x}\}$. To find a particular solution, we note that
$S(x) = x$ and $\lambda = 1 = \lambda_1$. Theorem 14.7.3 then implies that $q(x)$ must
be of degree 2, because $\lambda_1$ is a simple root, i.e., $k_1 = 1$. We therefore
try

$$ q(x) = Ax^2 + Bx + C \quad\Longrightarrow\quad y = (Ax^2 + Bx + C)e^x. $$

Taking the derivatives and substituting in Eq. (14.42) yields two
equations,

$$ 4A = 1 \quad\text{and}\quad A + B = 0, $$

whose solution is $A = -B = \frac{1}{4}$. Note that $C$ is not determined,
because $Ce^x$ is a solution of the homogeneous DE corresponding to
Eq. (14.42), so when $L$ is applied to $y$, it eliminates the term $Ce^x$.
Another way of looking at the situation is to note that the most general
solution to (14.42) is of the form

$$ y = c_1 e^x + c_2 e^{-x} + \left(\tfrac{1}{4}x^2 - \tfrac{1}{4}x + C\right)e^x. $$

The term $Ce^x$ could be absorbed in $c_1 e^x$. We therefore set $C = 0$,
apply the boundary conditions, and find the unique solution

$$ y = \tfrac{5}{4}\sinh x + \tfrac{1}{4}(x^2 - x)e^x. $$
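
Both results of this example can be verified numerically. The Python sketch below takes the two solutions to be $y = \frac{1}{2}\cos x + \sin x + \frac{1}{2}(x - 1)e^x$ for Eq. (14.41) and $y = \frac{5}{4}\sinh x + \frac{1}{4}(x^2 - x)e^x$ for Eq. (14.42), and checks the differential equations and the boundary conditions $y(0) = 0$, $y'(0) = 1$ by central differences. Function names and step sizes are illustrative assumptions.

```python
import math

def y_a(x):
    """Candidate solution of y'' + y = x e^x with y(0) = 0, y'(0) = 1."""
    return 0.5 * math.cos(x) + math.sin(x) + 0.5 * (x - 1) * math.exp(x)

def y_b(x):
    """Candidate solution of y'' - y = x e^x with y(0) = 0, y'(0) = 1."""
    return 1.25 * math.sinh(x) + 0.25 * (x * x - x) * math.exp(x)

def first_derivative(f, x, h=1e-4):
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

def residual_a(x):
    """y'' + y - x e^x for y = y_a; should vanish up to discretization error."""
    return second_derivative(y_a, x) + y_a(x) - x * math.exp(x)

def residual_b(x):
    """y'' - y - x e^x for y = y_b; should vanish up to discretization error."""
    return second_derivative(y_b, x) - y_b(x) - x * math.exp(x)

for x in (-1.0, 0.3, 1.5):
    print(x, residual_a(x), residual_b(x))
print(y_a(0.0), first_derivative(y_a, 0.0))
print(y_b(0.0), first_derivative(y_b, 0.0))
```

The residuals are small but not exactly zero because of the finite-difference approximation of the derivatives.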
14.8 The WKB Method


In this section, we treat the somewhat specialized method of obtaining an
approximate solution to a particular type of second-order DE arising from
the Schrödinger equation in one dimension. The method's name comes from
Wentzel, Kramers, and Brillouin, who invented it and applied it for the first
time.

Suppose we are interested in finding approximate solutions of the DE

$$ \frac{d^2y}{dx^2} + q(x)y = 0 \tag{14.43} $$

in which $q$ varies "slowly" with respect to $x$ in the sense discussed below. If
$q$ varies infinitely slowly, i.e., if it is a constant, the solution to Eq. (14.43) is
simply an imaginary exponential (or trigonometric). So, let us define $\phi(x)$
by $y = e^{i\phi(x)}$ and rewrite the DE as

$$ (\phi')^2 - i\phi'' - q = 0. \tag{14.44} $$

Assuming that $\phi''$ is small (compared to $q$), so that $y$ does not oscillate too
rapidly, we can find an approximate solution to the DE:

$$ \phi' \approx \pm\sqrt{q} \quad\Longrightarrow\quad \phi \approx \pm\int\sqrt{q(x)}\,dx. \tag{14.45} $$

The condition of validity of our assumption is obtained by differentiating
(14.45):

$$ |\phi''| \approx \frac{1}{2}\left|\frac{q'}{\sqrt{q}}\right| \ll |q|. $$

It follows from Eq. (14.45) and the definition of $\phi$ that $1/\sqrt{q}$ is
approximately $1/(2\pi)$ times one "wavelength" of the solution $y$. Therefore, the
approximation is valid if the change in $q$ in one wavelength is small
compared to $|q|$.

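The approximation can be improved by inserting the derivative of (14.45)
in the DE and solving for a new $\phi$:

$$ (\phi')^2 \approx q \pm \frac{i}{2}\frac{q'}{\sqrt{q}} \quad\Longrightarrow\quad \phi' \approx \pm\left(q \pm \frac{i}{2}\frac{q'}{\sqrt{q}}\right)^{1/2}, $$

or

$$ \phi' \approx \pm\sqrt{q} + \frac{i}{4}\frac{q'}{q} \quad\Longrightarrow\quad \phi(x) \approx \pm\int\sqrt{q}\,dx + \frac{i}{4}\ln q. $$

The two choices give rise to two different solutions, a linear combination of
which gives the most general solution. Thus,

$$ y \approx \frac{1}{\sqrt[4]{q(x)}}\left\{c_1\exp\left[i\int\sqrt{q}\,dx\right] + c_2\exp\left[-i\int\sqrt{q}\,dx\right]\right\}. \tag{14.46} $$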




Equation (14.46) gives an approximate solution to (14.43) in any region
in which the condition of validity holds. The method fails if $q$ changes too
rapidly or if it is zero at some point of the region. The latter is a serious
difficulty, since we often wish to join a solution in a region in which $q(x) > 0$ to
one in a region in which $q(x) < 0$. There is a general procedure for deriving
the so-called connection formulas relating the constants $c_1$ and $c_2$ of the two
solutions on either side of the point where $q(x) = 0$. We shall not go into the
details of such a derivation, as it is not particularly illuminating.⁸ We simply
quote a particular result that is useful in applications.

Suppose that $q$ passes through zero at $x_0$, is positive to the right of $x_0$,
and satisfies the condition of validity in regions both to the right and to
the left of $x_0$. Furthermore, assume that the solution of the DE decreases
exponentially to the left of $x_0$. Under such conditions, the solution to the
left will be of the form

$$ y(x) \approx \frac{1}{\sqrt[4]{-q(x)}}\exp\left[-\int_x^{x_0}\sqrt{-q(x)}\,dx\right], \tag{14.47} $$

while to the right, we have

$$ y(x) \approx \frac{2}{\sqrt[4]{q(x)}}\cos\left[\int_{x_0}^x\sqrt{q(x)}\,dx - \frac{\pi}{4}\right]. \tag{14.48} $$

A similar procedure gives connection formulas for the case where $q$ is
positive on the left and negative on the right of $x_0$.


Example 14.8.1 Consider the Schrodinger equation in one dimension 


7 t BlE-V@)]¥ =0 


where V (x) is a potential well meeting the horizontal line of constant E at 
x =a and x =b, so that 


>0 ifa<x <b, 


<0 ifx<aorx>b. 


qa)= aE - Vix)] | 


The solution that is bounded to the left of a must be exponentially decay- 
ing. Therefore, in the interval (a, b) the approximate solution, as given by 
Eq. (14.48), is 


W(x) & (E— aon [VF —|[E- V(x) Jax-3), 


where A is some arbitrary constant. The solution that is bounded to the right 
of b must also be exponentially decaying. Hence, the solution fora <x <b 
is 


Wa) (E- aa [3 — [E — V(x) Jax—4). 


Since these two expressions give the same function in the same region, they
must be equal. Thus, $A = B$, and, more importantly,

$$\cos\left(\int_a^x \sqrt{\frac{2m}{\hbar^2}[E - V(x)]}\,dx - \frac{\pi}{4}\right) = \cos\left(\int_x^b \sqrt{\frac{2m}{\hbar^2}[E - V(x)]}\,dx - \frac{\pi}{4}\right),$$

or

$$\int_a^b \sqrt{\frac{2m}{\hbar^2}[E - V(x)]}\,dx = \left(n + \frac{1}{2}\right)\pi.$$

This is essentially the Bohr-Sommerfeld quantization condition of pre-1925
quantum mechanics.

⁸The interested reader is referred to the book by Mathews and Walker, pp. 27-37.
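The quantization condition of Example 14.8.1 can be tried out numerically. The following is a minimal sketch, not part of the text: it assumes units with $\hbar = m = 1$, takes the harmonic potential $V(x) = x^2/2$ as a test case, and uses hypothetical helper names (`phase_integral`, `bisect`, `wkb_energy`) introduced only for this illustration.

```python
import math

def phase_integral(V, E, a, b, n_pts=2000):
    """Integral of sqrt(2m/hbar^2 [E - V(x)]) from a to b with hbar = m = 1.

    The substitution x = c + h*sin(phi) removes the square-root
    behaviour of the integrand at the turning points a and b.
    """
    c, h = 0.5 * (a + b), 0.5 * (b - a)
    dphi = math.pi / n_pts
    total = 0.0
    for i in range(n_pts):
        phi = -0.5 * math.pi + (i + 0.5) * dphi   # midpoint rule in phi
        x = c + h * math.sin(phi)
        q = max(2.0 * (E - V(x)), 0.0)            # guard against rounding
        total += math.sqrt(q) * h * math.cos(phi) * dphi
    return total

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def wkb_energy(V, n, turning_points, E_lo, E_hi):
    """Solve phase_integral(E) = (n + 1/2) * pi for E.

    turning_points(E) must return (a, b) with V(a) = V(b) = E.
    """
    def mismatch(E):
        a, b = turning_points(E)
        return phase_integral(V, E, a, b) - (n + 0.5) * math.pi
    return bisect(mismatch, E_lo, E_hi)

# Assumed test case: V = x^2/2, turning points at -sqrt(2E) and +sqrt(2E).
V = lambda x: 0.5 * x * x
tp = lambda E: (-math.sqrt(2.0 * E), math.sqrt(2.0 * E))
energies = [wkb_energy(V, n, tp, 1e-6, 20.0) for n in range(4)]
print(energies)
```

For this particular potential the phase integral equals $\pi E$ exactly, so the condition gives $E_n = n + 1/2$ in these units; for other wells the same routine only yields the approximate (semiclassical) levels.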


14.8.1 Classical Limit of the Schrödinger Equation


As long as we are approximating solutions of second-order DEs that arise
naturally from the Schrödinger equation, it is instructive to look at another
approximation to the Schrödinger equation, its classical limit in which the
Planck constant goes to zero.

The idea is to note that since $\psi(\mathbf{r}, t)$ is a complex function, one can write
it as

$$\psi(\mathbf{r}, t) = A(\mathbf{r}, t) \exp\left[\frac{i}{\hbar} S(\mathbf{r}, t)\right], \tag{14.49}$$

where $A(\mathbf{r}, t)$ and $S(\mathbf{r}, t)$ are real-valued functions. Substituting (14.49) in
the Schrödinger equation and separating the real and the imaginary parts
yields

$$\begin{aligned} \frac{\partial S}{\partial t} + \frac{\nabla S \cdot \nabla S}{2m} + V &= \frac{\hbar^2}{2m}\frac{\nabla^2 A}{A}, \\ m\frac{\partial A}{\partial t} + \nabla S \cdot \nabla A + \frac{A}{2}\nabla^2 S &= 0. \end{aligned} \tag{14.50}$$

These two equations are completely equivalent to the Schrödinger equation.
The second equation has a direct physical interpretation. Define

$$\rho(\mathbf{r}, t) \equiv A^2(\mathbf{r}, t) = |\psi(\mathbf{r}, t)|^2 \quad\text{and}\quad \mathbf{J}(\mathbf{r}, t) \equiv A^2(\mathbf{r}, t)\,\underbrace{\frac{\nabla S}{m}}_{\equiv\,\mathbf{v}} = \rho\mathbf{v}, \tag{14.51}$$

multiply the second equation in (14.50) by $2A/m$, and note that it then can
be written as

$$\frac{\partial \rho}{\partial t} + \nabla \cdot \mathbf{J} = 0, \tag{14.52}$$

which is the continuity equation for probability. The fact that $\mathbf{J}$ is indeed the
probability current density is left for Problem 14.32.


The first equation of (14.50) gives an interesting result when $\hbar \to 0$
because in this limit, the RHS of the equation will be zero, and we get

$$\frac{\partial S}{\partial t} + \frac{1}{2}mv^2 + V = 0.$$

Taking the gradient of this equation, we obtain

$$\left(\frac{\partial}{\partial t} + \mathbf{v} \cdot \nabla\right) m\mathbf{v} + \nabla V = 0,$$

which is the equation of motion of a classical fluid with velocity field
$\mathbf{v} = \nabla S/m$. We thus have the following:


Proposition 14.8.2 In the classical limit, the solution of the Schrödinger
equation describes a fluid (statistical mixture) of noninteracting classical
particles of mass $m$ subject to the potential $V(\mathbf{r})$. The density and the current
density of this fluid are, respectively, the probability density $\rho = |\psi|^2$ and
the probability current density $\mathbf{J}$ of the quantum particle.
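
The gradient step that turns the $\hbar \to 0$ equation into the fluid equation can be spelled out as follows. This is only a sketch; the index notation ($\partial_i \equiv \partial/\partial x_i$, summation over the repeated index $j$) is introduced here for the illustration, and the only input is $\mathbf{v} = \nabla S/m$, which makes $\partial_i v_j$ symmetric.

```latex
\begin{align*}
\partial_i\!\left(\frac{\partial S}{\partial t}\right)
   &= m\,\frac{\partial v_i}{\partial t},\\
\partial_i\!\left(\tfrac{1}{2}\, m\, v_j v_j\right)
   &= m\, v_j\,\partial_i v_j = m\, v_j\,\partial_j v_i
   \qquad\text{since } \partial_i v_j = \tfrac{1}{m}\,\partial_i\partial_j S = \partial_j v_i,\\
\nabla\!\left(\frac{\partial S}{\partial t} + \tfrac{1}{2}\, m v^2 + V\right)
   &= \left(\frac{\partial}{\partial t} + \mathbf{v}\cdot\nabla\right) m\mathbf{v} + \nabla V = 0 .
\end{align*}
```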


14.9 Problems 


14.1 Let $u(x)$ be a differentiable function satisfying the differential
inequality $u'(x) \le Ku(x)$ for $x \in [a, b]$, where $K$ is a constant. Show that
$u(x) \le u(a)e^{K(x-a)}$. Hint: Multiply both sides of the inequality by $e^{-Kx}$,
and show that the result can be written as the derivative of a nonincreasing
function. Then use the fact that $a \le x$ to get the final result.


14.2 Prove Proposition 14.4.2. 


14.3 Let $f_1(x) = x$ and $f_2(x) = |x|$ for $x \in [-1, 1]$. Show that these two
functions are linearly independent in the given interval, and that their
Wronskian vanishes. Is this a violation of Theorem 14.4.3?


14.4 How would you generalize the Wronskian to n functions which have 
derivatives up to nth order? Prove that the Wronskian vanishes if the func- 
tions are linearly dependent. 


14.5 Let f and g be two differentiable functions that are linearly dependent. 
Show that their Wronskian vanishes. 


14.6 Show that if $(f_1, f_1')$ and $(f_2, f_2')$ are linearly dependent at one point,
then $f_1$ and $f_2$ are linearly dependent at all $x \in [a, b]$. Here $f_1$ and $f_2$ are
solutions of the DE of (14.12). Hint: Derive the identity

$$W(f_1, f_2; x_2) = W(f_1, f_2; x_1) \exp\left\{-\int_{x_1}^{x_2} p(t)\,dt\right\}.$$


14.7 Show that the solutions to the SOLDE $y'' + q(x)y = 0$ have a constant
Wronskian.


14.8 Find (in terms of an integral) $G_n(x)$, the linearly independent "partner"
of the Hermite polynomial $H_n(x)$. Specialize this to $n = 0, 1$. Is it possible
to find $G_0(x)$ and $G_1(x)$ in terms of elementary functions?


14.9 Let $f_1$, $f_2$, and $f_3$ be any three solutions of $y'' + py' + qy = 0$. Show
that the (generalized $3 \times 3$) Wronskian of these solutions is zero. Thus, any
three solutions of the HSOLDE are linearly dependent.


14.10 For the HSOLDE $y'' + py' + qy = 0$, show that

$$p = -\frac{f_1 f_2'' - f_2 f_1''}{W(f_1, f_2)} \quad\text{and}\quad q = \frac{f_1' f_2'' - f_2' f_1''}{W(f_1, f_2)}.$$

Thus, knowing two solutions of an HSOLDE allows us to reconstruct the
DE.


14.11 Let $f_1$, $f_2$, and $f_3$ be three solutions of the third-order linear
differential equation $y''' + p_2(x)y'' + p_1(x)y' + p_0(x)y = 0$. Derive a FODE
satisfied by the (generalized $3 \times 3$) Wronskian of these solutions.


14.12 Prove Corollary 14.4.12. Hint: Consider the solution $u = 1$ of the DE
$u'' = 0$ and apply Theorem 14.4.10.


14.13 Show that the adjoint of M given in Eq. (14.20) is the original L.


14.14 Show that if $u(x)$ and $v(x)$ are solutions of the self-adjoint DE
$(pu')' + qu = 0$, then Abel's identity, $p(uv' - vu') = \text{constant}$, holds.


14.15 Reduce each DE to self-adjoint form. 
(a) xy" +xy'+y=0, (b) y"+y'tanx =0. 


14.16 Reduce the self-adjoint DE $(py')' + qy = 0$ to $u'' + S(x)u = 0$ by
an appropriate change of the dependent variable. What is $S(x)$? Apply this
reduction to the Legendre DE for $P_n(x)$, and show that

$$S(x) = \frac{1 + n(n+1) - n(n+1)x^2}{(1 - x^2)^2}.$$

Now use this result to show that every solution of the Legendre equation has
at least $(2n + 1)/\pi$ zeros on $(-1, +1)$.


14.17 Substitute $v = y'/y$ in the homogeneous SOLDE

$$y'' + p(x)y' + q(x)y = 0$$

and:

(a) Show that it turns into $v' + v^2 + p(x)v + q(x) = 0$, which is a
first-order nonlinear equation called the Riccati equation. Would the same
substitution work if the DE were inhomogeneous?

(b) Show that by an appropriate transformation, the Riccati equation can
be directly cast in the form $u' + u^2 + S(x) = 0$.


14.18 For the function $S(x)$ defined in Example 14.6.1, let $S^{-1}(x)$ be the
inverse, i.e., $S^{-1}(S(x)) = x$. Show that

$$\frac{d}{dx}\left[S^{-1}(x)\right] = \frac{1}{\sqrt{1 - x^2}},$$

and given that $S^{-1}(0) = 0$, conclude that

$$S^{-1}(x) = \int_0^x \frac{dt}{\sqrt{1 - t^2}}.$$


14.19 Define $\sinh x$ and $\cosh x$ as the solutions of $y'' = y$ satisfying the
boundary conditions $y(0) = 0$, $y'(0) = 1$ and $y(0) = 1$, $y'(0) = 0$,
respectively. Using Example 14.6.1 as a guide, show that

(a) $\cosh^2 x - \sinh^2 x = 1$,
(b) $\cosh(-x) = \cosh x$,
(c) $\sinh(-x) = -\sinh x$,
(d) $\sinh(a + x) = \sinh a \cosh x + \cosh a \sinh x$.


14.20 For Example 14.6.5, derive

(a) Equation (14.29), and

(b) Equation (14.30) by direct substitution.

(c) Let $\lambda = l(l + 1)$ and calculate the Legendre polynomials $P_l(x)$ for
$l = 0, 1, 2, 3$, subject to the condition $P_l(1) = 1$.


14.21 Use Eq. (14.32) of Example 14.6.6 to generate the first three Hermite
polynomials. Use the normalization

$$\int_{-\infty}^{\infty} [H_n(x)]^2 e^{-x^2}\,dx = \sqrt{\pi}\,2^n n!$$

to determine the arbitrary constant.


14.22 The function defined by

$$f(x) = \sum_{n=0}^{\infty} c_n x^n, \quad\text{where}\quad c_{n+2} = \frac{2n - \lambda}{(n+1)(n+2)}\,c_n,$$

can be written as $f(x) = c_0 g(x) + c_1 h(x)$, where $g$ is even and $h$ is odd
in $x$. Show that $f(x)$ goes to infinity at least as fast as $e^{x^2}$ does, i.e.,
$\lim_{x \to \infty} f(x)e^{-x^2} \ne 0$. Hint: Consider $g(x)$ and $h(x)$ separately and show
that

$$g(x) = \sum_{n=0}^{\infty} b_n x^{2n}, \quad\text{where}\quad b_{n+1} = \frac{4n - \lambda}{(2n+1)(2n+2)}\,b_n.$$

Then concentrate on the ratio $g(x)/e^{x^2}$, where $g$ and $e^{x^2}$ are approximated
by polynomials of very high degrees. Take the limit of this ratio as $x \to \infty$,
and use recursion relations for $g$ and $e^{x^2}$. The odd case follows similarly.


14.23 Refer to Sect. 14.6.2 for this problem.

(a) Derive the commutation relation $[a, a^\dagger] = 1$.

(b) Show that the Hamiltonian can be written as given in Eq. (14.33).

(c) Derive the commutation relation $[a, (a^\dagger)^n] = n(a^\dagger)^{n-1}$.

(d) Take the inner product of Eq. (14.35) with itself and use (c) to show
that $|c_n|^2 = n|c_{n-1}|^2$. From this, conclude that $|c_n|^2 = n!\,|c_0|^2$.

(e) For any function $f(y)$, show that

$$\left(y - \frac{d}{dy}\right)\left(e^{y^2/2} f\right) = -e^{y^2/2}\,\frac{df}{dy}.$$

Apply $(y - d/dy)$ repeatedly to both sides of the above equation to
obtain

$$\left(y - \frac{d}{dy}\right)^n \left(e^{y^2/2} f\right) = (-1)^n e^{y^2/2}\,\frac{d^n f}{dy^n}.$$

(f) Choose an appropriate $f(y)$ in part (e) and show that

$$e^{y^2/2}\left(y - \frac{d}{dy}\right)^n e^{-y^2/2} = (-1)^n e^{y^2}\,\frac{d^n}{dy^n}\,e^{-y^2}.$$

14.24 Solve Airy's DE, $y'' + xy = 0$, by the power-series method. Show
that the radius of convergence for both independent solutions is infinite.
Use the comparison theorem to show that for $x > 0$ these solutions have
infinitely many zeros, but for $x < 0$ they can have at most one zero.


14.25 Show that the functions $x^r e^{\lambda x}$, where $r = 0, 1, 2, \ldots, k$, are linearly
independent. Hint: Apply appropriate powers of $D - \lambda$ to a linear
combination of $x^r e^{\lambda x}$ for all possible $r$'s.


14.26 Find a basis of real solutions for each DE.

(a) $y'' + 5y' + 6y = 0$,  (b) $y''' + 6y'' + 12y' + 8y = 0$,

(c) $\dfrac{d^4 y}{dx^4} = y$,  (d) $\dfrac{d^4 y}{dx^4} = -y$.

14.27 Solve the following initial value problems. 
dty = ’ m My 
(a) 77>), yO)=yO=y (O)=0, y (0) =1, 


dx 


d*y d*y ” m ’ 

(b) ant + in 0, yO)=y 0)=y" (0) =0, y (0) =1, 
d*y ’ ” m 

(c) ae 0, y0)=yO)=y (0) =0, y (0) =2. 


14.28 Solve $y'' - 2y' + y = xe^x$ subject to the initial conditions $y(0) = 0$,
$y'(0) = 1$.


14.29 Find the general solution of each equation, 


(a) ye, (b) y" — 4y! + 4y = x?, 
(c) y”+y=sinx sin 2x, (d) y"-y=(1tey’, 
(ce) y”—y=e* sin2x, (f) yO —yM = x2, 


(g) y—4y'+4=e%+xe%,  (h) yy ty=e™. 

14.30 Consider the Euler equation,

$$x^n y^{(n)} + a_{n-1}x^{n-1}y^{(n-1)} + \cdots + a_1 x y' + a_0 y = r(x).$$

Substitute $x = e^t$ and show that such a substitution reduces this to a DE with
constant coefficients. In particular, solve $x^2 y'' - 4xy' + 6y = x$.


14.31 Show that
(a) the substitution (14.49) reduces the Schrödinger equation to (14.50),
and
(b) derive the continuity equation for probability from the second
equation of (14.50).


14.32 Show that the usual definition of probability current density,

$$\mathbf{J} = \operatorname{Re}\left[\psi^* \frac{\hbar}{im} \nabla\psi\right],$$

reduces to that in Eq. (14.51) if we use (14.49).

We have familiarized ourselves with some useful techniques for finding
solutions to differential equations. One powerful method that leads to formal
solutions is power series. We also stated Theorem 14.6.4 which guarantees 
the convergence of the solution of the power series within a circle whose 
size is at least as large as the smallest of the circles of convergence of the 
coefficient functions. Thus, the convergence of the solution is related to the 
convergence of the coefficient functions. What about the nature of the con- 
vergence, or the analyticity of the solution? Is it related to the analyticity of 
the coefficient functions? If so, how? Are the singular points of the coeffi- 
cients also singular points of the solution? Is the nature of the singularities 
the same? This chapter answers some of these questions. 

Analyticity is best handled in the complex plane. An important reason
for this is the property of analytic continuation discussed in Chap. 12. The
differential equation $du/dx = u^2$ has a solution $u = -1/x$ for all $x$ except
$x = 0$. Thus, we have to "puncture" the real line by removing $x = 0$ from it.
Then we have two solutions, because the domain of definition of $u = -1/x$
is not connected on the real line (technically, the definition of a function
includes its domain as well as the rule for going from the domain to the
range). In addition, if we confine ourselves to the real line, there is no way
that we can connect the $x > 0$ region to the $x < 0$ region. However, in the
complex plane the same equation, $dw/dz = w^2$, has the complex solution
$w = -1/z$, which is analytic everywhere except at $z = 0$. Puncturing the
complex plane does not destroy the connectivity of the region of definition
of $w$. Thus, the solution in the $x > 0$ region can be analytically continued to
the solution in the $x < 0$ region by going around the origin.

The aim of this chapter is to investigate the analytic properties of the 
solutions of some well known SOLDEs in mathematical physics. We begin 
with a result from differential equation theory (for a proof, see [Birk 78, 
p. 223]). 


Proposition 15.0.1 (Continuation principle) The function obtained by
analytic continuation of any solution of an analytic differential equation along
any path in the complex plane is a solution of the analytic continuation of
the differential equation along the same path.

An analytic differential equation is one with analytic coefficient
functions. This proposition makes it possible to find a solution in one region of
the complex plane and then continue it analytically. The following example
shows how the singularities of the coefficient functions affect the behavior
of the solution.


Example 15.0.2 Let us consider the FODE $w' - (\gamma/z)w = 0$ for $\gamma \in \mathbb{R}$.
The coefficient function $p(z) = -\gamma/z$ has a simple pole at $z = 0$. The
solution to the FODE is easily found to be $w = z^\gamma$. Thus, depending on whether
$\gamma$ is a nonnegative integer, a negative integer $-m$, or a noninteger, the
solution has a regular point, a pole of order $m$, or a branch point at $z = 0$,
respectively.


This example shows that the singularities of the solution depend on the 
parameters of the differential equation. 


15.1 Analytic Properties of Complex DEs 


To prepare for discussing the analytic properties of the solutions of SOL- 
DEs, let us consider some general properties of differential equations from 
a complex analytical point of view. 


15.1.1 Complex FOLDEs 


In the homogeneous FOLDE

$$\frac{dw}{dz} + p(z)w = 0, \tag{15.1}$$

$p(z)$ is assumed to have only isolated singular points. It follows that $p(z)$
can be expanded about a point $z_0$ (which may be a singularity of $p(z)$) as
a Laurent series in some annular region $r_1 < |z - z_0| < r_2$:

$$p(z) = \sum_{n=-\infty}^{\infty} a_n (z - z_0)^n \quad\text{where}\quad r_1 < |z - z_0| < r_2.$$


The solution to Eq. (15.1), as given in Theorem 14.2.1 with $q = 0$, is

$$\begin{aligned}
w(z) &= \exp\left[-\int p(z)\,dz\right] \\
&= C\exp\left[-a_{-1}\int \frac{dz}{z - z_0} - \sum_{n=0}^{\infty} a_n \int (z - z_0)^n\,dz - \sum_{n=2}^{\infty} a_{-n} \int (z - z_0)^{-n}\,dz\right] \\
&= C\exp\left[-a_{-1}\ln(z - z_0) - \sum_{n=0}^{\infty} \frac{a_n}{n+1}(z - z_0)^{n+1} + \sum_{n=1}^{\infty} \frac{a_{-n-1}}{n}(z - z_0)^{-n}\right].
\end{aligned}$$

We can write this solution as

$$w(z) = C(z - z_0)^{\alpha} g(z), \tag{15.2}$$

where $\alpha \equiv -a_{-1}$ and $g(z)$ is an analytic single-valued function in the
annular region $r_1 < |z - z_0| < r_2$ because $g(z)$ is the exponential of an analytic
function.

For the special case in which $p$ has a simple pole, i.e., when $a_{-n} = 0$
for all $n \ge 2$, the second sum in the exponent will be absent, and $g$ will be
analytic even at $z_0$. In fact, $g(z_0) = 1$, and choosing $C = 1$, we can write

$$w(z) = (z - z_0)^{\alpha}\left[1 + \sum_{k=1}^{\infty} b_k (z - z_0)^k\right]. \tag{15.3}$$


Depending on the nature of the singularity of $p(z)$ at $z_0$, the solutions
given by Eq. (15.2) have different classifications. For instance, if $p(z)$ has a
removable singularity (if $a_{-n} = 0\ \forall n \ge 1$), the solution is $Cg(z)$, which is
analytic. In this case, we say that the FOLDE [Eq. (15.1)] has a removable
singularity at $z_0$. If $p(z)$ has a simple pole at $z_0$ (if $a_{-1} \ne 0$ and
$a_{-n} = 0\ \forall n \ge 2$), then in general, the solution has a branch point at $z_0$.
In this case we say that the FOLDE has a regular singular point. Finally,
if $p(z)$ has a pole of order $m > 1$, then the solution will have an essential
singularity (see Problem 15.1). In this case the FOLDE is said to have an
irregular singular point.
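
As a concrete illustration of the last case, here is a short worked example with a coefficient chosen for simplicity (it is not taken from the text): $p(z) = 1/z^2$, a pole of order 2 at $z_0 = 0$, so that $a_{-2} = 1$ and all other $a_n$ vanish.

```latex
\[
w(z) = C\exp\!\left[-\int \frac{dz}{z^{2}}\right] = C\,e^{1/z}
     = C\sum_{k=0}^{\infty}\frac{1}{k!\,z^{k}} .
\]
```

The Laurent series contains infinitely many negative powers, so $z = 0$ is an essential singularity of the solution; in the notation of Eq. (15.2), $\alpha = -a_{-1} = 0$ and $g(z) = e^{1/z}$.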

To arrive at the solution given by Eq. (15.2), we had to solve the FOLDE. 
Since higher-order differential equations are not as easily solved, it is desir- 
able to obtain such a solution through other considerations. The following 
example sets the stage for this endeavor. 


Example 15.1.1 A FOLDE has a unique solution, to within a multiplicative
constant, given by Theorem 14.2.1. Thus, given a solution $w(z)$,
any other solution must be of the form $Cw(z)$. Let $z_0$ be a singularity
of $p(z)$, and let $z - z_0 = re^{i\theta}$. Start at a point $z$ and circle $z_0$ so that
$\theta \to \theta + 2\pi$. Even though $p(z)$ may have a simple pole at $z_0$, the solution
may have a branch point there. This is clear from the general solution,
where $\alpha$ may be a noninteger. Thus, $\tilde{w}(z) \equiv w(z_0 + re^{i(\theta + 2\pi)})$ may
be different from $w(z)$. To discover this branch point without solving
the DE, invoke Proposition 15.0.1 and conclude that $\tilde{w}(z)$ is also a
solution to the FOLDE. Thus, $\tilde{w}(z)$ can be different from $w(z)$ by at most
a multiplicative constant: $\tilde{w}(z) = Cw(z)$. Define the complex number $\alpha$
by $C = e^{2\pi i\alpha}$. Then the function $g(z) \equiv (z - z_0)^{-\alpha} w(z)$ is single-valued
around $z_0$. In fact,

$$\begin{aligned}
g(z_0 + re^{i(\theta + 2\pi)}) &= \left[re^{i(\theta + 2\pi)}\right]^{-\alpha} w(z_0 + re^{i(\theta + 2\pi)}) \\
&= (z - z_0)^{-\alpha} e^{-2\pi i\alpha} e^{2\pi i\alpha} w(z) = (z - z_0)^{-\alpha} w(z) = g(z).
\end{aligned}$$

This argument shows that a solution $w(z)$ of the FOLDE of Eq. (15.1)
can be written as $w(z) = (z - z_0)^{\alpha} g(z)$, where $g(z)$ is single-valued.
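
The idea of finding the multiplier $C$ by continuation, rather than from a closed-form solution, can be mimicked numerically. The sketch below is an illustration only: it assumes the FOLDE $w' = (\gamma/z)w$ of Example 15.0.2, follows the solution once around the unit circle with a classical fourth-order Runge-Kutta step, and uses helper names (`continue_around_origin`, `exponent_from_multiplier`) that are hypothetical.

```python
import cmath

def continue_around_origin(gamma, steps=2000):
    """Continue a solution of w' = (gamma/z) w once around |z| = 1.

    On the path z(theta) = exp(i*theta) the chain rule gives
    dw/dtheta = w'(z) * dz/dtheta = (gamma/z) * w * (i*z).
    Starting from w = 1, the returned value is the multiplier C.
    """
    def rhs(theta, w):
        z = cmath.exp(1j * theta)
        return (gamma / z) * w * (1j * z)

    h = 2 * cmath.pi / steps
    w, theta = 1.0 + 0.0j, 0.0
    for _ in range(steps):                 # classical RK4 in theta
        k1 = rhs(theta, w)
        k2 = rhs(theta + h / 2, w + h / 2 * k1)
        k3 = rhs(theta + h / 2, w + h / 2 * k2)
        k4 = rhs(theta + h, w + h * k3)
        w += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        theta += h
    return w

def exponent_from_multiplier(C):
    """Recover alpha (modulo an integer) from C = exp(2*pi*i*alpha)."""
    return cmath.log(C) / (2j * cmath.pi)

C = continue_around_origin(0.3)
alpha = exponent_from_multiplier(C)
print(C, alpha)
```

Since the multiplier only fixes $\alpha$ up to an integer, the numerical continuation detects whether a branch point is present but cannot by itself distinguish, say, $\gamma = 0.3$ from $\gamma = 1.3$.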


15.1.2 The Circuit Matrix 


The method used in Example 15.1.1 can be generalized to obtain a similar
result for the NOLDE

$$L[w] = \frac{d^n w}{dz^n} + p_{n-1}(z)\frac{d^{n-1} w}{dz^{n-1}} + \cdots + p_1(z)\frac{dw}{dz} + p_0(z)w = 0, \tag{15.4}$$

where all the $p_i(z)$ are analytic in $r_1 < |z - z_0| < r_2$.

Let $\{w_j(z)\}_{j=1}^n$ be a basis of solutions of Eq. (15.4), and let $z - z_0 = re^{i\theta}$.
Start at $z$ and analytically continue the functions $w_j(z)$ one complete turn
to $\theta + 2\pi$. Let $\tilde{w}_j(z) \equiv \tilde{w}_j(z_0 + re^{i\theta}) \equiv w_j(z_0 + re^{i(\theta + 2\pi)})$. Then, by a
generalization of Proposition 15.0.1, $\{\tilde{w}_j(z)\}_{j=1}^n$ are not only solutions, but
they are linearly independent (because they are $w_j$'s evaluated at a different
point). Therefore, they also form a basis of solutions. On the other hand,
$\tilde{w}_j(z)$ can be expressed as a linear combination of the $w_j(z)$. Thus,

$$\tilde{w}_j(z) = w_j(z_0 + re^{i(\theta + 2\pi)}) = \sum_{k=1}^{n} a_{jk} w_k(z).$$


The matrix $\mathsf{A} = (a_{jk})$, called the circuit matrix of the NOLDE, is invertible,
because it transforms one basis into another. Therefore, it has only nonzero
eigenvalues. We let $\lambda$ be one such eigenvalue, and choose the column vector
$C$, with entries $\{c_i\}_{i=1}^n$, to be the corresponding eigenvector of the transpose
of $\mathsf{A}$ (note that $\mathsf{A}$ and $\mathsf{A}^t$ have the same set of eigenvalues). At least one such
eigenvector always exists, because the characteristic polynomial of $\mathsf{A}^t$ has
at least one root. Now we let $w(z) = \sum_{j=1}^n c_j w_j(z)$. Clearly, this $w(z)$ is a
solution of (15.4), and

$$\begin{aligned}
\tilde{w}(z) \equiv w(z_0 + re^{i(\theta + 2\pi)}) &= \sum_{j=1}^{n} c_j w_j(z_0 + re^{i(\theta + 2\pi)}) \\
&= \sum_{j=1}^{n} c_j \sum_{k=1}^{n} a_{jk} w_k(z) = \sum_{j,k} (\mathsf{A}^t)_{kj} c_j w_k(z) = \sum_{k=1}^{n} \lambda c_k w_k(z) = \lambda w(z).
\end{aligned}$$

If we define $\alpha$ by $\lambda = e^{2\pi i\alpha}$, then $w(z_0 + re^{i(\theta + 2\pi)}) = e^{2\pi i\alpha} w(z)$. Now
we write $f(z) \equiv (z - z_0)^{-\alpha} w(z)$. Following the argument used in
Example 15.1.1, we get $f(z_0 + re^{i(\theta + 2\pi)}) = f(z)$; that is, $f(z)$ is single-valued
around $z_0$. We thus have the following theorem.


Theorem 15.1.2 Any homogeneous NOLDE with analytic coefficient functions
in $r_1 < |z - z_0| < r_2$ admits a solution of the form

$$w(z) = (z - z_0)^{\alpha} f(z),$$

where $f(z)$ is single-valued around $z_0$ in $r_1 < |z - z_0| < r_2$.


An isolated singular point $z_0$ near which an analytic function $w(z)$ can be
written as $w(z) = (z - z_0)^{\alpha} f(z)$, where $f(z)$ is single-valued and analytic
in the punctured neighborhood of $z_0$, is called a simple branch point of
$w(z)$. The arguments leading to Theorem 15.1.2 imply that a solution with
a simple branch point exists if and only if the vector $C$ whose components
appear in $w(z)$ is an eigenvector of $\mathsf{A}^t$, the transpose of the circuit matrix.
Thus, there are as many solutions with simple branch points as there are
linearly independent eigenvectors of $\mathsf{A}^t$.
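
The circuit matrix can also be estimated numerically by continuing a basis once around the singular point. The following sketch is an illustration under stated assumptions, not a method from the text: it treats a second-order equation $w'' + p(z)w' + q(z)w = 0$, picks the basis fixed by $(w, w') = (1, 0)$ and $(0, 1)$ at the starting point, integrates with a classical Runge-Kutta step, and tests the result on an Euler-type equation chosen here because its solutions $z^{\pm 1/3}$ are known. The names `circuit_matrix` and `eigenvalues_2x2` are hypothetical.

```python
import cmath

def circuit_matrix(p, q, z0=0.0, r=1.0, steps=4000):
    """Continue a basis of w'' + p(z) w' + q(z) w = 0 once around
    |z - z0| = r and return the 2x2 circuit matrix as a list of rows.

    Basis: (w, w') = (1, 0) and (0, 1) at the start point z0 + r, so the
    continued pair (w~_j, w~_j') read off there equals (a_j1, a_j2).
    """
    def rhs(theta, y):
        z = z0 + r * cmath.exp(1j * theta)
        dz = 1j * r * cmath.exp(1j * theta)          # dz/dtheta
        w, dw = y
        return (dw * dz, (-p(z) * dw - q(z) * w) * dz)

    def shifted(y, k, s):                            # y + s*k, componentwise
        return (y[0] + s * k[0], y[1] + s * k[1])

    h = 2 * cmath.pi / steps
    rows = []
    for y in ((1 + 0j, 0j), (0j, 1 + 0j)):
        theta = 0.0
        for _ in range(steps):                       # classical RK4
            k1 = rhs(theta, y)
            k2 = rhs(theta + h / 2, shifted(y, k1, h / 2))
            k3 = rhs(theta + h / 2, shifted(y, k2, h / 2))
            k4 = rhs(theta + h, shifted(y, k3, h))
            y = (y[0] + h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
                 y[1] + h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))
            theta += h
        rows.append(y)
    return rows

def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix from its trace and determinant."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

# Assumed test equation: w'' + w'/z - w/(9 z^2) = 0, solutions z^(1/3), z^(-1/3).
A = circuit_matrix(lambda z: 1 / z, lambda z: -1 / (9 * z * z))
lam1, lam2 = eigenvalues_2x2(A)
print(lam1, lam2)
```

For the test equation the eigenvalues should be close to $e^{\pm 2\pi i/3}$, which corresponds to $\alpha = \pm 1/3$ in $\lambda = e^{2\pi i\alpha}$. The matrix entries themselves depend on the chosen basis; only the eigenvalues are intrinsic.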


15.2 Complex SOLDEs 


Let us now consider the SOLDE $w'' + p(z)w' + q(z)w = 0$. Given two
linearly independent solutions $w_1(z)$ and $w_2(z)$, we form the $2 \times 2$ matrix
$\mathsf{A}$ and try to diagonalize it. There are three possible outcomes:

1. The matrix $\mathsf{A}$ is diagonalizable, and we can find two eigenvectors, $F(z)$
   and $G(z)$, corresponding, respectively, to two distinct eigenvalues, $\lambda_1$
   and $\lambda_2$. This means that

   $$F(z_0 + re^{i(\theta + 2\pi)}) = \lambda_1 F(z) \quad\text{and}\quad G(z_0 + re^{i(\theta + 2\pi)}) = \lambda_2 G(z).$$

   Defining $\lambda_1 = e^{2\pi i\alpha}$ and $\lambda_2 = e^{2\pi i\beta}$, we get

   $$F(z) = (z - z_0)^{\alpha} f(z) \quad\text{and}\quad G(z) = (z - z_0)^{\beta} g(z),$$

   as Theorem 15.1.2 suggests. The set $\{F(z), G(z)\}$ is called a canonical
   basis of the SOLDE.

2. The matrix $\mathsf{A}$ is diagonalizable, and the two eigenvalues are the same.
   In this case both $F(z)$ and $G(z)$ have the same constant $\alpha$:

   $$F(z) = (z - z_0)^{\alpha} f(z) \quad\text{and}\quad G(z) = (z - z_0)^{\alpha} g(z).$$


3. We cannot find two eigenvectors. This corresponds to the case where $\mathsf{A}$
   is not diagonalizable. However, we can always find one eigenvector, so
   $\mathsf{A}$ has only one eigenvalue, $\lambda$. We let $w_1(z)$ be the solution of the form
   $(z - z_0)^{\alpha} f(z)$, where $f(z)$ is single-valued and $\lambda = e^{2\pi i\alpha}$. The
   existence of such a solution is guaranteed by Theorem 15.1.2. Let $w_2(z)$ be
   any other linearly independent solution (Theorem 14.3.5 ensures the
   existence of such a second solution). Then

   $$w_2(z_0 + re^{i(\theta + 2\pi)}) = aw_1(z) + bw_2(z),$$

   and the circuit matrix will be $\mathsf{A} = \begin{pmatrix} \lambda & 0 \\ a & b \end{pmatrix}$, which has eigenvalues $\lambda$ and $b$.
   Since $\mathsf{A}$ is assumed to have only one eigenvalue (otherwise we would
   have the first outcome again), we must have $b = \lambda$. This reduces $\mathsf{A}$
   to $\mathsf{A} = \begin{pmatrix} \lambda & 0 \\ a & \lambda \end{pmatrix}$, where $a \ne 0$. The condition $a \ne 0$ is necessary to
   distinguish this case from the second outcome. Now we analytically continue
   $h(z) \equiv w_2(z)/w_1(z)$ one whole turn around $z_0$, obtaining

   $$h(z_0 + re^{i(\theta + 2\pi)}) = \frac{w_2(z_0 + re^{i(\theta + 2\pi)})}{w_1(z_0 + re^{i(\theta + 2\pi)})} = \frac{aw_1(z) + \lambda w_2(z)}{\lambda w_1(z)} = \frac{a}{\lambda} + \frac{w_2(z)}{w_1(z)} = \frac{a}{\lambda} + h(z).$$

   It then follows that the function¹

   $$g_1(z) \equiv h(z) - \frac{a}{2\pi i\lambda} \ln(z - z_0)$$

   is single-valued in $r_1 < |z - z_0| < r_2$. If we redefine $g_1(z)$ and $w_2(z)$ as
   $(2\pi i\lambda/a)g_1(z)$ and $(2\pi i\lambda/a)w_2(z)$, respectively, we have the following:


Theorem 15.2.1 If $p(z)$ and $q(z)$ are analytic for $r_1 < |z - z_0| < r_2$, then
the SOLDE $w'' + p(z)w' + q(z)w = 0$ admits a basis of solutions $\{w_1, w_2\}$
in the neighborhood of the singular point $z_0$, where either

$$w_1(z) = (z - z_0)^{\alpha} f(z), \qquad w_2(z) = (z - z_0)^{\beta} g(z)$$

or, in exceptional cases (when the circuit matrix is not diagonalizable),

$$w_1(z) = (z - z_0)^{\alpha} f(z), \qquad w_2(z) = w_1(z)\left[g_1(z) + \ln(z - z_0)\right].$$

The functions $f(z)$, $g(z)$, and $g_1(z)$ are analytic and single-valued in the
annular region.


This theorem allows us to factor out the branch point $z_0$ from the rest
of the solutions. However, even though $f(z)$, $g(z)$, and $g_1(z)$ are analytic
in the annular region $r_1 < |z - z_0| < r_2$, they may very well have poles
of arbitrary orders at $z_0$. Can we also factor out the poles? In general, we
cannot; however, under special circumstances, described in the following
definition, we can.


Definition 15.2.2 A SOLDE $w'' + p(z)w' + q(z)w = 0$ that is analytic in
$0 < |z - z_0| < r$ is said to have a regular singular point at $z_0$ if $p(z)$ has at
worst a simple pole and $q(z)$ has at worst a pole of order 2 there.


¹Recall that $\ln(z - z_0)$ increases by $2\pi i$ for each turn around $z_0$.

In a neighborhood of a regular singular point $z_0$, the coefficient functions
$p(z)$ and $q(z)$ have the power-series expansions

$$p(z) = \frac{a_{-1}}{z - z_0} + \sum_{k=0}^{\infty} a_k (z - z_0)^k, \qquad q(z) = \frac{b_{-2}}{(z - z_0)^2} + \frac{b_{-1}}{z - z_0} + \sum_{k=0}^{\infty} b_k (z - z_0)^k.$$

Multiplying both sides of the first equation by $z - z_0$ and the second by
$(z - z_0)^2$ and introducing $P(z) \equiv (z - z_0)p(z)$, $Q(z) \equiv (z - z_0)^2 q(z)$, we
obtain

$$P(z) = \sum_{k=0}^{\infty} a_{k-1}(z - z_0)^k, \qquad Q(z) = \sum_{k=0}^{\infty} b_{k-2}(z - z_0)^k.$$

It is also convenient to multiply the SOLDE by $(z - z_0)^2$ and write it as

$$(z - z_0)^2 w'' + (z - z_0)P(z)w' + Q(z)w = 0. \tag{15.5}$$

Inspired by the discussion leading to Theorem 15.2.1, we write

$$w(z) = (z - z_0)^{\nu} \sum_{k=0}^{\infty} C_k (z - z_0)^k, \qquad C_0 = 1, \tag{15.6}$$

where we have chosen the arbitrary multiplicative constant in such a way
that $C_0 = 1$. Substitute this in Eq. (15.5), and change the dummy variable
(so that all sums start at 0) to obtain

$$\sum_{n=0}^{\infty} \left\{ (n + \nu)(n + \nu - 1)C_n + \sum_{k=0}^{n} \left[(k + \nu)a_{n-k-1} + b_{n-k-2}\right]C_k \right\} (z - z_0)^{n+\nu} = 0,$$



which results in the recursion relation

$$(n + \nu)(n + \nu - 1)C_n = -\sum_{k=0}^{n} \left[(k + \nu)a_{n-k-1} + b_{n-k-2}\right]C_k. \tag{15.7}$$

For $n = 0$, this leads to what is known as the indicial equation for the
exponent $\nu$:

$$I(\nu) \equiv \nu(\nu - 1) + a_{-1}\nu + b_{-2} = 0. \tag{15.8}$$

The roots of this equation are called the characteristic exponents of $z_0$, and
$I(\nu)$ is called its indicial polynomial. In terms of this polynomial, (15.7)
can be expressed as

$$I(n + \nu)C_n = -\sum_{k=0}^{n-1} \left[(k + \nu)a_{n-k-1} + b_{n-k-2}\right]C_k \quad\text{for } n = 1, 2, \ldots. \tag{15.9}$$

Equation (15.8) determines what values of $\nu$ are possible, and Eq. (15.9)
gives $C_1, C_2, C_3, \ldots$, which in turn determine $w(z)$. Special care must be
taken if the indicial polynomial vanishes at $n + \nu$ for some positive integer
$n$, that is, if $n + \nu$, in addition to $\nu$, is a root of the indicial polynomial:
$I(n + \nu) = 0 = I(\nu)$.

If $\nu_1$ and $\nu_2$ are characteristic exponents of the indicial equation and
$\operatorname{Re}(\nu_1) > \operatorname{Re}(\nu_2)$, then a solution for $\nu_1$ always exists. A solution for $\nu_2$
also exists if $\nu_1 - \nu_2 \ne n$ for any (positive) integer $n$. In particular, if $z_0$ is
an ordinary point [a point at which both $p(z)$ and $q(z)$ are analytic], then
only one solution is determined by (15.9). (Why?) The foregoing discussion
is summarized in the following:


Theorem 15.2.3 If the differential equation $w'' + p(z)w' + q(z)w = 0$ has
a regular singular point at $z = z_0$, then at least one power series of the form
of (15.6) formally solves the equation. If $\nu_1$ and $\nu_2$ are the characteristic
exponents of $z_0$, then there are two linearly independent formal solutions
unless $\nu_1 - \nu_2$ is an integer.


Example 15.2.4 Let us consider some familiar differential equations.

(a) The Bessel equation is

$$w'' + \frac{1}{z}\,w' + \left(1 - \frac{\alpha^2}{z^2}\right)w = 0.$$

In this case, the origin is a regular singular point, $a_{-1} = 1$, and $b_{-2} =
-\alpha^2$. Thus, the indicial equation is $\nu(\nu - 1) + \nu - \alpha^2 = 0$, and its
solutions are $\nu_1 = \alpha$ and $\nu_2 = -\alpha$. Therefore, there are two linearly
independent solutions to the Bessel equation unless $\nu_1 - \nu_2 = 2\alpha$ is an
integer, i.e., unless $\alpha$ is either an integer or a half-integer.

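(b) For the Coulomb potential $f(r) = \beta/r$, the radial equation (13.14)
reduces to

$$w'' + \frac{2}{z}\,w' + \left(\frac{\beta}{z} - \frac{\alpha}{z^2}\right)w = 0.$$

The point $z = 0$ is a regular singular point at which $a_{-1} = 2$ and $b_{-2} =
-\alpha$. The indicial polynomial is $I(\nu) = \nu^2 + \nu - \alpha$ with characteristic
exponents

$$\nu_1 = -\frac{1}{2} + \frac{1}{2}\sqrt{1 + 4\alpha} \quad\text{and}\quad \nu_2 = -\frac{1}{2} - \frac{1}{2}\sqrt{1 + 4\alpha}.$$

There are two independent solutions unless $\nu_1 - \nu_2 = \sqrt{1 + 4\alpha}$ is an
integer. In practice, $\alpha = l(l + 1)$, where $l$ is some integer; so $\nu_1 - \nu_2 =
2l + 1$, and only one solution is obtained.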

(c) The hypergeometric differential equation is

$$w'' + \frac{\gamma - (\alpha + \beta + 1)z}{z(1 - z)}\,w' - \frac{\alpha\beta}{z(1 - z)}\,w = 0.$$

A substantial number of functions in mathematical physics are solutions
of this remarkable equation, with appropriate values for $\alpha$, $\beta$,
and $\gamma$. The regular singular points² are $z = 0$ and $z = 1$. At $z = 0$,
$a_{-1} = \gamma$ and $b_{-2} = 0$. The indicial polynomial is $I(\nu) = \nu(\nu + \gamma - 1)$,
whose roots are $\nu_1 = 0$ and $\nu_2 = 1 - \gamma$. Unless $\gamma$ is an integer, we have
two formal solutions.
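
The recursion relation (15.9) is mechanical enough to run by hand or by machine. The sketch below is only an illustration: it assumes the Laurent coefficients of $p$ and $q$ are supplied as dictionaries indexed by power (missing entries are zero), takes $z_0 = 0$, and uses exact rational arithmetic. The function name `frobenius_coefficients` and its calling convention are hypothetical. The Bessel equation of order zero serves as the test case, with $a_{-1} = 1$, $b_{-2} = 0$, $b_0 = 1$ and the double exponent $\nu = 0$.

```python
from fractions import Fraction

def frobenius_coefficients(a, b, nu, n_max):
    """Return [C_0, ..., C_n_max] from the recursion (15.9) with C_0 = 1.

    a[j] and b[j] are the coefficients of z**j in the expansions of p(z)
    and q(z) about z0 = 0 (a[-1], a[0], ...; b[-2], b[-1], ...).
    """
    def indicial(x):
        # I(x) = x(x - 1) + a_{-1} x + b_{-2}, Eq. (15.8)
        return x * (x - 1) + a.get(-1, 0) * x + b.get(-2, 0)

    C = [Fraction(1)]
    for n in range(1, n_max + 1):
        rhs = -sum(((k + nu) * a.get(n - k - 1, 0) + b.get(n - k - 2, 0)) * C[k]
                   for k in range(n))
        denom = indicial(n + nu)
        if denom == 0:
            # n + nu is also a root: the recursion cannot fix C_n
            raise ValueError("I(n + nu) vanishes at n = %d" % n)
        C.append(Fraction(rhs) / denom)
    return C

# Bessel equation of order zero: p = 1/z, q = 1, exponent nu = 0.
coeffs = frobenius_coefficients({-1: 1}, {0: 1}, Fraction(0), 6)
print(coeffs)
```

The coefficients produced for this test case are those of the familiar series $1 - z^2/4 + z^4/64 - \cdots$. The `ValueError` branch marks exactly the situation singled out in the text, where $n + \nu$ is a second root of the indicial polynomial.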


It is shown in differential equation theory [Birk 78, pp. 40-242] that as
long as $\nu_1 - \nu_2$ is not an integer, the series solution of Theorem 15.2.3 is
convergent for a neighborhood of $z_0$. What happens when $\nu_1 - \nu_2$ is an
integer? First, as a convenience, we translate the coordinate axes so that the
point $z_0$ coincides with the origin. This will save us some writing, because
instead of powers of $z - z_0$, we will have powers of $z$. Next we let $\nu_1 =
\nu_2 + n$ with $n$ a positive integer. Then, since it is impossible to encounter
any new zero of the indicial polynomial beyond $\nu_1$, the recursion relation,
Eq. (15.9), will be valid for all values of $n$, and we obtain a solution:

$$w_1(z) = z^{\nu_1} f(z) = z^{\nu_1}\left(1 + \sum_{k=1}^{\infty} C_k z^k\right),$$

which is convergent in the region $0 < |z| < r$ for some $r > 0$.
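To investigate the nature and the possibility of the second solution, write
the recursion relations of Eq. (15.9) for the smaller characteristic root $\nu_2$:

$$\begin{aligned}
I(\nu_2 + 1)C_1 &= \underbrace{-(\nu_2 a_0 + b_{-1})C_0}_{\equiv\,\rho_1 I(\nu_2 + 1)} &&\Rightarrow\quad C_1 = \rho_1, \\
I(\nu_2 + 2)C_2 &= -(\nu_2 a_1 + b_0)C_0 - \left[(\nu_2 + 1)a_0 + b_{-1}\right]C_1 &&\Rightarrow\quad C_2 \equiv \rho_2, \\
&\ \ \vdots \\
I(\nu_2 + n - 1)C_{n-1} &= \rho_{n-1} I(\nu_2 + n - 1)C_0 &&\Rightarrow\quad C_{n-1} = \rho_{n-1}, \\
I(\nu_2 + n)C_n &= I(\nu_1)C_n = \rho_n C_0 &&\Rightarrow\quad 0 = \rho_n,
\end{aligned} \tag{15.10}$$

where in each step, we have used the result of the previous step in which
$C_k$ is given as a multiple of $C_0 = 1$. Here, the $\rho$'s are constants depending
(possibly in a very complicated way) on the $a_k$'s and $b_k$'s.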

Theorem 15.2.3 guarantees two power series solutions only when $\nu_1 - \nu_2$
is not an integer. When $\nu_1 - \nu_2$ is an integer, Eq. (15.10) shows that a
necessary condition for a second power series solution to exist is that $\rho_n = 0$.
Therefore, when $\rho_n \ne 0$, we have to resort to other means of obtaining the
second solution.

Let us define the second solution as

$$w_2(z) \equiv w_1(z)h(z) = z^{\nu_1} f(z) h(z) \tag{15.11}$$

and substitute in the SOLDE to obtain a FOLDE in $h'$, namely,

$$h'' + (p + 2w_1'/w_1)h' = 0,$$

or, by substituting $w_1'/w_1 = \nu_1/z + f'/f$, the equivalent FOLDE

$$h'' + \left(\frac{2\nu_1}{z} + \frac{2f'}{f} + p\right)h' = 0. \tag{15.12}$$

²The coefficient of $w$ need not have a pole of order 2. Its pole can be of order one as well.

Lemma 15.2.5 The coefficient of $h'$ in Eq. (15.12) has a residue of $n + 1$.


Proof Recall that the residue of a function is the coefficient of $z^{-1}$ in the
Laurent expansion of the function (about $z = 0$). Let us denote this residue
for the coefficient of $h'$ by $A_{-1}$. Since $f(0) = 1$, the ratio $f'/f$ is analytic
at $z = 0$. Thus, the simple pole at $z = 0$ comes from the other two terms.
Substituting the Laurent expansion of $p(z)$ gives

$$\frac{2\nu_1}{z} + p = \frac{2\nu_1}{z} + \frac{a_{-1}}{z} + a_0 + a_1 z + \cdots.$$

This shows that $A_{-1} = 2\nu_1 + a_{-1}$. On the other hand, comparing the two
versions of the indicial polynomial

$$\nu^2 + (a_{-1} - 1)\nu + b_{-2} \quad\text{and}\quad (\nu - \nu_1)(\nu - \nu_2) = \nu^2 - (\nu_1 + \nu_2)\nu + \nu_1\nu_2$$

gives

$$\nu_1 + \nu_2 = -(a_{-1} - 1), \quad\text{or}\quad 2\nu_1 - n = -(a_{-1} - 1).$$

Therefore, $A_{-1} = 2\nu_1 + a_{-1} = n + 1$.


Theorem 15.2.6 Suppose that the characteristic exponents of a SOLDE
with a regular singular point at $z = 0$ are $\nu_1$ and $\nu_2$. Consider three cases:

1. $\nu_1 - \nu_2$ is not an integer.

2. $\nu_2 = \nu_1 - n$ where $n$ is a nonnegative integer, and $\rho_n$, as defined in
   Eq. (15.10), vanishes.

3. $\nu_2 = \nu_1 - n$ where $n$ is a nonnegative integer, and $\rho_n$, as defined in
   Eq. (15.10), does not vanish.

Then, in the first two cases, there exists a basis of solutions $\{w_1, w_2\}$ of the
form

$$w_i(z) = z^{\nu_i}\left(1 + \sum_{k=1}^{\infty} C_k^{(i)} z^k\right), \qquad i = 1, 2,$$

and in the third case, the basis of solutions takes the form

$$w_1(z) = z^{\nu_1}\left(1 + \sum_{k=1}^{\infty} a_k z^k\right), \qquad w_2(z) = z^{\nu_2}\left(1 + \sum_{k=1}^{\infty} b_k z^k\right) + Cw_1(z)\ln z,$$

where the power series are convergent in a neighborhood of $z = 0$.


Proof The first two cases have been shown before. For the third case, we
use Lemma 15.2.5 and write

$$\frac{2\nu_1}{z} + \frac{2f'}{f} + p = \frac{n + 1}{z} + \sum_{k=0}^{\infty} c_k z^k,$$

and the solution for the FOLDE in $h'$ will be [see Eq. (15.3) and the
discussion preceding it]

$$h'(z) = z^{-n-1}\left(1 + \sum_{k=1}^{\infty} b_k z^k\right).$$

For $n = 0$, i.e., when the indicial polynomial has a double root, this yields

$$h'(z) = \frac{1}{z} + \sum_{k=1}^{\infty} b_k z^{k-1} \quad\Rightarrow\quad h(z) = \ln z + g_1(z),$$

where $g_1$ is analytic in a neighborhood of $z = 0$. For $n \ne 0$, we have (with $b_0 \equiv 1$)
$h'(z) = b_n/z + \sum_{k \ne n} b_k z^{k-n-1}$ and, by integration,

$$h(z) = b_n \ln z + \sum_{k \ne n} \frac{b_k}{k - n}\,z^{k-n} = b_n \ln z + z^{-n} \sum_{k \ne n} \frac{b_k}{k - n}\,z^k = b_n \ln z + z^{-n} g_2(z),$$

where $g_2$ is analytic in a neighborhood of $z = 0$. Substituting $h$ in
Eq. (15.11) and recalling that $\nu_2 = \nu_1 - n$, we obtain the desired results
of the theorem.
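
A short worked example may help to see the logarithm appear. The equation below is chosen here for illustration and is not from the text: the Euler-type SOLDE $z^2 w'' - z w' + w = 0$, for which $a_{-1} = -1$, $b_{-2} = 1$, and the indicial polynomial has a double root (the case $n = 0$ of the theorem).

```latex
\begin{align*}
I(\nu) &= \nu(\nu - 1) - \nu + 1 = (\nu - 1)^2
   \quad\Rightarrow\quad \nu_1 = \nu_2 = 1, \qquad w_1(z) = z,\\
h'' + \left(p + \frac{2 w_1'}{w_1}\right) h'
   &= h'' + \left(-\frac{1}{z} + \frac{2}{z}\right) h' = h'' + \frac{1}{z}\,h' = 0
   \quad\Rightarrow\quad h'(z) = \frac{1}{z}, \quad h(z) = \ln z,\\
w_2(z) &= w_1(z)\,h(z) = z \ln z .
\end{align*}
```

The coefficient of $h'$ has residue $1 = n + 1$, in agreement with Lemma 15.2.5, and direct substitution confirms that $z \ln z$ solves the equation: $z^2 \cdot \tfrac{1}{z} - z(\ln z + 1) + z\ln z = 0$.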


15.3 Fuchsian Differential Equations 


In many cases of physical interest, the behavior of the solution of a
SOLDE at infinity is important. For instance, bound state solutions of the
Schrödinger equation describing the probability amplitudes of particles in
quantum mechanics must tend to zero as the distance from the center of the
binding force increases.

We have seen that the behavior of a solution is determined by the behavior
of the coefficient functions. To determine the behavior at infinity, we
substitute $z = 1/t$ in the SOLDE

$$\frac{d^2 w}{dz^2} + p(z)\frac{dw}{dz} + q(z)w = 0 \tag{15.13}$$

and obtain

$$\frac{d^2 v}{dt^2} + \left[\frac{2}{t} - \frac{1}{t^2}\,r(t)\right]\frac{dv}{dt} + \frac{1}{t^4}\,s(t)v = 0, \tag{15.14}$$

where $v(t) = w(1/t)$, $r(t) = p(1/t)$, and $s(t) = q(1/t)$.

Clearly, as $z \to \infty$, $t \to 0$. Thus, we are interested in the behavior of
(15.14) at $t = 0$. We assume that both $r(t)$ and $s(t)$ are analytic at $t = 0$.
Equation (15.14) shows, however, that the solution $v(t)$ may still have
singularities at $t = 0$ because of the extra terms appearing in the coefficient
functions.
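
The extra terms in (15.14) come directly from the chain rule. The following sketch spells out the step, using only $z = 1/t$ and $v(t) = w(1/t)$:

```latex
\begin{align*}
\frac{dw}{dz} &= \frac{dt}{dz}\,\frac{dv}{dt} = -t^{2}\,\frac{dv}{dt},\\
\frac{d^{2}w}{dz^{2}} &= -t^{2}\,\frac{d}{dt}\!\left(-t^{2}\,\frac{dv}{dt}\right)
   = t^{4}\,\frac{d^{2}v}{dt^{2}} + 2t^{3}\,\frac{dv}{dt},\\
0 &= t^{4}\,\frac{d^{2}v}{dt^{2}} + \left[2t^{3} - t^{2}\,r(t)\right]\frac{dv}{dt} + s(t)\,v .
\end{align*}
```

Dividing the last line by $t^4$ gives Eq. (15.14).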

We assume that infinity is a regular singular point of (15.13), by which 
we mean that ¢t = 0 is a regular singular point of (15.14). Therefore, in the 
Taylor expansions of r(t) and s(t), the first (constant) term of r(t) and the 
first two terms of s(t) must be zero. Thus, we write 


fo) 
r(t)=ayt api? tees So axt*, 
k=1 
CO 
s(t) = byt? + bt? +--+ = YO byt. 
k=2 


By their definitions, these two equations imply that for p(z) and q(z), and 
for large values of |z|, we must have expressions of the form 


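\[ p(z) = \frac{a_1}{z} + \frac{a_2}{z^2} + \cdots = \sum_{k=1}^{\infty} \frac{a_k}{z^k}, \qquad
   q(z) = \frac{b_2}{z^2} + \frac{b_3}{z^3} + \cdots = \sum_{k=2}^{\infty} \frac{b_k}{z^k}. \tag{15.15} \]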


When infinity is a regular singular point of Eq. (15.13), or, equiva- 
lently, when the origin is a regular singular point of (15.14), it follows 
from Theorem 15.2.6 that there exists at least one solution of the form 
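$v(t) = t^{\alpha} \left( 1 + \sum_{k=1}^{\infty} C_k t^k \right)$ or, in terms of z,

\[ w(z) = z^{-\alpha} \left( 1 + \sum_{k=1}^{\infty} \frac{C_k}{z^k} \right). \tag{15.16} \]

Here $\alpha$ is a characteristic exponent at t = 0 of (15.14), whose indicial
polynomial is easily found to be $\alpha(\alpha - 1) + (2 - a_1)\alpha + b_2 = 0$.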


Definition 15.3.1 A homogeneous differential equation with single-valued 
analytic coefficient functions is called a Fuchsian differential equation 
(FDE) if it has only regular singular points in the extended complex plane, 
i.e., the complex plane including the point at infinity. 


It turns out that a particular kind of FDE describes a large class of nonele- 
mentary functions encountered in mathematical physics. Therefore, it is in- 
structive to classify various kinds of FDEs. A fact that is used in such a 
classification is that complex functions whose only singularities in the ex- 
tended complex plane are poles are rational functions, i.e., ratios of polyno- 
mials (see Proposition 11.2.2). We thus expect FDEs to have only rational 
functions as coefficients. 




Consider the case where the equation has at most two regular singular
points, at $z_1$ and $z_2$. We introduce a new variable $\xi(z) = (z - z_1)/(z - z_2)$.
The regular singular points at $z_1$ and $z_2$ are mapped onto the points
$\xi_1 = \xi(z_1) = 0$ and $\xi_2 = \xi(z_2) = \infty$, respectively, in the extended
$\xi$-plane. Equation (15.13) becomes

\[ \frac{d^2 u}{d\xi^2} + \Phi(\xi) \frac{du}{d\xi} + \Theta(\xi) u = 0, \tag{15.17} \]

where u, $\Phi$, and $\Theta$ are functions of $\xi$ obtained when z is expressed in terms
of $\xi$ in w(z), p(z), and q(z), respectively. From Eq. (15.15) and the fact that
$\xi = 0$ is at most a simple pole of $\Phi(\xi)$, we obtain $\Phi(\xi) = a_1/\xi$. Similarly,
$\Theta(\xi) = b_2/\xi^2$. Thus, a second order Fuchsian DE (SOFDE) with two regular
singular points is equivalent to the DE $w'' + (a_1/z) w' + (b_2/z^2) w = 0$.
Multiplying both sides by $z^2$, we obtain $z^2 w'' + a_1 z w' + b_2 w = 0$, which is
the second-order Euler differential equation. A general nth-order Euler differential
equation is equivalent to a NOLDE with constant coefficients (see Problem 14.30).
Thus, a SOFDE with two regular singular points is equivalent
to a SOLDE with constant coefficients and produces nothing new.
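
The equivalence with a constant-coefficient equation can be made explicit in the second-order case. The following short derivation is a sketch, assuming the standard change of variable $z = e^t$ (the symbols t and s are introduced here only for illustration):

```latex
% With z = e^t, the chain rule gives
%   z \frac{dw}{dz} = \frac{dw}{dt}, \qquad
%   z^2 \frac{d^2 w}{dz^2} = \frac{d^2 w}{dt^2} - \frac{dw}{dt},
% so the Euler equation z^2 w'' + a_1 z w' + b_2 w = 0 becomes
\[ \frac{d^2 w}{dt^2} + (a_1 - 1) \frac{dw}{dt} + b_2 w = 0 . \]
% Trying w = e^{st} = z^s gives the characteristic equation
\[ s^2 + (a_1 - 1) s + b_2 = 0 \quad\Longleftrightarrow\quad s(s-1) + a_1 s + b_2 = 0 , \]
% which coincides with the indicial equation of the Euler DE at z = 0.
```

For distinct roots $s_1$ and $s_2$ the solutions are therefore the elementary functions $z^{s_1}$ and $z^{s_2}$; a double root s gives $z^s$ and $z^s \ln z$.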

The simplest SOFDE whose solutions may include nonelementary functions
is therefore one having three regular singular points, at say $z_1$, $z_2$, and
$z_3$. By the transformation

\[ \xi(z) = \frac{(z - z_1)(z_3 - z_2)}{(z - z_2)(z_3 - z_1)} \]

we can map $z_1$, $z_2$, and $z_3$ onto $\xi_1 = 0$, $\xi_2 = \infty$, and $\xi_3 = 1$. Thus, we
assume that the three regular singular points are at z = 0, z = 1, and $z = \infty$.
It can be shown [see Problem (15.8)] that the most general p(z) and q(z) are

\[ p(z) = \frac{A_1}{z} + \frac{B_1}{z-1}, \qquad
   q(z) = \frac{A_2}{z^2} + \frac{B_2}{(z-1)^2} - \frac{A_3}{z(z-1)}. \]


We thus have the following: 


Theorem 15.3.2 The most general second order Fuchsian DE with three 
regular singular points can be transformed into the form 


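\[ w'' + \left( \frac{A_1}{z} + \frac{B_1}{z-1} \right) w'
   + \left[ \frac{A_2}{z^2} + \frac{B_2}{(z-1)^2} - \frac{A_3}{z(z-1)} \right] w = 0, \tag{15.18} \]


where $A_1$, $A_2$, $A_3$, $B_1$, and $B_2$ are constants. This equation is called the 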
Riemann differential equation. 


We can write the Riemann DE in terms of pairs of characteristic exponents,
$(\lambda_1, \lambda_2)$, $(\mu_1, \mu_2)$, and $(\nu_1, \nu_2)$, belonging to the singular points 0, 1,
and $\infty$, respectively. The indicial equations are easily found to be

\[ \lambda^2 + (A_1 - 1)\lambda + A_2 = 0, \]
\[ \mu^2 + (B_1 - 1)\mu + B_2 = 0, \]
\[ \nu^2 + (1 - A_1 - B_1)\nu + A_2 + B_2 - A_3 = 0. \]

By writing the indicial equations as $(\lambda - \lambda_1)(\lambda - \lambda_2) = 0$, and so forth, and
comparing coefficients, we can find the following relations:

\[ A_1 = 1 - \lambda_1 - \lambda_2, \qquad A_2 = \lambda_1 \lambda_2, \]
\[ B_1 = 1 - \mu_1 - \mu_2, \qquad B_2 = \mu_1 \mu_2, \]
\[ A_1 + B_1 = \nu_1 + \nu_2 + 1, \qquad A_2 + B_2 - A_3 = \nu_1 \nu_2. \]

These equations lead easily to the Riemann identity

\[ \lambda_1 + \lambda_2 + \mu_1 + \mu_2 + \nu_1 + \nu_2 = 1. \tag{15.19} \]
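
The relations between the constants of Eq. (15.18) and the characteristic exponents can be checked numerically. The sketch below is illustrative only: the function names and the sample exponents are assumptions, not part of the text. It builds the five constants from three exponent pairs and confirms that both exponents at infinity are roots of the third indicial equation.

```python
def riemann_coefficients(lam, mu, nu, tol=1e-12):
    """Return (A1, A2, A3, B1, B2) of Eq. (15.18) from the three exponent pairs."""
    l1, l2 = lam
    m1, m2 = mu
    n1, n2 = nu
    # The six exponents are not independent: they must satisfy the Riemann identity.
    if abs(l1 + l2 + m1 + m2 + n1 + n2 - 1) > tol:
        raise ValueError("exponents violate the Riemann identity (15.19)")
    A1 = 1 - l1 - l2
    A2 = l1 * l2
    B1 = 1 - m1 - m2
    B2 = m1 * m2
    A3 = A2 + B2 - n1 * n2
    return A1, A2, A3, B1, B2


def indicial_at_infinity(nu, A1, A2, A3, B1, B2):
    """Left-hand side of the indicial equation at z = infinity."""
    return nu**2 + (1 - A1 - B1) * nu + A2 + B2 - A3


# Hypothetical sample values, chosen so that the exponent sum equals 1.
alpha, beta, gamma = 0.5, 1.5, 0.75
coeffs = riemann_coefficients((0, 1 - gamma), (0, gamma - alpha - beta), (alpha, beta))
residuals = [indicial_at_infinity(nu, *coeffs) for nu in (alpha, beta)]
print(coeffs, residuals)
```

Because only five of the six exponents can be chosen freely, the helper rejects inputs that violate (15.19) instead of silently producing inconsistent constants.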


Substituting these results in (15.18) gives the following result. 


Theorem 15.3.3 A second order Fuchsian DE with three regular singular 
points in the extended complex plane is equivalent to the Riemann DE, 


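\[ w'' + \left( \frac{1 - \lambda_1 - \lambda_2}{z} + \frac{1 - \mu_1 - \mu_2}{z-1} \right) w'
   + \left[ \frac{\lambda_1 \lambda_2}{z^2} + \frac{\mu_1 \mu_2}{(z-1)^2}
   + \frac{\nu_1 \nu_2 - \lambda_1 \lambda_2 - \mu_1 \mu_2}{z(z-1)} \right] w = 0, \tag{15.20} \]


which is uniquely determined by the pairs of characteristic exponents at
each singular point. The characteristic exponents satisfy the Riemann
identity, Eq. (15.19).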


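The uniqueness of the Riemann DE allows us to derive identities for
solutions and reduce the independent parameters of Eq. (15.20) from five to
three. We first note that if w(z) is a solution of the Riemann DE corresponding
to $(\lambda_1, \lambda_2)$, $(\mu_1, \mu_2)$, and $(\nu_1, \nu_2)$, then the function

\[ v(z) = z^{\lambda} (z-1)^{\mu} w(z) \]

has branch points at $z = 0, 1, \infty$ [because w(z) does]; therefore, it is a
solution of the Riemann DE. Its pairs of characteristic exponents are (see
Problem 15.10)

\[ (\lambda_1 + \lambda, \lambda_2 + \lambda), \quad (\mu_1 + \mu, \mu_2 + \mu), \quad (\nu_1 - \lambda - \mu, \nu_2 - \lambda - \mu). \]

In particular, if we let $\lambda = -\lambda_1$ and $\mu = -\mu_1$, then the pairs reduce to

\[ (0, \lambda_2 - \lambda_1), \quad (0, \mu_2 - \mu_1), \quad (\nu_1 + \lambda_1 + \mu_1, \nu_2 + \lambda_1 + \mu_1). \]

Defining $\alpha \equiv \nu_1 + \lambda_1 + \mu_1$, $\beta \equiv \nu_2 + \lambda_1 + \mu_1$, and $\gamma \equiv 1 - \lambda_2 + \lambda_1$, and
using (15.19), we can write the pairs as

\[ (0, 1 - \gamma), \quad (0, \gamma - \alpha - \beta), \quad (\alpha, \beta), \]

which yield the third version of the Riemann DE

\[ w'' + \left( \frac{\gamma}{z} + \frac{1 - \gamma + \alpha + \beta}{z-1} \right) w' + \frac{\alpha\beta}{z(z-1)} w = 0. \]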

This important equation is commonly written in the equivalent form 
\[ z(1-z) w'' + [\gamma - (1 + \alpha + \beta) z] w' - \alpha\beta w = 0 \tag{15.21} \]


and is called the hypergeometric differential equation (HGDE). We will 
study this equation next. 


15.4 The Hypergeometric Function 


The two characteristic exponents of Eq. (15.21) at z = 0 are 0 and $1 - \gamma$.
It follows from Theorem 15.2.6 that there exists an analytic solution
(corresponding to the characteristic exponent 0) at z = 0. Let us denote this
solution, the hypergeometric function, by $F(\alpha, \beta; \gamma; z)$ and write

\[ F(\alpha, \beta; \gamma; z) = \sum_{k=0}^{\infty} a_k z^k \quad \text{where } a_0 = 1. \]

Substituting in the DE, we obtain the recurrence relation

\[ a_{k+1} = \frac{(\alpha + k)(\beta + k)}{(k+1)(\gamma + k)} a_k \quad \text{for } k \ge 0. \]

These coefficients can be determined successively if $\gamma$ is neither zero nor a
negative integer:

\[ F(\alpha, \beta; \gamma; z) = \frac{\Gamma(\gamma)}{\Gamma(\alpha)\Gamma(\beta)}
   \sum_{k=0}^{\infty} \frac{\Gamma(\alpha + k)\Gamma(\beta + k)}{\Gamma(k+1)\Gamma(\gamma + k)} z^k. \tag{15.22} \]

The series in (15.22) is called the hypergeometric series, because it is the
generalization of $F(1, \beta; \beta; z)$, which is simply the geometric series.
We note immediately from (15.22) that 


Box 15.4.1 The hypergeometric series becomes a polynomial if either
$\alpha$ or $\beta$ is a negative integer.


This is because for $k \le |\alpha|$ (or $k \le |\beta|$) both $\Gamma(\alpha + k)$ [or $\Gamma(\beta + k)$]
and $\Gamma(\alpha)$ [or $\Gamma(\beta)$] have poles that cancel each other. However, $\Gamma(\alpha + k)$
[or $\Gamma(\beta + k)$] becomes finite for $k > |\alpha|$ (or $k > |\beta|$), and the pole in $\Gamma(\alpha)$
[or $\Gamma(\beta)$] makes the denominator infinite. Therefore, all terms of the series
(15.22) beyond $k = |\alpha|$ (or $k = |\beta|$) will be zero.
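
The recurrence relation lends itself to a direct numerical summation. The following is a minimal sketch, not a production routine: the function name and the sample arguments are assumptions, and the plain partial sum is only reliable well inside |z| < 1 or when the series terminates. It illustrates both the geometric-series special case and the termination described in Box 15.4.1.

```python
def hypergeometric_series(alpha, beta, gamma, z, max_terms=500):
    """Partial sum of sum_k a_k z^k with a_0 = 1 and a_{k+1}/a_k from the recurrence."""
    total = 0.0
    term = 1.0  # a_0 * z**0
    for k in range(max_terms):
        total += term
        term *= (alpha + k) * (beta + k) / ((k + 1) * (gamma + k)) * z
        if term == 0.0:
            # Exact termination: alpha + k or beta + k hit zero, so the series is a polynomial.
            break
    return total


# F(1, beta; beta; z) should reproduce the geometric series 1 / (1 - z).
geometric = hypergeometric_series(1.0, 2.5, 2.5, 0.3)

# alpha = -2 terminates after three terms: F(-2, 1; 1; z) = 1 - 2z + z^2 = (1 - z)^2,
# which is meaningful even for |z| > 1.
polynomial = hypergeometric_series(-2, 1.0, 1.0, 3.0)
print(geometric, polynomial)
```

Carrying the running term instead of the bare coefficient avoids forming large Gamma-function ratios explicitly.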

Many of the properties of the hypergeometric function can be obtained 
directly from the HGDE (15.21). For instance, differentiating the HGDE 
and letting v = w’, we obtain 


\[ z(1-z) v'' + [\gamma + 1 - (\alpha + \beta + 3) z] v' - (\alpha + 1)(\beta + 1) v = 0, \]

which shows that $F'(\alpha, \beta; \gamma; z) = C F(\alpha + 1, \beta + 1; \gamma + 1; z)$. The constant
C can be determined by differentiating Eq. (15.22), setting z = 0 in the
result, and noting that $F(\alpha + 1, \beta + 1; \gamma + 1; 0) = 1$. Then we obtain

\[ F'(\alpha, \beta; \gamma; z) = \frac{\alpha\beta}{\gamma} F(\alpha + 1, \beta + 1; \gamma + 1; z). \tag{15.23} \]


Now assume that $\gamma \neq 1$, and make the substitution $w = z^{1-\gamma} u$ in the
HGDE to obtain³

\[ z(1-z) u'' + [\gamma_1 - (\alpha_1 + \beta_1 + 1) z] u' - \alpha_1 \beta_1 u = 0, \]

where $\alpha_1 = \alpha - \gamma + 1$, $\beta_1 = \beta - \gamma + 1$, and $\gamma_1 = 2 - \gamma$. Thus,

\[ u = F(\alpha - \gamma + 1, \beta - \gamma + 1; 2 - \gamma; z), \]

and u is therefore analytic at z = 0. This leads to an interesting result.
Provided that $\gamma$ is not an integer, the two functions

\[ w_1(z) = F(\alpha, \beta; \gamma; z), \qquad
   w_2(z) = z^{1-\gamma} F(\alpha - \gamma + 1, \beta - \gamma + 1; 2 - \gamma; z) \tag{15.24} \]

form a canonical basis of solutions to the HGDE at z = 0. This follows
from Theorem 15.2.6 and the fact that $(0, 1 - \gamma)$ are a pair of (different)
characteristic exponents at z = 0.


Historical Notes 

Johann Carl Friedrich Gauss (1777-1855) was the greatest of all mathematicians and 
perhaps the most richly gifted genius of whom there is any record. He was born in the 
city of Brunswick in northern Germany. His exceptional skill with numbers was clear at a 
very early age, and in later life he joked that he knew how to count before he could talk. It 
is said that Goethe wrote and directed little plays for a puppet theater when he was 6 and 
that Mozart composed his first childish minuets when he was 5, but Gauss corrected an 
error in his father’s payroll accounts at the age of 3. At the age of seven, when he started 
elementary school, his teacher was amazed when Gauss summed the integers from 1 to 
100 instantly by spotting that the sum was 50 pairs of numbers each pair summing to 101. 
His long professional life is so filled with accomplishments that it is impossible to give 
a full account of them in the short space available here. All we can do is simply give a 
chronology of his almost uncountable discoveries. 


1792-1794: Gauss reads the works of Newton, Euler, and Lagrange; discovers the 
prime number theorem (at the age of 14 or 15); invents the method of 
least squares; conceives the Gaussian law of distribution in the theory of 
probability. 

1795: (only 18 years old!) Proves that a regular polygon with n sides is con- 
structible (by ruler and compass) if and only if n is the product of a power 
of 2 and distinct prime numbers of the form $p_k = 2^{2^k} + 1$, and completely 
solves the 2000-year old problem of ruler-and-compass construction of 
regular polygons. He also discovers the law of quadratic reciprocity. 


1799: Proves the fundamental theorem of algebra in his doctoral dissertation
using the then-mysterious complex numbers with complete confidence.

1801: Gauss publishes his Disquisitiones Arithmeticae in which he creates the
modern rigorous approach to mathematics; predicts the exact location of
the asteroid Ceres.


³In the following discussion, $\alpha_1$, $\beta_1$, and $\gamma_1$ will represent the parameters of the new DE
satisfied by the new function defined in terms of the old.




1807: Becomes professor of astronomy and the director of the new observatory
at Göttingen.

1809: Publishes his second book, Theoria motus corporum coelestium, a major
two-volume treatise on the motion of celestial bodies and the bible of
planetary astronomers for the next 100 years.

1812: Publishes Disquisitiones generales circa seriem infinitam, a rigorous 
treatment of infinite series, and introduces the hypergeometric function 
for the first time, for which he uses the notation $F(\alpha, \beta; \gamma; z)$; an essay 
on approximate integration. 

1820-1830: Publishes over 70 papers, including Disquisitiones generales circa super- 
ficies curvas, in which he creates the intrinsic differential geometry of 
general curved surfaces, the forerunner of Riemannian geometry and the 
general theory of relativity. From the 1830s on, Gauss was increasingly 
occupied with physics, and he enriched every branch of the subject he 
touched. In the theory of surface tension, he developed the fundamen- 
tal idea of conservation of energy and solved the earliest problem in the 
calculus of variations. In optics, he introduced the concept of the focal 
length of a system of lenses. He virtually created the science of geomag- 
netism, and in collaboration with his friend and colleague Wilhelm Weber 
he invented the electromagnetic telegraph. In 1839 Gauss published his 
fundamental paper on the general theory of inverse square forces, which 
established potential theory as a coherent branch of mathematics and in 
which he established the divergence theorem. 


Gauss had many opportunities to leave Göttingen, but he refused all offers and remained 
there for the rest of his life, living quietly and simply, traveling rarely, and working with 
immense energy on a wide variety of problems in mathematics and its applications. Apart 
from science and his family—he married twice and had six children, two of whom em- 
igrated to America—his main interests were history and world literature, international 
politics, and public finance. He owned a large library of about 6000 volumes in many lan- 
guages, including Greek, Latin, English, French, Russian, Danish, and of course German. 
His acuteness in handling his own financial affairs is shown by the fact that although he 
started with virtually nothing, he left an estate over a hundred times as great as his average 
annual income during the last half of his life. 

The foregoing list is the published portion of Gauss’s total achievement; the unpublished 
and private part is almost equally impressive. His scientific diary, a little booklet of 19 
pages, discovered in 1898, extends from 1796 to 1814 and consists of 146 very concise 
statements of the results of his investigations, which often occupied him for weeks or 
months. These ideas were so abundant and so frequent that he physically did not have 
time to publish them. Some of the ideas recorded in this diary: 


1. Cauchy Integral Formula: Gauss discovers it in 1811, 16 years before Cauchy.

2. Non-Euclidean Geometry: After failing to prove Euclid’s fifth postulate at the age
of 15, Gauss came to the conclusion that the Euclidean form of geometry cannot be
the only one possible.

3. Elliptic Functions: Gauss had found many of the results of Abel and Jacobi (the 
two main contributors to the subject) before these men were born. The facts became 
known partly through Jacobi himself. His attention was caught by a cryptic passage 
in the Disquisitiones, whose meaning can only be understood if one knows some- 
thing about elliptic functions. He visited Gauss on several occasions to verify his 
suspicions and tell him about his own most recent discoveries, and each time Gauss 
pulled 30-year-old manuscripts out of his desk and showed Jacobi what Jacobi had 
just shown him. After a week’s visit with Gauss in 1840, Jacobi wrote to his brother, 
“Mathematics would be in a very different position if practical astronomy had not 
diverted this colossal genius from his glorious career.” 


A possible explanation for not publishing such important ideas is suggested by his com- 
ments in a letter to Bolyai: “It is not knowledge but the act of learning, not possession but 
the act of getting there, which grants the greatest enjoyment. When I have clarified and 
exhausted a subject, then I turn away from it in order to go into darkness again.” His was 
the temperament of an explorer who is reluctant to take the time to write an account of his 




last expedition when he could be starting another. As it was, Gauss wrote a great deal, but 
to have published every fundamental discovery he made in a form satisfactory to himself 
would have required several long lifetimes. 


A third relation can be obtained by making the substitution
$w = (1 - z)^{\gamma - \alpha - \beta} u$. This leads to a hypergeometric equation for u with
$\alpha_1 = \gamma - \alpha$, $\beta_1 = \gamma - \beta$, and $\gamma_1 = \gamma$. Furthermore, w is analytic at z = 0,
and w(0) = 1. We conclude that $w = F(\alpha, \beta; \gamma; z)$. We therefore have the identity

\[ F(\alpha, \beta; \gamma; z) = (1 - z)^{\gamma - \alpha - \beta} F(\gamma - \alpha, \gamma - \beta; \gamma; z). \tag{15.25} \]
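
Identity (15.25) is easy to spot-check numerically. The sketch below is an illustration under stated assumptions: the series helper, the sample parameters, and the evaluation point z = 0.4 are hypothetical choices, and the plain partial sum is used only because |z| < 1.

```python
def hypergeometric_series(alpha, beta, gamma, z, max_terms=500):
    """Partial sum of the hypergeometric series, built from the term ratio."""
    total = 0.0
    term = 1.0
    for k in range(max_terms):
        total += term
        term *= (alpha + k) * (beta + k) / ((k + 1) * (gamma + k)) * z
    return total


# Sample (assumed) parameters with gamma not an integer and |z| inside the unit disk.
alpha, beta, gamma, z = 0.3, 0.7, 1.9, 0.4

lhs = hypergeometric_series(alpha, beta, gamma, z)
rhs = (1 - z) ** (gamma - alpha - beta) * hypergeometric_series(
    gamma - alpha, gamma - beta, gamma, z
)
print(lhs, rhs)
```

The two printed values should agree to roundoff, which is a useful regression check whenever a series routine of this kind is modified.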


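To obtain the canonical basis at z = 1, we make the substitution t = 1 − z,
and note that the result is again the HGDE, with $\alpha_1 = \alpha$, $\beta_1 = \beta$, and
$\gamma_1 = \alpha + \beta - \gamma + 1$. It follows from Eq. (15.24) that

\[ w_3(z) = F(\alpha, \beta; \alpha + \beta - \gamma + 1; 1 - z), \]
\[ w_4(z) = (1 - z)^{\gamma - \alpha - \beta} F(\gamma - \beta, \gamma - \alpha; \gamma - \alpha - \beta + 1; 1 - z) \tag{15.26} \]

form a canonical basis of solutions to the HGDE at z = 1.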
A symmetry of the hypergeometric function that is easily obtained from 
the HGDE is 


\[ F(\alpha, \beta; \gamma; z) = F(\beta, \alpha; \gamma; z). \tag{15.27} \]


The six functions 


\[ F(\alpha \pm 1, \beta; \gamma; z), \quad F(\alpha, \beta \pm 1; \gamma; z), \quad F(\alpha, \beta; \gamma \pm 1; z) \]


are called hypergeometric functions contiguous to $F(\alpha, \beta; \gamma; z)$. The
discussion above showed how to obtain the basis of solutions at z = 1 from
the regular solution of the HGDE at z = 0, $F(\alpha, \beta; \gamma; z)$. We can show that the
basis of solutions at $z = \infty$ can also be obtained from the hypergeometric
function.

Equation (15.16) suggests a function of the form 


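\[ v(z) = z^r F\left(\alpha_1, \beta_1; \gamma_1; \frac{1}{z}\right) \equiv z^r w\left(\frac{1}{z}\right)
   \quad\Rightarrow\quad w(z) = z^r v\left(\frac{1}{z}\right), \tag{15.28} \]


where r, $\alpha_1$, $\beta_1$, and $\gamma_1$ are to be determined. Since w(z) is a solution of the
HGDE, v will satisfy the following DE (see Problem 15.15):

\[ z(1-z) v'' + [1 - \alpha - \beta - 2r - (2 - \gamma - 2r) z] v'
   - \left[ r^2 - r + r\gamma - \frac{1}{z}(r + \alpha)(r + \beta) \right] v = 0. \tag{15.29} \]

This reduces to the HGDE if $r = -\alpha$ or $r = -\beta$. For $r = -\alpha$, the parameters
become $\alpha_1 = \alpha$, $\beta_1 = 1 + \alpha - \gamma$, and $\gamma_1 = \alpha - \beta + 1$. For $r = -\beta$, the
parameters are $\alpha_1 = \beta$, $\beta_1 = 1 + \beta - \gamma$, and $\gamma_1 = \beta - \alpha + 1$. Thus,

\[ v_1(z) = z^{-\alpha} F\left(\alpha, 1 + \alpha - \gamma; \alpha - \beta + 1; \frac{1}{z}\right), \]
\[ v_2(z) = z^{-\beta} F\left(\beta, 1 + \beta - \gamma; \beta - \alpha + 1; \frac{1}{z}\right) \tag{15.30} \]

form a canonical basis of solutions for the HGDE that are valid about
$z = \infty$.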

As the preceding discussion suggests, it is possible to obtain many rela- 
tions among the hypergeometric functions with different parameters and in- 
dependent variables. In fact, the nineteenth-century mathematician Kummer 
showed that there are 24 different (but linearly dependent, of course) solu- 
tions to the HGDE. These are collectively known as Kummer’s solutions, 
and six of them were derived above. Another important relation (shown in 
Problem 15.16) is that 


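\[ z^{\alpha - \gamma} (1 - z)^{\gamma - \alpha - \beta} F\left(\gamma - \alpha, 1 - \alpha; 1 - \alpha + \beta; \frac{1}{z}\right) \tag{15.31} \]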


also solves the HGDE. 

Many of the functions that occur in mathematical physics are related to
the hypergeometric function. Even some of the common elementary functions
can be expressed in terms of the hypergeometric function with appropriate
parameters. For example, when $\beta = \gamma$, we obtain

\[ F(\alpha, \beta; \beta; z) = \sum_{k=0}^{\infty} \frac{\Gamma(\alpha + k)}{\Gamma(\alpha)\Gamma(k+1)} z^k = (1 - z)^{-\alpha}. \]

Similarly,

\[ F\left(\tfrac{1}{2}, \tfrac{1}{2}; \tfrac{3}{2}; z^2\right) = \frac{\sin^{-1} z}{z}
   \quad\text{and}\quad F(1, 1; 2; -z) = \frac{\ln(1 + z)}{z}. \]
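
These three reductions to elementary functions can be confirmed with a few lines of code. The sketch below is illustrative: the helper name and the sample arguments are assumptions, and the partial sum is valid only for |z| < 1.

```python
import math


def hypergeometric_series(alpha, beta, gamma, z, max_terms=500):
    """Partial sum of the hypergeometric series, built from the term ratio."""
    total = 0.0
    term = 1.0
    for k in range(max_terms):
        total += term
        term *= (alpha + k) * (beta + k) / ((k + 1) * (gamma + k)) * z
    return total


z = 0.5  # assumed sample point inside the unit disk

# F(alpha, beta; beta; z) = (1 - z)**(-alpha)
binomial_gap = hypergeometric_series(2.5, 0.7, 0.7, 0.3) - (1 - 0.3) ** (-2.5)

# F(1/2, 1/2; 3/2; z^2) = arcsin(z) / z
arcsin_gap = hypergeometric_series(0.5, 0.5, 1.5, z * z) - math.asin(z) / z

# F(1, 1; 2; -z) = ln(1 + z) / z
log_gap = hypergeometric_series(1.0, 1.0, 2.0, -z) - math.log(1 + z) / z

print(binomial_gap, arcsin_gap, log_gap)
```

All three differences should be at the level of floating-point roundoff.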


However, the real power of the hypergeometric function is that it encom- 
passes almost all of the nonelementary functions encountered in physics. 
Let us look briefly at a few of these. 

Jacobi functions are solutions of the DE 


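\[ (1 - x^2) \frac{d^2 u}{dx^2} + [\beta - \alpha - (\alpha + \beta + 2) x] \frac{du}{dx}
   + \lambda(\lambda + \alpha + \beta + 1) u = 0. \tag{15.32} \]


Defining x = 1 − 2z changes this DE into the HGDE with parameters
$\alpha_1 = -\lambda$, $\beta_1 = \lambda + \alpha + \beta + 1$, and $\gamma_1 = 1 + \alpha$. The solutions of Eq. (15.32), called
the Jacobi functions of the first kind, are, with appropriate normalization,

\[ P_\lambda^{(\alpha, \beta)}(z) = \frac{\Gamma(\lambda + \alpha + 1)}{\Gamma(\lambda + 1)\Gamma(\alpha + 1)}
   F\left(-\lambda, \lambda + \alpha + \beta + 1; 1 + \alpha; \frac{1 - z}{2}\right). \]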


When $\lambda = n$, a nonnegative integer, the Jacobi function turns into a
polynomial of degree n with the following expansion:

\[ P_n^{(\alpha, \beta)}(z) = \frac{\Gamma(n + \alpha + 1)}{\Gamma(n + 1)\Gamma(n + \alpha + \beta + 1)}
   \sum_{k=0}^{n} \binom{n}{k} \frac{\Gamma(n + \alpha + \beta + k + 1)}{\Gamma(\alpha + k + 1)}
   \left( \frac{z - 1}{2} \right)^k. \]

These are the Jacobi polynomials discussed in Chap. 8. In fact, the DE
satisfied by $P_n^{(\alpha, \beta)}(x)$ of Chap. 8 is identical to Eq. (15.32). Note that the
transformation x = 1 − 2z translates the points z = 0 and z = 1 to the points
x = 1 and x = −1, respectively. Thus the regular singular points of the
Jacobi functions of the first kind are at $\pm 1$ and $\infty$.
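
For integer $\lambda = n$ the hypergeometric series terminates, so a Jacobi polynomial can be evaluated with a finite loop. The sketch below is a hedged illustration of that fact; the function names are assumptions, and the reference values are the familiar Legendre polynomial $P_2(x) = (3x^2 - 1)/2$ and the degree-one Jacobi polynomial worked out by hand from the formula above.

```python
import math


def terminating_hypergeometric(n, beta, gamma, t):
    """F(-n, beta; gamma; t) for integer n >= 0: a finite sum of n + 1 terms."""
    total = 0.0
    term = 1.0
    for k in range(n + 1):
        total += term
        term *= (-n + k) * (beta + k) / ((k + 1) * (gamma + k)) * t
    return total


def jacobi_polynomial(n, a, b, x):
    """Jacobi function of the first kind for integer degree n, parameters (a, b)."""
    norm = math.gamma(n + a + 1) / (math.gamma(n + 1) * math.gamma(a + 1))
    return norm * terminating_hypergeometric(n, n + a + b + 1, a + 1, (1 - x) / 2)


x = 0.5
legendre_2 = jacobi_polynomial(2, 0.0, 0.0, x)   # compare with (3 x^2 - 1) / 2
jacobi_1 = jacobi_polynomial(1, 1.0, 2.0, x)     # compare with 2 + 5 (x - 1) / 2
print(legendre_2, jacobi_1)
```

Setting a = b = 0 recovers the Legendre polynomials, which makes a convenient sanity check.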

A second, linearly independent, solution of Eq. (15.32) is obtained by 
using (15.31). These are called the Jacobi functions of the second kind: 


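\[ Q_\lambda^{(\alpha, \beta)}(z) = \frac{2^{\lambda + \alpha + \beta}\, \Gamma(\lambda + \alpha + 1)\Gamma(\lambda + \beta + 1)}
   {\Gamma(2\lambda + \alpha + \beta + 2)(z - 1)^{\lambda + \alpha + 1}(z + 1)^{\beta}}
   F\left(\lambda + \alpha + 1, \lambda + 1; 2\lambda + \alpha + \beta + 2; \frac{2}{1 - z}\right). \tag{15.33} \]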


Gegenbauer functions, or ultraspherical functions, are special cases of
Jacobi functions for which $\alpha = \beta = \mu - \frac{1}{2}$. They are defined by

\[ C_\lambda^{\mu}(z) = \frac{\Gamma(\lambda + 2\mu)}{\Gamma(\lambda + 1)\Gamma(2\mu)}
   F\left(-\lambda, \lambda + 2\mu; \mu + \tfrac{1}{2}; \frac{1 - z}{2}\right). \tag{15.34} \]

Note the change in the normalization constant. Linearly independent
Gegenbauer functions “of the second kind” can be obtained from the Jacobi
functions of the second kind by the substitution $\alpha = \beta = \mu - \frac{1}{2}$.

Another special case of the Jacobi functions is obtained when $\alpha = \beta = 0$.
Those obtained from the Jacobi functions of the first kind are called
Legendre functions of the first kind:

\[ P_\lambda(z) \equiv P_\lambda^{(0,0)}(z) = F\left(-\lambda, \lambda + 1; 1; \frac{1 - z}{2}\right). \tag{15.35} \]

Legendre functions of the second kind are obtained from the Jacobi
functions of the second kind in a similar way:

\[ Q_\lambda(z) = \frac{2^{\lambda}\, \Gamma^2(\lambda + 1)}{\Gamma(2\lambda + 2)(z - 1)^{\lambda + 1}}
   F\left(\lambda + 1, \lambda + 1; 2\lambda + 2; \frac{2}{1 - z}\right). \]

Other functions derived from the Jacobi functions are obtained similarly (see
Chap. 8).


15.5 Confluent Hypergeometric Functions 


The transformation x = 1 − 2z translates the regular singular points of the
HGDE by a finite amount. Consequently, the new functions still have two
regular singular points, $z = \pm 1$, in the complex plane. In some physical
cases of importance, only the origin, corresponding to r = 0 in spherical
coordinates (typically the location of the source of a central force), is the
singular point. If we want to obtain a differential equation consistent with
such a case, we have to “push” the singular point z = 1 to infinity. This can
be achieved by making the substitution t = rz in the HGDE and taking the
limit $r \to \infty$. The substitution yields

\[ \frac{d^2 w}{dt^2} + \left( \frac{\gamma}{t} + \frac{1 - \gamma + \alpha + \beta}{t - r} \right) \frac{dw}{dt}
   + \frac{\alpha\beta}{t(t - r)} w = 0. \tag{15.36} \]

If we blindly take the limit $r \to \infty$ with $\alpha$, $\beta$, and $\gamma$ remaining finite,
Eq. (15.36) reduces to $\ddot{w} + (\gamma/t)\dot{w} = 0$, an elementary FODE in $\dot{w}$. To
obtain a nonelementary DE, we need to manipulate the parameters, to let
some of them tend to infinity. We want $\gamma$ to remain finite, because otherwise
the coefficient of dw/dt will blow up. We therefore let $\beta$ or $\alpha$ tend
to infinity. The result will be the same either way because $\alpha$ and $\beta$ appear
symmetrically in the equation. It is customary to let $\beta = r \to \infty$. In that
case, Eq. (15.36) becomes

\[ \frac{d^2 w}{dt^2} + \left( \frac{\gamma}{t} - 1 \right) \frac{dw}{dt} - \frac{\alpha}{t} w = 0. \]


Multiplying by t and changing the independent variable back to z yields


\[ z w''(z) + (\gamma - z) w'(z) - \alpha w(z) = 0. \tag{15.37} \]


This is called the confluent hypergeometric DE (CHGDE). 

Since z = 0 is still a regular singular point of the CHGDE, we can obtain
expansions about that point. The characteristic exponents are 0 and $1 - \gamma$,
as before. Thus, there is an analytic solution (corresponding to the
characteristic exponent 0) to the CHGDE at the origin, which is called the
confluent hypergeometric function and denoted by $\Phi(\alpha; \gamma; z)$. Since z = 0 is
the only possible (finite) singularity of the CHGDE, $\Phi(\alpha; \gamma; z)$ is an entire
function.

We can obtain the series expansion of $\Phi(\alpha; \gamma; z)$ directly from Eq. (15.22)
and the fact that $\Phi(\alpha; \gamma; z) = \lim_{\beta \to \infty} F(\alpha, \beta; \gamma; z/\beta)$. The result is

\[ \Phi(\alpha; \gamma; z) = \frac{\Gamma(\gamma)}{\Gamma(\alpha)}
   \sum_{k=0}^{\infty} \frac{\Gamma(\alpha + k)}{\Gamma(k+1)\Gamma(\gamma + k)} z^k. \tag{15.38} \]
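
The confluence limit can be observed numerically. The sketch below is illustrative only: the function names, the sample parameters, and the choice $\beta = 10^6$ as a stand-in for "large" are assumptions. It sums (15.38) term by term, compares it with the hypergeometric series at z/β, and checks the special case $\Phi(\alpha; \alpha; z) = e^z$, which follows from (15.38) when $\gamma = \alpha$.

```python
import math


def confluent_series(alpha, gamma, z, max_terms=300):
    """Partial sum of the confluent hypergeometric series (15.38)."""
    total = 0.0
    term = 1.0
    for k in range(max_terms):
        total += term
        term *= (alpha + k) / ((k + 1) * (gamma + k)) * z
    return total


def hypergeometric_series(alpha, beta, gamma, z, max_terms=300):
    """Partial sum of the hypergeometric series (15.22)."""
    total = 0.0
    term = 1.0
    for k in range(max_terms):
        total += term
        term *= (alpha + k) * (beta + k) / ((k + 1) * (gamma + k)) * z
    return total


alpha, gamma, z = 0.5, 1.5, 0.8   # assumed sample values
big_beta = 1.0e6                  # finite stand-in for the limit beta -> infinity

phi = confluent_series(alpha, gamma, z)
approx = hypergeometric_series(alpha, big_beta, gamma, z / big_beta)
exp_gap = confluent_series(2.0, 2.0, 1.5) - math.exp(1.5)
print(phi, approx, exp_gap)
```

The discrepancy between the two series shrinks roughly like 1/β, so the agreement here is limited by the finite value chosen for β rather than by roundoff.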


This is called the confluent hypergeometric series. An argument similar to 
the one given in the case of the hypergeometric function shows that 


Box 15.5.1 The confluent hypergeometric function $\Phi(\alpha; \gamma; z)$
reduces to a polynomial when $\alpha$ is a negative integer.


A second solution of the CHGDE can be obtained, as for the HGDE. If
$1 - \gamma$ is not an integer, then by taking the limit $\beta \to \infty$ of Eq. (15.24), we
obtain the second solution $z^{1-\gamma} \Phi(\alpha - \gamma + 1, 2 - \gamma; z)$. Thus,


Proposition 15.5.2 Any solution of the CHGDE can be written as a linear
combination of $\Phi(\alpha; \gamma; z)$ and $z^{1-\gamma} \Phi(\alpha - \gamma + 1, 2 - \gamma; z)$.


15.5.1 Hydrogen-Like Atoms


The time-independent Schrödinger equation for a central potential, in units
in which $\hbar = m = 1$, is $-\frac{1}{2}\nabla^2 \Psi + V(r)\Psi = E\Psi$. For the case of
hydrogen-like atoms, $V(r) = -Ze^2/r$, where Z is the atomic number, and the
equation reduces to

\[ \nabla^2 \Psi + \left( 2E + \frac{2Ze^2}{r} \right) \Psi = 0. \]

The radial part of this equation is given by Eq. (13.14) with
$f(r) = 2E + 2Ze^2/r$. Defining u = rR(r), we may write

\[ \frac{d^2 u}{dr^2} + \left( \lambda + \frac{a}{r} - \frac{b}{r^2} \right) u = 0, \tag{15.39} \]

where $\lambda = 2E$, $a = 2Ze^2$, and $b = l(l + 1)$. This equation can be further
simplified by defining r = kz (k is an arbitrary constant to be determined
later):

\[ \frac{d^2 u}{dz^2} + \left( \lambda k^2 + \frac{ak}{z} - \frac{b}{z^2} \right) u = 0. \]

Choosing $\lambda k^2 = -\frac{1}{4}$ and introducing $\alpha \equiv a/(2\sqrt{-\lambda})$ yields

\[ \frac{d^2 u}{dz^2} + \left( -\frac{1}{4} + \frac{\alpha}{z} - \frac{b}{z^2} \right) u = 0. \]


Equations of this form can be transformed into the CHGDE by making
the substitution $u(z) = z^{\mu} e^{-\nu z} f(z)$. It then follows that

\[ \frac{d^2 f}{dz^2} + \left( \frac{2\mu}{z} - 2\nu \right) \frac{df}{dz}
   + \left[ -\frac{1}{4} + \frac{\mu(\mu - 1)}{z^2} - \frac{2\mu\nu}{z} + \frac{\alpha}{z} - \frac{b}{z^2} + \nu^2 \right] f = 0. \]

Choosing $\nu^2 = \frac{1}{4}$ and $\mu(\mu - 1) = b$ reduces this equation to

\[ f'' + \left( \frac{2\mu}{z} - 2\nu \right) f' - \frac{2\mu\nu - \alpha}{z} f = 0, \]

which is in the form of (15.37).

On physical grounds, we expect $u(z) \to 0$ as $z \to \infty$.⁴ Therefore, $\nu = \frac{1}{2}$.
Similarly, with $\mu(\mu - 1) = b = l(l + 1)$, we obtain the two possibilities
$\mu = -l$ and $\mu = l + 1$. Again on physical grounds, we demand that u(0)
be finite (the wave function must not blow up at r = 0). This implies⁵ that
$\mu = l + 1$. We thus obtain

\[ f'' + \left[ \frac{2(l + 1)}{z} - 1 \right] f' - \frac{l + 1 - \alpha}{z} f = 0. \]


⁴This is because the volume integral of $|\Psi|^2$ over all space must be finite. The radial part
of this integral is simply the integral of $r^2 R^2(r) = u^2(r)$. This latter integral will not be
finite unless $u(\infty) = 0$.


⁵Recall that $\mu$ is the exponent of z = r/k.


Multiplying by z gives

\[ z f'' + [2(l + 1) - z] f' - (l + 1 - \alpha) f = 0. \]

Comparing this with Eq. (15.37) shows that f is proportional to
$\Phi(l + 1 - \alpha, 2l + 2; z)$. Thus, the solution of (15.39) can be written as

\[ u(z) = C z^{l+1} e^{-z/2}\, \Phi(l + 1 - \alpha, 2l + 2; z). \]


An argument similar to that used in Problem 14.22 will reveal that the
product $e^{-z/2}\, \Phi(l + 1 - \alpha, 2l + 2; z)$ will be infinite unless the power
series representing $\Phi$ terminates (becomes a polynomial). It follows from
Box 15.5.1 that this will take place if

\[ l + 1 - \alpha = -N \tag{15.40} \]

for some integer $N \ge 0$. In that case we obtain the Laguerre polynomials

\[ L_N^j \equiv \frac{\Gamma(N + j + 1)}{\Gamma(N + 1)\Gamma(j + 1)} \Phi(-N, j + 1; z), \quad \text{where } j = 2l + 1. \]


Condition (15.40) is the quantization rule for the energy levels of a
hydrogen-like atom. Writing everything in terms of the original parameters
and defining n = N + l + 1 yields, after restoring all the m’s and the
$\hbar$’s, the energy levels of a hydrogen-like atom:

\[ E = -\frac{Z^2 m e^4}{2\hbar^2 n^2} = -\frac{mc^2}{2} \frac{Z^2 \alpha^2}{n^2}, \]

where $\alpha = e^2/(\hbar c) = 1/137$ is the fine-structure constant.
The radial wave functions can now be written as

\[ R_{nl}(r) = \frac{u_{nl}(r)}{r} = C r^l e^{-Zr/(na_0)}\,
   \Phi\left(-n + l + 1, 2l + 2; \frac{2Zr}{na_0}\right), \]

where

\[ a_0 = \hbar^2/(me^2) = 0.529 \times 10^{-8}\ \text{cm} \]

is the Bohr radius.
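
As a quick numerical illustration of the quantization rule, the sketch below evaluates the energy formula in electron volts and lists the values of l allowed for a given n. The constants are assumed, rounded reference values that do not appear in the text (electron rest energy about 510998.95 eV, fine-structure constant about 1/137.036), and the function names are hypothetical. The reduced-mass correction is ignored.

```python
MC2_EV = 510998.95       # assumed: electron rest energy m c^2 in eV (rounded)
ALPHA = 1 / 137.035999   # assumed: fine-structure constant (the text rounds to 1/137)


def energy_level(n, Z=1):
    """E_n = -(m c^2 / 2) Z^2 alpha^2 / n^2, returned in eV."""
    if n != int(n) or n < 1:
        raise ValueError("n must be a positive integer")
    return -0.5 * MC2_EV * (Z * ALPHA) ** 2 / n ** 2


def allowed_l(n):
    """n = N + l + 1 with integer N >= 0 restricts l to 0, 1, ..., n - 1."""
    return list(range(n))


ground_state = energy_level(1)       # about -13.6 eV for hydrogen
helium_ion_n2 = energy_level(2, Z=2)  # same Z/n ratio, hence the same energy
print(ground_state, helium_ion_n2, allowed_l(3))
```

Since E depends on Z and n only through the ratio Z/n, the n = 2 level of a Z = 2 ion coincides with the hydrogen ground state in this approximation.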


Historical Notes 

Friedrich Wilhelm Bessel (1784-1846) showed no signs of unusual academic ability in 
school, although he did show a liking for mathematics and physics. He left school intend- 
ing to become a merchant’s apprentice, a desire that soon materialized with a seven-year 
unpaid apprenticeship with a large mercantile firm in Bremen. The young Bessel proved 
so adept at accounting and calculation that he was granted a small salary, with raises, 
after only the first year. An interest in foreign trade led Bessel to study geography and 
languages at night, astonishingly learning to read and write English in only three months. 
He also studied navigation in order to qualify as a cargo officer aboard ship, but his in- 
nate curiosity soon compelled him to investigate astronomy at a more fundamental level. 
Still serving his apprenticeship, Bessel learned to observe the positions of stars with suf- 
ficient accuracy to determine the longitude of Bremen, checking his results against pro- 
fessional astronomical journals. He then tackled the more formidable problem of deter- 
mining the orbit of Halley’s comet from published observations. After seeing the close 
agreement between Bessel’s calculations and those of Halley, the German astronomer Olbers
encouraged Bessel to improve his already impressive work with more observations.
The improved calculations, an achievement tantamount to a modern doctoral dissertation,
were published with Olbers’s recommendation. Bessel later received appointments with
increasing authority at observatories near Bremen and in Königsberg, the latter position
being accompanied by a professorship. (The title of doctor, required for the professorship,
was granted by the University of Göttingen on the recommendation of Gauss.)

Bessel proved himself an excellent observational astronomer. His careful measurements 
coupled with his mathematical aptitude allowed him to produce accurate positions for a 
number of previously mapped stars, taking account of instrumental effects, atmospheric 
refraction, and the position and motion of the observation site. In 1820 he determined the 
position of the vernal equinox accurate to 0.01 second, in agreement with modern values. 
His observation of the variation of the proper motion of the stars Sirius and Procyon led 
him to posit the existence of nearby, large, low-luminosity stars called dark companions. 
Between 1821 and 1833 he catalogued the positions of about 75,000 stars, publishing his 
measurements in detail. One of his most important contributions to astronomy was the 
determination of the distance to a star using parallax. This method uses triangulation, or 
the determination of the apparent positions of a distant object viewed from two points a 
known distance apart, in this case two diametrically opposed points of the Earth’s orbit. 
The angle subtended by the baseline of Earth’s orbit, viewed from the star’s perspective, 
is known as the star’s parallax. Before Bessel’s measurement, stars were assumed to be 
so distant that their parallaxes were too small to measure, and it was further assumed 
that bright stars (thought to be nearer) would have the largest parallax. Bessel correctly 
reasoned that stars with large proper motions were more likely to be nearby ones and 
selected such a star, 61 Cygni, for his historic measurement. His measured parallax for 
that star differs by less than 8 % from the currently accepted value. 

Given such an impressive record in astronomy, it seems only fitting that the famous func- 
tions that bear Bessel’s name grew out of his investigations of perturbations in planetary 
systems. He showed that such perturbations could be divided into two effects and treated 
separately: the obvious direct attraction due to the perturbing planet and an indirect ef- 
fect caused by the sun’s response to the perturber’s force. The so-called Bessel functions 
then appear as coefficients in the series treatment of the indirect perturbation. Although 
special cases of Bessel functions were discovered by Bernoulli, Euler, and Lagrange, the 
systematic treatment by Bessel clearly established his preeminence, a fitting tribute to the 
creator of the most famous functions in mathematical physics. 


15.5.2 Bessel Functions 


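The Bessel differential equation is usually written as

\[ w'' + \frac{1}{z} w' + \left( 1 - \frac{\nu^2}{z^2} \right) w = 0. \tag{15.41} \]

As in the example above, the substitution $w = z^{\mu} e^{-\eta z} f(z)$ transforms
(15.41) into

\[ \frac{d^2 f}{dz^2} + \left( \frac{2\mu + 1}{z} - 2\eta \right) \frac{df}{dz}
   + \left[ \frac{\mu^2 - \nu^2}{z^2} - \frac{\eta(2\mu + 1)}{z} + \eta^2 + 1 \right] f = 0, \]

which, if we set $\mu = \nu$ and $\eta = i$, reduces to

\[ f'' + \left( \frac{2\nu + 1}{z} - 2i \right) f' - \frac{(2\nu + 1) i}{z} f = 0. \]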


Making the further substitution 2iz = t, and multiplying out by t, we obtain

\[ t \frac{d^2 f}{dt^2} + (2\nu + 1 - t) \frac{df}{dt} - \left( \nu + \tfrac{1}{2} \right) f = 0, \]

which is in the form of (15.37) with $\alpha = \nu + \frac{1}{2}$ and $\gamma = 2\nu + 1$.
Thus, solutions of the Bessel equation, Eq. (15.41), can be written as constant
multiples of $z^{\nu} e^{-iz}\, \Phi(\nu + \frac{1}{2}, 2\nu + 1; 2iz)$. With proper normalization,
we define the Bessel function of the first kind of order $\nu$ as

\[ J_\nu(z) = \frac{1}{\Gamma(\nu + 1)} \left( \frac{z}{2} \right)^{\nu} e^{-iz}\,
   \Phi\left( \nu + \tfrac{1}{2}, 2\nu + 1; 2iz \right). \tag{15.42} \]

Using Eq. (15.38) and the expansion for $e^{-iz}$, we can show that

\[ J_\nu(z) = \left( \frac{z}{2} \right)^{\nu} \sum_{k=0}^{\infty}
   \frac{(-1)^k}{k!\, \Gamma(\nu + k + 1)} \left( \frac{z}{2} \right)^{2k}. \tag{15.43} \]
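
The agreement between the confluent form (15.42) and the power series (15.43) can be checked numerically. The sketch below is illustrative, with assumed function names and sample values; it uses the well-known closed form $J_{1/2}(z) = \sqrt{2/(\pi z)}\, \sin z$ as an independent reference for real positive z.

```python
import cmath
import math


def confluent_series(alpha, gamma, z, max_terms=200):
    """Partial sum of the confluent hypergeometric series for complex z."""
    total = 0j
    term = 1 + 0j
    for k in range(max_terms):
        total += term
        term *= (alpha + k) / ((k + 1) * (gamma + k)) * z
    return total


def bessel_j_confluent(nu, z):
    """J_nu(z) from Eq. (15.42), for real z > 0."""
    prefactor = (z / 2) ** nu / math.gamma(nu + 1)
    return prefactor * cmath.exp(-1j * z) * confluent_series(nu + 0.5, 2 * nu + 1, 2j * z)


def bessel_j_series(nu, z, terms=40):
    """J_nu(z) from the power series (15.43), for real z > 0."""
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * (z / 2) ** (2 * k) / (math.factorial(k) * math.gamma(nu + k + 1))
    return (z / 2) ** nu * total


z = 1.3  # assumed sample point
closed_form = math.sqrt(2 / (math.pi * z)) * math.sin(z)
via_confluent = bessel_j_confluent(0.5, z)
via_series = bessel_j_series(0.5, z)
print(closed_form, via_confluent, via_series)
```

The confluent form returns a complex number whose imaginary part is roundoff only, since the phase of $e^{-iz}$ cancels that of $\Phi$ for real z.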


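The second linearly independent solution can be obtained as usual and is
proportional to

\[ z^{1 - (2\nu + 1)} \left( \frac{z}{2} \right)^{\nu} e^{-iz}\,
   \Phi\left( \nu + \tfrac{1}{2} - (2\nu + 1) + 1, 2 - (2\nu + 1); 2iz \right)
   = C \left( \frac{z}{2} \right)^{-\nu} e^{-iz}\, \Phi\left( -\nu + \tfrac{1}{2}, -2\nu + 1; 2iz \right)
   = C J_{-\nu}(z), \]

provided that $1 - \gamma = 1 - (2\nu + 1) = -2\nu$ is not an integer. When $\nu$ is
an integer n, $J_{-n}(z) = (-1)^n J_n(z)$ (see Problem 15.25). Thus, when $\nu$ is a
noninteger, the most general solution is of the form $A J_\nu(z) + B J_{-\nu}(z)$.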

How do we find a second linearly independent solution when $\nu$ is an integer $n$? We first define

\[ Y_\nu(z) = \frac{J_\nu(z)\cos\nu\pi - J_{-\nu}(z)}{\sin\nu\pi}, \tag{15.44} \]

called the Bessel function of the second kind, or the Neumann function. For noninteger $\nu$ this is simply a linear combination of the two linearly independent solutions. For integer $\nu$ the function is indeterminate. Therefore, we use l'Hôpital's rule and define

\[ Y_n(z) \equiv \lim_{\nu\to n} Y_\nu(z) = \frac{1}{\pi}\lim_{\nu\to n}\left[\frac{\partial J_\nu}{\partial\nu} - (-1)^n\frac{\partial J_{-\nu}}{\partial\nu}\right]. \]


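Equation (15.43) yields

\[ \frac{\partial J_\nu}{\partial\nu} = J_\nu(z)\ln\!\left(\frac{z}{2}\right) - \left(\frac{z}{2}\right)^{\nu}\sum_{k=0}^{\infty}\frac{(-1)^k\,\Psi(\nu+k+1)}{k!\,\Gamma(\nu+k+1)}\left(\frac{z}{2}\right)^{2k}, \]

where $\Psi(z) = (d/dz)\ln\Gamma(z)$. Similarly,

\[ \frac{\partial J_{-\nu}}{\partial\nu} = -J_{-\nu}(z)\ln\!\left(\frac{z}{2}\right) + \left(\frac{z}{2}\right)^{-\nu}\sum_{k=0}^{\infty}\frac{(-1)^k\,\Psi(-\nu+k+1)}{k!\,\Gamma(-\nu+k+1)}\left(\frac{z}{2}\right)^{2k}. \]

Substituting these expressions in the definition of $Y_n(z)$ and using $J_{-n}(z) = (-1)^n J_n(z)$, we obtain

\[ Y_n(z) = \frac{2}{\pi} J_n(z)\ln\!\left(\frac{z}{2}\right) - \frac{1}{\pi}\left(\frac{z}{2}\right)^{n}\sum_{k=0}^{\infty}\frac{(-1)^k\,\Psi(n+k+1)}{k!\,\Gamma(n+k+1)}\left(\frac{z}{2}\right)^{2k} - \frac{1}{\pi}(-1)^n\left(\frac{z}{2}\right)^{-n}\sum_{k=0}^{\infty}\frac{(-1)^k\,\Psi(k-n+1)}{k!\,\Gamma(k-n+1)}\left(\frac{z}{2}\right)^{2k}. \]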


The natural log term is indicative of the solution suggested by Theorem 15.2.6. Since $Y_\nu(z)$ is linearly independent of $J_\nu(z)$ for any $\nu$, integer or noninteger, it is convenient to consider $\{J_\nu(z), Y_\nu(z)\}$ as a basis of solutions for the Bessel equation.

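Another basis of solutions is defined as

\[ H_\nu^{(1)}(z) = J_\nu(z) + iY_\nu(z), \qquad H_\nu^{(2)}(z) = J_\nu(z) - iY_\nu(z), \tag{15.46} \]

which are called Bessel functions of the third kind, or Hankel functions.

Replacing $z$ by $iz$ in the Bessel equation yields

\[ \frac{d^2 w}{dz^2} + \frac{1}{z}\frac{dw}{dz} - \left(1 + \frac{\nu^2}{z^2}\right) w = 0, \]

whose basis of solutions consists of multiples of $J_\nu(iz)$ and $J_{-\nu}(iz)$. Thus, the modified Bessel functions of the first kind are defined as

\[ I_\nu(z) \equiv e^{-i\pi\nu/2} J_\nu(iz) = \left(\frac{z}{2}\right)^{\nu}\sum_{k=0}^{\infty}\frac{1}{k!\,\Gamma(\nu+k+1)}\left(\frac{z}{2}\right)^{2k}. \]

Similarly, the modified Bessel functions of the second kind are defined as

\[ K_\nu(z) = \frac{\pi}{2\sin\nu\pi}\left[I_{-\nu}(z) - I_\nu(z)\right]. \]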
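The definitions of the modified Bessel functions can be checked numerically for a noninteger order. The sketch below is an illustration under stated assumptions: the names `bessel_i` and `bessel_k` are hypothetical, z is real and positive, and ν is not an integer (so that neither Γ nor sin νπ causes trouble). It uses the standard closed forms for order 1/2, namely I_{1/2}(z) = sqrt(2/(πz)) sinh z and K_{1/2}(z) = sqrt(π/(2z)) e^{-z}.

```python
import math

def bessel_i(nu, z, terms=60):
    """Partial sum of the power series for I_nu(z); real z > 0, noninteger nu."""
    return sum((z / 2) ** (2 * k + nu) / (math.factorial(k) * math.gamma(nu + k + 1))
               for k in range(terms))

def bessel_k(nu, z):
    """K_nu(z) from its definition in terms of I_{-nu} and I_nu (noninteger nu only)."""
    return math.pi / (2 * math.sin(nu * math.pi)) * (bessel_i(-nu, z) - bessel_i(nu, z))

z = 0.8
print(bessel_i(0.5, z), math.sqrt(2 / (math.pi * z)) * math.sinh(z))
print(bessel_k(0.5, z), math.sqrt(math.pi / (2 * z)) * math.exp(-z))
```

For integer order this direct formula is indeterminate, which is why the limiting definition is needed there.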


When $\nu$ is an integer $n$, $I_n = I_{-n}$, and $K_n$ is indeterminate. Thus, we define $K_n(z)$ as $\lim_{\nu\to n} K_\nu(z)$. This gives

\[ K_n(z) = \frac{(-1)^n}{2}\lim_{\nu\to n}\left[\frac{\partial I_{-\nu}}{\partial\nu} - \frac{\partial I_\nu}{\partial\nu}\right], \]

which has the power-series representation

\[ K_n(z) = (-1)^{n+1} I_n(z)\ln\!\left(\frac{z}{2}\right) + \frac12(-1)^n\left(\frac{z}{2}\right)^{n}\sum_{k=0}^{\infty}\frac{\Psi(n+k+1)}{k!\,\Gamma(n+k+1)}\left(\frac{z}{2}\right)^{2k} + \frac12(-1)^n\left(\frac{z}{2}\right)^{-n}\sum_{k=0}^{\infty}\frac{\Psi(k-n+1)}{k!\,\Gamma(k-n+1)}\left(\frac{z}{2}\right)^{2k}. \]

We can obtain a recurrence relation for solutions of the Bessel equation as follows. If $Z_\nu(z)$ is a solution of order $\nu$, then (see Problem 15.28)

\[ Z_{\nu+1} = C_1 z^{\nu}\frac{d}{dz}\left[z^{-\nu}Z_\nu(z)\right] \quad\text{and}\quad Z_{\nu-1} = C_2 z^{-\nu}\frac{d}{dz}\left[z^{\nu}Z_\nu(z)\right]. \]

If the constants are chosen in such a way that $Z_\nu$, $Z_{-\nu}$, $Z_{\nu+1}$, and $Z_{\nu-1}$ satisfy their appropriate series expansions, then $C_1 = -1$ and $C_2 = 1$. Carrying out the differentiation in the equations for $Z_{\nu+1}$ and $Z_{\nu-1}$, we obtain

\[ Z_{\nu+1} = \frac{\nu}{z} Z_\nu - \frac{dZ_\nu}{dz}, \qquad Z_{\nu-1} = \frac{\nu}{z} Z_\nu + \frac{dZ_\nu}{dz}. \tag{15.47} \]

Adding these two equations yields the recursion relation

\[ Z_{\nu-1}(z) + Z_{\nu+1}(z) = \frac{2\nu}{z} Z_\nu(z), \tag{15.48} \]

where $Z_\nu(z)$ can be any of the three kinds of Bessel functions.
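The relations (15.47) and (15.48) are easy to test numerically for the Bessel function of the first kind. The following sketch assumes real z > 0, sums the power series for J_ν directly, and estimates the derivative by a central difference; the helper names (`rgamma`, `bessel_j`, `derivative`) and the step size are illustrative assumptions rather than anything prescribed by the text.

```python
import math

def rgamma(x):
    """1/Gamma(x), set to 0 at the poles x = 0, -1, -2, ..."""
    if x <= 0 and float(x).is_integer():
        return 0.0
    return 1.0 / math.gamma(x)

def bessel_j(nu, z, terms=60):
    """Partial sum of the power series for J_nu(z); assumes real z > 0."""
    return sum((-1) ** k * rgamma(k + 1) * rgamma(nu + k + 1) * (z / 2) ** (2 * k + nu)
               for k in range(terms))

def derivative(f, z, h=1e-5):
    """Central-difference estimate of f'(z)."""
    return (f(z + h) - f(z - h)) / (2 * h)

nu, z = 1.5, 2.0
dJ = derivative(lambda x: bessel_j(nu, x), z)
# Eq. (15.47): J_{nu+1} = (nu/z) J_nu - J_nu',  J_{nu-1} = (nu/z) J_nu + J_nu'
print(bessel_j(nu + 1, z), nu / z * bessel_j(nu, z) - dJ)
print(bessel_j(nu - 1, z), nu / z * bessel_j(nu, z) + dJ)
# Eq. (15.48): J_{nu-1} + J_{nu+1} = (2 nu / z) J_nu
print(bessel_j(nu - 1, z) + bessel_j(nu + 1, z), 2 * nu / z * bessel_j(nu, z))
```

The first two comparisons are limited by the finite-difference error; the third involves no derivative and should hold to near machine precision.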


15.6 Problems 


15.1 Show that the solution of $w' + w/z^2 = 0$ has an essential singularity at $z = 0$.


15.2 Derive the recursion relation of Eq. (15.7) and express it in terms of 
the indicial polynomial, as in Eq. (15.9). 


15.3 Find the characteristic exponent associated with the solution of 
w” + p(z)w' + q(z)w =0 


at an ordinary point [a point at which p(z) and qg(z) have no poles]. How 
many solutions can you find? 


15.4 The Laplace equation in electrostatics when separated in spherical coordinates yields a DE in the radial coordinate given by

\[ \frac{d}{dx}\left(x^2\frac{dy}{dx}\right) - n(n+1)\,y = 0 \quad\text{for } n \ge 0. \]

Starting with an infinite series of the form (15.6), show that the two independent solutions of this ODE are of the form $x^n$ and $x^{-n-1}$.


15.5 Find the indicial polynomial, characteristic exponents, and recursion 
relation at both of the regular singular points of the Legendre equation, 


What is $a_k$, the coefficient of the Laurent expansion, for the point $z = +1$?


15.6 Show that the substitution z = 1/t transforms Eq. (15.13) into
Eq. (15.14).


15.7 Obtain the indicial polynomial of Eq. (15.14) for expansion about 
t=0. 


15.8 Show that Riemann DE represents the most general second order 
Fuchsian DE. 


15.9 Derive the indicial equation for the Riemann DE. 


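15.10 Show that the transformation $v(z) = z^{\lambda}(z-1)^{\mu} w(z)$ changes the pairs of characteristic exponents $(\lambda_1, \lambda_2)$, $(\mu_1, \mu_2)$, and $(\nu_1, \nu_2)$ for the Riemann DE to $(\lambda_1 + \lambda, \lambda_2 + \lambda)$, $(\mu_1 + \mu, \mu_2 + \mu)$, and $(\nu_1 - \lambda - \mu, \nu_2 - \lambda - \mu)$.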


15.11 Go through the steps leading to Eqs. (15.24), (15.25), and (15.26). 


15.12 Show that the elliptic function of the first kind, defined as

\[ K(z) = \int_0^{\pi/2}\frac{d\theta}{\sqrt{1 - z^2\sin^2\theta}}, \]

can be expressed as $(\pi/2)\,F(\tfrac12, \tfrac12; 1; z^2)$.


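15.13 By differentiating the hypergeometric series, show that

\[ \frac{d^n}{dz^n}F(\alpha, \beta; \gamma; z) = \frac{\Gamma(\alpha+n)\Gamma(\beta+n)\Gamma(\gamma)}{\Gamma(\alpha)\Gamma(\beta)\Gamma(\gamma+n)}\,F(\alpha+n, \beta+n; \gamma+n; z). \]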


15.14 Use direct substitution in the hypergeometric series to show that 


F(a, B; B; -z) = (14+ 2)”, aG =; sia) = >in" Z, 


1 
F(1, 1; 2; —z) = —In(1 +z). 
Zz 


15.15 Show that the substitution v(z) = z” w(1/z) [see Eq. (15.28)] trans- 
forms the HGDE into Eq. (15.29). 


15.16 Consider the function v(z) = z’(1 — z)° F(a@1, 61; v1; 1/z) and as- 
sume that it is a solution of HGDE. Find a relation among 7, s, a1, 61, and 
y, such that v(z) is written in terms of three parameters rather than five. In 
particular, show that one possibility is 


ia “ag PY a= palo: 
Find all such possibilities. 


15.17 Show that the Jacobi functions are related to the hypergeometric 
functions. 




15.18 Derive the expression for the Jacobi function of the second kind as 
given in Eq. (15.33). 


15.19 Show that z = oo is not a regular singular point of the CHGDE. 


15.20 Derive the confluent hypergeometric series from hypergeometric se- 
ries. 


15.21 Show that the Weber-Hermite equation, $u'' + (\nu + \tfrac12 - \tfrac14 z^2)\,u = 0$, can be transformed into the CHGDE. Hint: Make the substitution $u(z) = \exp(-\tfrac14 z^2)\,v(z)$.


15.22 The linear combination 
rd - 
Pa@=y+1) 
ry 


-)iy, 
S| big as ay 
+ T@) Zz (a-y+l, V3Z) 


P(a, y; Z) 


is also a solution of the CHGDE. Show that the Hermite polynomials can be 


written as 
Zz nl 2 
An| —= )=2"W(-=, =; — }. 
() ( z2 r) 


15.23 Verify that the error function $\mathrm{erf}(z) = \int_0^z e^{-t^2}\,dt$ satisfies the relation $\mathrm{erf}(z) = z\,\Phi(\tfrac12, \tfrac32; -z^2)$.


15.24 Derive the series expansion of the Bessel function of the first kind 
from that of the confluent hypergeometric series and the expansion of the 
exponential. Check your answer by obtaining the same result by substituting 
the power series directly in the Bessel DE. 


15.25 Show that $J_{-n}(z) = (-1)^n J_n(z)$. Hint: Let $\nu = -n$ in the expansion of $J_\nu(z)$ and use $\Gamma(m) = \infty$ for a nonpositive integer $m$.


15.26 In a potential-free region, the radial part of the Schrodinger equation 


reduces to 

d*R  2dR a 

dr? i r dr = E al = 
Write the solutions of this DE in terms of Bessel functions. Hint: Substitute 
R=u/,/r. These solutions are called spherical Bessel functions. 


15.27 Theorem 15.2.6 states that under certain conditions, linearly independent solutions of SOLDE at regular singular points exist even though the difference between the characteristic exponents is an integer. An example is the case of Bessel functions of half-odd-integer orders. Evaluate the Wronskian of the two linearly independent solutions, $J_\nu$ and $J_{-\nu}$, of the Bessel equation and show that it vanishes only if $\nu$ is an integer. This shows, in particular, that $J_{n+1/2}$ and $J_{-n-1/2}$ are linearly independent. Hint: Consider the value of the Wronskian at $z = 0$, and use the formula $\Gamma(\nu)\Gamma(1-\nu) = \pi/\sin\nu\pi$.


15.28 Show that $z^{\nu}(d/dz)[z^{-\nu} Z_\nu(z)]$ is a solution of the Bessel equation of order $\nu + 1$ if $Z_\nu$ is a solution of order $\nu$.


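15.29 Use the recursion relation of Eq. (15.47) to prove that

\[ \left(\frac{1}{z}\frac{d}{dz}\right)^{m}\left[z^{\nu}Z_\nu(z)\right] = z^{\nu-m}Z_{\nu-m}(z), \qquad \left(\frac{1}{z}\frac{d}{dz}\right)^{m}\left[z^{-\nu}Z_\nu(z)\right] = (-1)^m z^{-\nu-m}Z_{\nu+m}(z). \]


15.30 Using the series expansion of the Bessel function, write $J_{1/2}(z)$ and $J_{-1/2}(z)$ in terms of elementary functions. Hint: First show that

\[ \Gamma\!\left(k + \tfrac32\right) = \sqrt{\pi}\,(2k+1)!/\left(k!\,2^{2k+1}\right). \]


15.31 From the results of the previous two problems, derive the relations

\[ J_{-n-1/2}(z) = \sqrt{\frac{2}{\pi}}\; z^{n+1/2}\left(\frac{1}{z}\frac{d}{dz}\right)^{n}\left(\frac{\cos z}{z}\right), \qquad J_{n+1/2}(z) = \sqrt{\frac{2}{\pi}}\; z^{n+1/2}\left(-\frac{1}{z}\frac{d}{dz}\right)^{n}\left(\frac{\sin z}{z}\right). \]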


15.32 Obtain the following integral identities: 


= 


(a) lie Jy(zydz=2"t! Kyai(2). 


(b) [or n@da-O Le, 


( 


Q 


) [arse dz= 241 Jyii(z) + (u— vz" Jy (2) 


— (ue? -v’) / ge" Saas 
and evaluate 
@ f Pda 
Hint: For (c) write 4+! = z#~¥z"+! and use integration by parts. 
15.33 Use Theorem 15.2.6 and the fact that $J_n(z)$ is entire to show that for integer $n$, a second solution to the Bessel equation exists and can be written as $Y_n(z) = J_n(z)[f_n(z) + C_n\ln z]$, where $f_n(z)$ is analytic about $z = 0$.


15.34 (a) Show that the Wronskian $W(J_\nu, Z; z)$ of $J_\nu$ and any other solution $Z$ of the Bessel equation satisfies the equation

\[ \frac{d}{dz}\left[z\,W(J_\nu, Z; z)\right] = 0. \]

(b) For some constant $A$, show that

\[ \frac{d}{dz}\left[\frac{Z}{J_\nu}\right] = \frac{W(z)}{J_\nu^2(z)} = \frac{A}{z\,J_\nu^2(z)}. \]

(c) Show that the general second solution of the Bessel equation can be written as

\[ Z_\nu(z) = J_\nu(z)\left[B + A\int\frac{dz}{z\,J_\nu^2(z)}\right]. \]


15.35 Spherical Bessel functions are defined by 


_ [8 ( Zi41/2@) 
nom fie} 


Let fj(z) denote a spherical Bessel function “of some kind.” By direct dif- 
ferentiation and substitution in the Bessel equation, show that 


: ler fo lez fa: 


dz 


d 
(b) qe fi] =-2z fi4i(2). 


(a) 


(c) Combine the results of parts (a) and (b) to derive the recursion relations 


21 


1 
= fi(2), 
z 


fin@+ fir) = 


d 


15.36 Show that

\[ W(J_\nu, Y_\nu; z) = \frac{2}{\pi z}, \qquad W\!\left(H_\nu^{(1)}, H_\nu^{(2)}; z\right) = \frac{4}{i\pi z}. \]

Hint: Use Problem 15.34.


15.37 Verify the following relations:

(a) $Y_{n+1/2}(z) = (-1)^{n+1} J_{-n-1/2}(z)$, $\quad Y_{-n-1/2}(z) = (-1)^n J_{n+1/2}(z)$.

(b) $Y_{-\nu}(z) = \sin\nu\pi\, J_\nu(z) + \cos\nu\pi\, Y_\nu(z) = \dfrac{J_\nu(z) - \cos\nu\pi\, J_{-\nu}(z)}{\sin\nu\pi}$.

(c) $Y_{-n}(z) = (-1)^n Y_n(z)$ in the limit $\nu \to n$ in part (b).


15.38 Use the recurrence relation for the Bessel function to show that $J_1(z) = -J_0'(z)$.


15.39 Let $u = J_\nu(\lambda z)$ and $v = J_\nu(\mu z)$. Multiply the Bessel DE for $u$ by $v/z$ and that of $v$ by $u/z$. Subtract the two equations to obtain

\[ (\lambda^2 - \mu^2)\,z\,uv = \frac{d}{dz}\left[z\left(u\frac{dv}{dz} - v\frac{du}{dz}\right)\right]. \]

(a) Write the above equation in terms of $J_\nu(\lambda z)$ and $J_\nu(\mu z)$ and integrate both sides with respect to $z$.

(b) Now divide both sides by $\lambda^2 - \mu^2$ and take the limit as $\mu \to \lambda$. You will need to use L'Hôpital's rule.

(c) Substitute for $J_\nu''(\lambda z)$ from the Bessel DE and simplify to get

\[ \int z\left[J_\nu(\lambda z)\right]^2 dz = \frac{z^2}{2}\left\{\left[J_\nu'(\lambda z)\right]^2 + \left(1 - \frac{\nu^2}{\lambda^2 z^2}\right)\left[J_\nu(\lambda z)\right]^2\right\}. \]

(d) Finally, let $\lambda = x_{\nu n}/a$, where $x_{\nu n}$ is the $n$th root of $J_\nu$, and use Eq. (15.47) to arrive at

\[ \int_0^a z\,J_\nu^2\!\left(\frac{x_{\nu n}}{a}z\right) dz = \frac{a^2}{2}\,J_{\nu+1}^2(x_{\nu n}). \]

15.40 The generating function $g(z, t)$ for Bessel functions of integer order is

\[ g(z, t) = \exp\left[\tfrac12 z(t - 1/t)\right]. \]

To see this, rewrite $g(z, t)$ as $e^{zt/2}e^{-z/2t}$, expand both factors, and write the product as powers of $t^n$. Now show that the coefficient of $t^n$ is simply $J_n(z)$. Finally, use $J_{-n}(z) = (-1)^n J_n(z)$ to derive the formula

\[ \exp\left[\tfrac12 z(t - 1/t)\right] = \sum_{n=-\infty}^{\infty} J_n(z)\,t^n. \]


15.41 Make the substitutions $z = \beta t^{\gamma}$ and $w = t^{\alpha}u$ to transform the Bessel DE into

\[ t^2\frac{d^2u}{dt^2} + (2\alpha + 1)\,t\,\frac{du}{dt} + \left(\beta^2\gamma^2 t^{2\gamma} + \alpha^2 - \nu^2\gamma^2\right)u = 0. \]

Now show that Airy's DE, $d^2u/dt^2 - tu = 0$, has solutions of the form $J_{1/3}(\tfrac23 i t^{3/2})$ and $J_{-1/3}(\tfrac23 i t^{3/2})$.


15.42 Show that the general solution of $\dfrac{d^2w}{dt^2} + \dfrac{e^{2/t} - \nu^2}{t^4}\,w = 0$ is $w = t\left[A J_\nu(e^{1/t}) + B Y_\nu(e^{1/t})\right]$.


15.43 Transform $dw/dz + w^2 + z^m = 0$ by making the substitution $w = (d/dz)\ln v$. Now make the further substitutions

\[ v = u\sqrt{z} \quad\text{and}\quad t = \frac{2}{m+2}\,z^{1+(1/2)m} \]

to show that the new DE can be transformed into a Bessel equation of order $1/(m+2)$.


15.44 Starting with the relation

\[ \exp\left[\tfrac12 x(t - 1/t)\right]\exp\left[\tfrac12 y(t - 1/t)\right] = \exp\left[\tfrac12 (x + y)(t - 1/t)\right] \]

and the fact that the exponential function is the generating function for $J_n(z)$, prove the "addition theorem" for Bessel functions:

\[ J_n(x + y) = \sum_{k=-\infty}^{\infty} J_k(x)\,J_{n-k}(y). \]


16 Integral Transforms and Differential Equations


The discussion in Chap. 15 introduced a general method of solving differen- 
tial equations by power series—also called the Frobenius method—which 
gives a solution that converges within a circle of convergence. In general, 
this circle of convergence may be small; however, the function represented 
by the power series can be analytically continued using methods presented 
in Chap. 12. 

This chapter, which is a bridge between differential equations and opera- 
tors on Hilbert spaces (to be developed in the next part), introduces another 
method of solving DEs, which uses integral transforms and incorporates 
the analytic continuation automatically. The integral transform of a function 
v is another function u given by 


\[ u(z) = \int_C K(z, t)\,v(t)\,dt, \tag{16.1} \]

where $C$ is a convenient contour, and $K(z, t)$, called the kernel of the integral transform, is an appropriate function of two complex variables.


Example 16.0.1 Let us consider some examples of integral transforms.

(a) The Fourier transform is familiar from the discussion of Chap. 9. The kernel is
\[ K(x, y) = e^{ixy}. \]

(b) The Laplace transform is used frequently in electrical engineering. Its kernel is
\[ K(x, y) = e^{-xy}. \]

(c) The Euler transform has the kernel
\[ K(x, y) = (x - y)^{\nu}. \]

(d) The Mellin transform has the kernel
\[ K(x, y) = G(x^{y}), \]
where $G$ is an arbitrary function. Most of the time $K(x, y)$ is taken to be simply $x^{y}$.

(e) The Hankel transform has the kernel
\[ K(x, y) = y\,J_n(xy), \]
where $J_n$ is the $n$th-order Bessel function.

(f) A transform that is useful in connection with the Bessel equation has the kernel
\[ K(x, y) = \left(\frac{x}{2}\right)^{\nu} e^{y - x^2/4y}. \]


The idea behind using an integral transform is to write the solution $u(z)$ of a DE in $z$ in terms of an integral such as Eq. (16.1) and choose $v$ and the kernel in such a way as to render the DE more manageable. Let $L_z$ be a differential operator (DO) in the variable $z$. We want to determine $u(z)$ such that $L_z[u] = 0$, or equivalently, such that $\int_C L_z[K(z, t)]\,v(t)\,dt = 0$. Suppose that we can find $M_t$, a DO in the variable $t$, such that $L_z[K(z, t)] = M_t[K(z, t)]$. Then the DE becomes $\int_C (M_t[K(z, t)])\,v(t)\,dt = 0$. If $C$ has $a$ and $b$ as initial and final points ($a$ and $b$ may be equal), then the Lagrange identity [see Eq. (14.23)] yields

\[ 0 = L_z[u] = \int_a^b K(z, t)\,M_t^{\dagger}[v(t)]\,dt + Q[K, v]\Big|_a^b, \]

where $Q[K, v]$ is the "surface term". If $v(t)$ and the contour $C$ (or $a$ and $b$) are chosen in such a way that

\[ Q[K, v]\Big|_a^b = 0 \quad\text{and}\quad M_t^{\dagger}[v(t)] = 0, \tag{16.2} \]

the problem is solved. The trick is to find an $M_t$ such that Eq. (16.2) is easier to solve than the original equation, $L_z[u] = 0$. This in turn demands a clever choice of the kernel, $K(z, t)$. This chapter discusses how to solve some common differential equations of mathematical physics using the general idea presented above.


16.1 Integral Representation of the Hypergeometric 
Function 
Recall that for the hypergeometric function, the differential operator is

\[ L_z = z(1 - z)\frac{d^2}{dz^2} + \left[\gamma - (\alpha + \beta + 1)z\right]\frac{d}{dz} - \alpha\beta. \]

For such operators, whose coefficient functions are polynomials, the proper choice for $K(z, t)$ is the Euler kernel, $(z - t)^s$. Applying $L_z$ to this kernel and rearranging terms, we obtain

\[ L_z[K(z, t)] = \left\{z^2\left[-s(s-1) - s(\alpha+\beta+1) - \alpha\beta\right] + z\left[s(s-1) + s\gamma + st(\alpha+\beta+1) + 2\alpha\beta t\right] - \gamma st - \alpha\beta t^2\right\}(z - t)^{s-2}. \tag{16.3} \]

Note that except for a multiplicative constant, $K(z, t)$ is symmetric in $z$ and $t$. This suggests that the general form of $M_t$ may be chosen to be the same as that of $L_z$ except for the interchange of $z$ and $t$. If we can manipulate the parameters in such a way that $M_t$ becomes simple, then we have a chance of solving the problem. For instance, if $M_t$ has the form of $L_z$ with the constant term absent, then the hypergeometric DE effectively reduces to a FODE (in $dv/dt$). Let us exploit this possibility.

The general form of the $M_t$ that we are interested in is

\[ M_t = p_2(t)\frac{d^2}{dt^2} + p_1(t)\frac{d}{dt}, \]

i.e., with no $p_0$ term. By applying $M_t$ to $K(z, t) = (z - t)^s$ and setting the result equal to the RHS of Eq. (16.3), we obtain


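\[ s(s-1)p_2 - p_1 s z + p_1 s t = z^2\left[-s(s-1) - s(\alpha+\beta+1) - \alpha\beta\right] + z\left[s(s-1) + s\gamma + st(\alpha+\beta+1) + 2\alpha\beta t\right] - \gamma st - \alpha\beta t^2, \]

for which the coefficients of equal powers of $z$ on both sides must be equal:

\[ \begin{aligned} -s(s-1) - s(\alpha+\beta+1) - \alpha\beta &= 0 \;\Rightarrow\; s = -\alpha \ \text{or}\ s = -\beta, \\ -p_1 s &= s(s-1) + s\gamma + st(\alpha+\beta+1) + 2\alpha\beta t, \\ s(s-1)p_2 + p_1 s t &= -\gamma s t - \alpha\beta t^2. \end{aligned} \]

If we choose $s = -\alpha$ ($s = -\beta$ leads to an equivalent representation), the coefficient functions of $M_t$ will be completely determined. In fact, the second equation gives $p_1(t)$, and the third determines $p_2(t)$. We finally obtain

\[ p_1(t) = \alpha + 1 - \gamma + t(\beta - \alpha - 1), \qquad p_2(t) = t - t^2, \]

and

\[ M_t = (t - t^2)\frac{d^2}{dt^2} + \left[\alpha + 1 - \gamma + t(\beta - \alpha - 1)\right]\frac{d}{dt}, \tag{16.4} \]

which, according to Eq. (14.19), yields the following DE for the adjoint:

\[ M_t^{\dagger}[v] = \frac{d^2}{dt^2}\left[(t - t^2)v\right] - \frac{d}{dt}\left\{\left[\alpha - \gamma + 1 + t(\beta - \alpha - 1)\right]v\right\} = 0. \tag{16.5} \]

The solution to this equation is $v(t) = C t^{\alpha-\gamma}(t - 1)^{\gamma-\beta-1}$ (see Problem 16.5). We also need the surface term, $Q[K, v]$, in the Lagrange identity (see Problem 16.6 for details):

\[ Q[K, v](t) = C\alpha\, t^{\alpha-\gamma+1}(t - 1)^{\gamma-\beta}(z - t)^{-\alpha-1}. \]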

Finally, we need a specification of the contour. For different contours we will get different solutions. The contour chosen must, of course, have the property that $Q[K, v]$ vanishes as a result of the integration. There are two possibilities: Either the contour is closed [$a = b$ in (16.2)] or $a \neq b$ but $Q[K, v]$ takes on the same value at $a$ and at $b$.

Let us consider the second of these possibilities. Clearly, $Q[K, v](t)$ vanishes at $t = 1$ if $\mathrm{Re}(\gamma) > \mathrm{Re}(\beta)$. Also, as $t \to \infty$,

\[ Q[K, v](t) \to (-1)^{-\alpha-1} C\alpha\, t^{\alpha-\gamma+1}\, t^{\gamma-\beta}\, t^{-\alpha-1} = (-1)^{-\alpha-1} C\alpha\, t^{-\beta}, \]

which vanishes if $\mathrm{Re}(\beta) > 0$. We thus take $a = 1$ and $b = \infty$, and assume that $\mathrm{Re}(\gamma) > \mathrm{Re}(\beta) > 0$. It then follows that

\[ u(z) = \int_a^b K(z, t)\,v(t)\,dt = C'\int_1^{\infty}(t - z)^{-\alpha}\, t^{\alpha-\gamma}(t - 1)^{\gamma-\beta-1}\,dt. \tag{16.6} \]

The constant $C'$ can be determined to be $\Gamma(\gamma)/[\Gamma(\beta)\Gamma(\gamma - \beta)]$ (see Problem 16.7). Therefore,

\[ u(z) \equiv F(\alpha, \beta; \gamma; z) = \frac{\Gamma(\gamma)}{\Gamma(\beta)\Gamma(\gamma-\beta)}\int_1^{\infty}(t - z)^{-\alpha}\, t^{\alpha-\gamma}(t - 1)^{\gamma-\beta-1}\,dt. \]

It is customary to change the variable of integration from $t$ to $1/t$. The resulting expression is called the Euler formula for the hypergeometric function:

\[ F(\alpha, \beta; \gamma; z) = \frac{\Gamma(\gamma)}{\Gamma(\beta)\Gamma(\gamma-\beta)}\int_0^1 (1 - tz)^{-\alpha}\, t^{\beta-1}(1 - t)^{\gamma-\beta-1}\,dt. \tag{16.7} \]


Note that the term $(1 - tz)^{-\alpha}$ in the integral has two branch points in the $z$-plane, one at $z = 1/t$ and the other at $z = \infty$. Therefore, we cut the $z$-plane from $z_1 = 1/t$, a point on the positive real axis, to $z_2 = \infty$. Since $0 < t < 1$, $z_1$ is somewhere in the interval $[1, \infty)$. To ensure that the cut is applicable for all values of $t$, we take $z_1 = 1$ and cut the plane along the positive real axis. It follows that Eq. (16.7) is well behaved as long as

\[ 0 < \arg(1 - z) < 2\pi. \tag{16.8} \]
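The Euler formula can be compared numerically with the hypergeometric series for real |z| < 1. The sketch below is only an illustration: the function names, the composite Simpson rule, and the parameter values are assumptions made here, and β and γ are chosen so that the exponents β − 1 and γ − β − 1 are nonnegative integers, which keeps the integrand smooth at both endpoints.

```python
import math

def hyp2f1_series(a, b, c, z, terms=200):
    """Partial sum of the hypergeometric series F(a, b; c; z), |z| < 1."""
    term, total = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) * (b + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

def hyp2f1_euler(a, b, c, z):
    """Euler integral for F(a, b; c; z); assumes Re(c) > Re(b) > 0 and real z < 1."""
    pref = math.gamma(c) / (math.gamma(b) * math.gamma(c - b))
    integrand = lambda t: (1 - t * z) ** (-a) * t ** (b - 1) * (1 - t) ** (c - b - 1)
    return pref * simpson(integrand, 0.0, 1.0)

print(hyp2f1_series(0.5, 2.0, 4.0, 0.3), hyp2f1_euler(0.5, 2.0, 4.0, 0.3))
```

For noninteger exponents the integrand has endpoint singularities and a plain Simpson rule converges slowly; a quadrature adapted to the endpoint behavior would then be the better choice.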


We could choose a different contour, which, in general, would lead to a 
different solution. The following example illustrates one such choice. 


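Example 16.1.1 First note that $Q[K, v]$ vanishes at $t = 0$ and $t = 1$ as long as $\mathrm{Re}(\gamma) > \mathrm{Re}(\beta)$ and $\mathrm{Re}(\alpha) > \mathrm{Re}(\gamma) - 1$. Hence, we can choose the contour to start at $t = 0$ and end at $t = 1$. We then have

\[ w(z) = C''\int_0^1 (t - z)^{-\alpha}\, t^{\alpha-\gamma}(1 - t)^{\gamma-\beta-1}\,dt = C'''\, z^{-\alpha}\int_0^1 \left(1 - \frac{t}{z}\right)^{-\alpha} t^{\alpha-\gamma}(1 - t)^{\gamma-\beta-1}\,dt. \tag{16.9} \]

To see the relation between $w(z)$ and the hypergeometric function, expand $(1 - t/z)^{-\alpha}$ in the integral to get

\[ w(z) = C'''\, z^{-\alpha}\sum_{n=0}^{\infty}\frac{\Gamma(\alpha+n)}{\Gamma(\alpha)\Gamma(n+1)}\left(\frac{1}{z}\right)^{n}\int_0^1 t^{\alpha+n-\gamma}(1 - t)^{\gamma-\beta-1}\,dt. \tag{16.10} \]

Now evaluate the integral by changing $t$ to $1/t$ and using Eqs. (12.19) and (12.17). This changes the integral to

\[ \int_1^{\infty} t^{-\alpha-n-1+\beta}(t - 1)^{\gamma-\beta-1}\,dt = \frac{\Gamma(\alpha+n+1-\gamma)\,\Gamma(\gamma-\beta)}{\Gamma(\alpha+n+1-\beta)}. \]

Substituting this in Eq. (16.10), we obtain

\[ \begin{aligned} w(z) &= C'''\, z^{-\alpha}\,\frac{\Gamma(\gamma-\beta)}{\Gamma(\alpha)}\sum_{n=0}^{\infty}\frac{\Gamma(\alpha+n)\,\Gamma(\alpha+n+1-\gamma)}{\Gamma(\alpha+n+1-\beta)\,\Gamma(n+1)}\left(\frac{1}{z}\right)^{n} \\ &= C'''\, z^{-\alpha}\,\frac{\Gamma(\gamma-\beta)\,\Gamma(\alpha+1-\gamma)}{\Gamma(\alpha+1-\beta)}\, F(\alpha, \alpha-\gamma+1; \alpha-\beta+1; 1/z), \end{aligned} \]

where we have used the hypergeometric series of Chap. 15. Choosing

\[ C''' = \frac{\Gamma(\alpha+1-\beta)}{\Gamma(\gamma-\beta)\,\Gamma(\alpha+1-\gamma)} \]

yields $w(z) = z^{-\alpha}F(\alpha, \alpha-\gamma+1; \alpha-\beta+1; 1/z)$, which is one of the solutions of the hypergeometric DE [Eq. (15.30)].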


16.1.1 Integral Representation of the Confluent 
Hypergeometric Function 


Having obtained the integral representation of the hypergeometric function, we can readily get the integral representation of the confluent hypergeometric function by taking the proper limit. It was shown in Chap. 15 that

\[ \Phi(\alpha, \gamma; z) = \lim_{\beta\to\infty} F\!\left(\alpha, \beta; \gamma; \frac{z}{\beta}\right). \]

This suggests taking the limit of Eq. (16.7). The presence of the gamma functions with $\beta$ as their arguments complicates things, but on the other hand, the symmetry of the hypergeometric function can be utilized to our advantage. Thus, we may write

\[ \begin{aligned} \Phi(\alpha, \gamma; z) &= \lim_{\beta\to\infty} F\!\left(\alpha, \beta; \gamma; \frac{z}{\beta}\right) = \lim_{\beta\to\infty} F\!\left(\beta, \alpha; \gamma; \frac{z}{\beta}\right) \\ &= \lim_{\beta\to\infty}\frac{\Gamma(\gamma)}{\Gamma(\alpha)\Gamma(\gamma-\alpha)}\int_0^1\left(1 - \frac{tz}{\beta}\right)^{-\beta} t^{\alpha-1}(1 - t)^{\gamma-\alpha-1}\,dt \\ &= \frac{\Gamma(\gamma)}{\Gamma(\alpha)\Gamma(\gamma-\alpha)}\int_0^1 e^{zt}\, t^{\alpha-1}(1 - t)^{\gamma-\alpha-1}\,dt \end{aligned} \tag{16.11} \]

because the limit of the first term in the integral is simply $e^{tz}$. Note that the condition $\mathrm{Re}(\gamma) > \mathrm{Re}(\alpha) > 0$ must still hold here.

Integral transforms are particularly useful in determining the asymptotic behavior of functions. We shall use them in deriving asymptotic formulas for Bessel functions later on, and Problem 16.10 derives the asymptotic formula for the confluent hypergeometric function.
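The integral representation of the confluent hypergeometric function, Eq. (16.11), can be checked against its power series. The following sketch is illustrative only: the function names, the Simpson rule, and the parameter values α = 2, γ = 4 (chosen so that the integrand is smooth on [0, 1]) are assumptions made here rather than anything taken from the text.

```python
import math

def confluent_series(a, c, z, terms=80):
    """Partial sum of the confluent hypergeometric series Phi(a, c; z)."""
    term, total = 1.0, 1.0
    for k in range(terms):
        term *= (a + k) / ((c + k) * (k + 1)) * z
        total += term
    return total

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(lo + i * h)
    return s * h / 3

def confluent_integral(a, c, z):
    """Right-hand side of Eq. (16.11); assumes Re(c) > Re(a) > 0 and real z."""
    pref = math.gamma(c) / (math.gamma(a) * math.gamma(c - a))
    integrand = lambda t: math.exp(z * t) * t ** (a - 1) * (1 - t) ** (c - a - 1)
    return pref * simpson(integrand, 0.0, 1.0)

print(confluent_series(2.0, 4.0, 1.5), confluent_integral(2.0, 4.0, 1.5))
```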


16.2 Integral Representation of Bessel Functions 


Choosing the kernel, the contour, and the function v(t) that lead to an in- 
tegral representation of a function is an art, and the nineteenth century pro- 
duced many masters of it. A particularly popular theme in such endeavors 
was the Bessel equation and Bessel functions. This section considers the 
integral representations of Bessel functions. 

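The most effective kernel for the Bessel DE is

\[ K(z, t) = \left(\frac{z}{2}\right)^{\nu}\exp\left(t - \frac{z^2}{4t}\right). \]

When the Bessel DO

\[ L_z \equiv \frac{d^2}{dz^2} + \frac{1}{z}\frac{d}{dz} + \left(1 - \frac{\nu^2}{z^2}\right) \]

acts on $K(z, t)$, it yields

\[ L_z K(z, t) = \left(-\frac{\nu+1}{t} + 1 + \frac{z^2}{4t^2}\right)\left(\frac{z}{2}\right)^{\nu} e^{t - z^2/4t} = \left(\frac{d}{dt} - \frac{\nu+1}{t}\right)K(z, t). \]

Thus, $M_t = d/dt - (\nu + 1)/t$, and Eq. (14.19) gives

\[ M_t^{\dagger}[v(t)] = -\frac{dv}{dt} - \frac{\nu+1}{t}\,v = 0, \]

whose solution, including the arbitrary constant of integration $k$, is $v(t) = k t^{-\nu-1}$. When we substitute this solution and the kernel in the surface term of the Lagrange identity, Eq. (14.23), we obtain

\[ Q[K, v](t) = p_1 K(z, t)\,v(t) = k\left(\frac{z}{2}\right)^{\nu} t^{-\nu-1} e^{t - z^2/(4t)}. \]

A contour in the $t$-plane that ensures the vanishing of $Q[K, v]$ for all values of $\nu$ starts at $t = -\infty$, comes to the origin, orbits it on an arbitrary circle, and finally goes back to $t = -\infty$ (see Fig. 16.1). Such a contour is possible because of the factor $e^{t}$ in the expression for $Q[K, v]$. We thus can write

\[ J_\nu(z) = k\left(\frac{z}{2}\right)^{\nu}\int_C t^{-\nu-1} e^{t - z^2/(4t)}\,dt. \tag{16.12} \]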


Note that the integrand has a cut along the negative real axis due to the factor $t^{-\nu-1}$. If $\nu$ is an integer, the cut shrinks to a pole at $t = 0$.

Fig. 16.1 The contour C in the t-plane used in evaluating $J_\nu(z)$

The constant $k$ must be determined in such a way that the above expression for $J_\nu(z)$ agrees with the series representation obtained in Chap. 15. It can be shown (see Problem 16.11) that $k = 1/(2\pi i)$. Thus, we have

\[ J_\nu(z) = \frac{1}{2\pi i}\left(\frac{z}{2}\right)^{\nu}\int_C t^{-\nu-1} e^{t - z^2/(4t)}\,dt. \]

It is more convenient to take the factor $(z/2)^{\nu}$ into the integral, introduce a new integration variable $u = 2t/z$, and rewrite the preceding equation as

\[ J_\nu(z) = \frac{1}{2\pi i}\int_C u^{-\nu-1}\exp\left[\frac{z}{2}\left(u - \frac{1}{u}\right)\right]du. \tag{16.13} \]

This result is valid as long as $\mathrm{Re}(zu) < 0$ when $u \to -\infty$ on the negative real axis; that is, $\mathrm{Re}(z)$ must be positive for Eq. (16.13) to work.
An interesting result can be obtained from Eq. (16.13) when $\nu$ is an integer. In that case the only singularity will be at the origin, so the contour can be taken to be a circle about the origin. This yields

\[ J_n(z) = \frac{1}{2\pi i}\oint_C u^{-n-1}\exp\left[\frac{z}{2}\left(u - \frac{1}{u}\right)\right]du, \]

which is the $n$th coefficient of the Laurent series expansion of $\exp[(z/2)(u - 1/u)]$ about the origin. We thus have this important result:

\[ \exp\left[\frac{z}{2}\left(t - \frac{1}{t}\right)\right] = \sum_{n=-\infty}^{\infty} J_n(z)\,t^n. \tag{16.14} \]

The function $\exp[(z/2)(t - 1/t)]$ is therefore appropriately called the generating function for Bessel functions of integer order (see also Problem 15.40). Equation (16.14) can be useful in deriving relations for such Bessel functions as the following example shows.
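The generating-function identity (16.14) lends itself to a direct numerical check. The sketch below sums a truncated Laurent series using the power series for J_n and the symmetry J_{-n} = (-1)^n J_n; the function names and the truncation limits are illustrative assumptions, and real z and t ≠ 0 are assumed.

```python
import math

def bessel_jn(n, z, terms=40):
    """J_n(z) for integer n from the power series, using J_{-n} = (-1)^n J_n."""
    if n < 0:
        return (-1) ** (-n) * bessel_jn(-n, z, terms)
    return sum((-1) ** k / (math.factorial(k) * math.factorial(n + k)) * (z / 2) ** (2 * k + n)
               for k in range(terms))

def generating_sum(z, t, nmax=30):
    """Truncated Laurent sum of J_n(z) t^n over n = -nmax, ..., nmax."""
    return sum(bessel_jn(n, z) * t ** n for n in range(-nmax, nmax + 1))

z, t = 1.2, 0.7
print(generating_sum(z, t), math.exp(0.5 * z * (t - 1 / t)))
```

Because J_n(z) decays factorially in n for fixed z, a modest `nmax` already reproduces the exponential to near machine precision for moderate |z| and |t|.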


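Example 16.2.1 Let us rewrite the LHS of (16.14) as $e^{zt/2}e^{-z/2t}$, expand the exponentials, and collect terms to obtain

\[ e^{zt/2}e^{-z/2t} = \sum_{m=0}^{\infty}\frac{1}{m!}\left(\frac{zt}{2}\right)^{m}\sum_{n=0}^{\infty}\frac{1}{n!}\left(-\frac{z}{2t}\right)^{n} = \sum_{m=0}^{\infty}\sum_{n=0}^{\infty}\frac{(-1)^n}{m!\,n!}\left(\frac{z}{2}\right)^{m+n} t^{m-n}. \]

If we let $m - n = k$, change the $m$ summation to $k$, and note that $k$ goes from $-\infty$ to $\infty$, we get

\[ \exp\left[\frac{z}{2}\left(t - \frac{1}{t}\right)\right] = \sum_{k=-\infty}^{\infty}\sum_{n=0}^{\infty}\frac{(-1)^n}{(n+k)!\,n!}\left(\frac{z}{2}\right)^{2n+k} t^{k} = \sum_{k=-\infty}^{\infty}\left[\left(\frac{z}{2}\right)^{k}\sum_{n=0}^{\infty}\frac{(-1)^n}{\Gamma(n+k+1)\,\Gamma(n+1)}\left(\frac{z}{2}\right)^{2n}\right]t^{k}. \]

Comparing this equation with Eq. (16.14) yields the familiar expansion for the Bessel function:

\[ J_k(z) = \left(\frac{z}{2}\right)^{k}\sum_{n=0}^{\infty}\frac{(-1)^n}{\Gamma(n+k+1)\,\Gamma(n+1)}\left(\frac{z}{2}\right)^{2n}. \]

We can also obtain a recurrence relation for $J_n(z)$. Differentiating both sides of Eq. (16.14) with respect to $t$ yields

\[ \frac{z}{2}\left(1 + \frac{1}{t^2}\right)\exp\left[\frac{z}{2}\left(t - \frac{1}{t}\right)\right] = \sum_{n=-\infty}^{\infty} n J_n(z)\,t^{n-1}. \tag{16.15} \]

Using Eq. (16.14) on the LHS gives

\[ \frac{z}{2}\left(1 + \frac{1}{t^2}\right)\sum_{n=-\infty}^{\infty} J_n(z)\,t^{n} = \frac{z}{2}\sum_{n=-\infty}^{\infty} J_n(z)\,t^{n} + \frac{z}{2}\sum_{n=-\infty}^{\infty} J_n(z)\,t^{n-2} = \frac{z}{2}\sum_{n=-\infty}^{\infty} J_{n-1}(z)\,t^{n-1} + \frac{z}{2}\sum_{n=-\infty}^{\infty} J_{n+1}(z)\,t^{n-1}, \tag{16.16} \]

where we substituted $n - 1$ for $n$ in the first sum and $n + 1$ for $n$ in the second. Equating the coefficients of equal powers of $t$ on the LHS and the RHS of Eqs. (16.15) and (16.16), we get

\[ n J_n(z) = \frac{z}{2}\left[J_{n-1}(z) + J_{n+1}(z)\right], \]

which was obtained by a different method in Chap. 15 [see Eq. (15.48)].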
We can start with Eq. (16.13) and obtain other integral representations of Bessel functions by making appropriate substitutions. For instance, we can let $u = e^{w}$ and assume that the circle of the contour $C$ has unit radius. The contour $C'$ in the $w$-plane is determined as follows. Write $u = re^{i\theta}$ and $w = x + iy$, so $re^{i\theta} = e^{x}e^{iy}$, yielding $r = e^{x}$ and $e^{i\theta} = e^{iy}$. (Do not confuse $x$ and $y$ with the real and imaginary parts of $z$.) Along the first part of $C$, $\theta = -\pi$ and $r$ goes from $\infty$ to 1. Thus, along the corresponding part of $C'$, $y = -\pi$ and $x$ goes from $\infty$ to 0. On the circular part of $C$, $r = 1$ and $\theta$ goes from $-\pi$ to $+\pi$. Thus, along the corresponding part of $C'$, $x = 0$ and $y$ goes from $-\pi$ to $+\pi$. Finally, on the last part of $C'$, $y = \pi$ and $x$ goes from 0 to $\infty$. Therefore, the contour $C'$ in the $w$-plane is as shown in Fig. 16.2.

Fig. 16.2 The contour C′ in the w-plane used in evaluating $J_\nu(z)$

Substituting $u = e^{w}$ in Eq. (16.13) yields

\[ J_\nu(z) = \frac{1}{2\pi i}\int_{C'} e^{z\sinh w - \nu w}\,dw, \qquad \mathrm{Re}(z) > 0, \tag{16.17} \]

which can be transformed into (see Problem 16.12)

\[ J_\nu(z) = \frac{1}{\pi}\int_0^{\pi}\cos(\nu\theta - z\sin\theta)\,d\theta - \frac{\sin\nu\pi}{\pi}\int_0^{\infty} e^{-\nu t - z\sinh t}\,dt. \tag{16.18} \]

For the special case of integer $\nu$, we obtain

\[ J_n(z) = \frac{1}{\pi}\int_0^{\pi}\cos(n\theta - z\sin\theta)\,d\theta. \]

In particular,

\[ J_0(z) = \frac{1}{\pi}\int_0^{\pi}\cos(z\sin\theta)\,d\theta. \]
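For integer order, the integral representation J_n(z) = (1/π)∫_0^π cos(nθ − z sin θ) dθ can be evaluated with a simple quadrature and compared with the power series. The sketch below assumes real z and integer n ≥ 0; the function names and the number of steps are illustrative choices. The trapezoidal rule is used because the integrand extends to a smooth, even, 2π-periodic function of θ, for which that rule converges very rapidly.

```python
import math

def jn_series(n, z, terms=40):
    """Power series for J_n(z), integer n >= 0."""
    return sum((-1) ** k / (math.factorial(k) * math.factorial(n + k)) * (z / 2) ** (2 * k + n)
               for k in range(terms))

def jn_integral(n, z, steps=400):
    """Trapezoidal estimate of (1/pi) * integral_0^pi cos(n*theta - z*sin(theta)) d(theta)."""
    h = math.pi / steps
    f = lambda th: math.cos(n * th - z * math.sin(th))
    total = 0.5 * (f(0.0) + f(math.pi)) + sum(f(i * h) for i in range(1, steps))
    return total * h / math.pi

for n in range(4):
    print(n, jn_series(n, 2.5), jn_integral(n, 2.5))
```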


We can use the integral representation for $J_\nu(z)$ to find the integral representation for Bessel functions of other kinds. For instance, to obtain the integral representation for the Neumann function $Y_\nu(z)$, we use Eq. (15.44):

\[ \begin{aligned} Y_\nu(z) &= (\cot\nu\pi)\,J_\nu(z) - \frac{1}{\sin\nu\pi}\,J_{-\nu}(z) \\ &= \frac{\cot\nu\pi}{\pi}\int_0^{\pi}\cos(\nu\theta - z\sin\theta)\,d\theta - \frac{\cos\nu\pi}{\pi}\int_0^{\infty} e^{-\nu t - z\sinh t}\,dt \\ &\quad - \frac{1}{\pi\sin\nu\pi}\int_0^{\pi}\cos(\nu\theta + z\sin\theta)\,d\theta - \frac{1}{\pi}\int_0^{\infty} e^{\nu t - z\sinh t}\,dt \end{aligned} \]

with $\mathrm{Re}(z) > 0$. Substitute $\pi - \theta$ for $\theta$ in the third integral on the RHS. Then insert the resulting integrals plus Eq. (16.18) in $H_\nu^{(1)}(z) = J_\nu(z) + iY_\nu(z)$ to obtain

\[ H_\nu^{(1)}(z) = \frac{1}{\pi}\int_0^{\pi} e^{i(z\sin\theta - \nu\theta)}\,d\theta + \frac{1}{i\pi}\int_0^{\infty} e^{\nu t - z\sinh t}\,dt + \frac{e^{-i\nu\pi}}{i\pi}\int_0^{\infty} e^{-\nu t - z\sinh t}\,dt, \qquad \mathrm{Re}(z) > 0. \]

These integrals can easily be shown to result from integrating along the contour $C''$ of Fig. 16.3. Thus, we have

\[ H_\nu^{(1)}(z) = \frac{1}{i\pi}\int_{C''} e^{z\sinh w - \nu w}\,dw, \qquad \mathrm{Re}(z) > 0. \]

Fig. 16.3 The contour C″ in the w-plane used in evaluating $H_\nu^{(1)}(z)$

By changing $i$ to $-i$, we can show that

\[ H_\nu^{(2)}(z) = -\frac{1}{i\pi}\int_{C'''} e^{z\sinh w - \nu w}\,dw, \qquad \mathrm{Re}(z) > 0, \]

where $C'''$ is the mirror image of $C''$ about the real axis.


16.2.1 Asymptotic Behavior of Bessel Functions 


As mentioned before, integral representations are particularly useful for determining the asymptotic behavior of functions. For Bessel functions we can consider two kinds of limits. Assuming that both $\nu$ and $z = x$ are real, we can consider $\nu \to \infty$ or $x \to \infty$. First, let us consider the behavior of $J_\nu(x)$ of large order. The appropriate method for calculating the asymptotic form is the method of steepest descent discussed in Chap. 12, for which $\nu$ takes the place of the large parameter $\alpha$. We use Eq. (16.17) because its integrand is simpler than that of Eq. (16.13). The form of the integrand in Eq. (16.17) may seem to suggest $f(w) = -w$ and $g(w) = e^{x\sinh w}$. However, this choice does not allow setting $f'(w)$ equal to zero. To proceed, therefore, we write the exponent as $\nu\left(\frac{x}{\nu}\sinh w - w\right)$, and conveniently introduce $x/\nu \equiv 1/\cosh w_0$, with $w_0$ a real number, which we take to be positive. Substituting this in the equation above, we can read off

\[ f(w) \equiv \frac{\sinh w}{\cosh w_0} - w, \qquad g(w) = 1. \]

The saddle point is obtained from $df/dw = 0$ or $\cosh w = \cosh w_0$. Thus, $w = \pm w_0 + 2in\pi$, for $n = 0, 1, 2, \ldots$. Since the contour $C'$ lies in the right half-plane, we choose $w_0$ as the saddle point. The second derivative $f''(w_0)$ is simply $\tanh w_0$, which is real, making $\theta_2 = 0$, and $\theta_1 = \pi/2$ or $3\pi/2$. The convention of Chap. 12 suggests taking $\theta_1 = \pi/2$ (see Fig. 16.4). The rest is a matter of substitution. We are interested in the approximation to $w$ up to the third order in $t$: $w - w_0 = b_1 t + b_2 t^2 + b_3 t^3$. Using Eqs. (12.31), (12.37), and (12.38), we can easily find the three coefficients:

Fig. 16.4 The contour $C_0$ in the w-plane used in evaluating $J_\nu(z)$ for large values of $\nu$


\[ b_1 = \frac{\sqrt{2}}{|f''(w_0)|^{1/2}}\,e^{i\theta_1} = \frac{i\sqrt{2}}{\sqrt{\tanh w_0}}, \qquad b_2 = \frac{\cosh^2 w_0}{3\sinh^2 w_0}, \qquad b_3 = -\frac{i\sqrt{2}}{12(\tanh w_0)^{3/2}}\left(\frac{5}{3}\coth^2 w_0 - 1\right). \]

If we substitute the above in Eq. (12.36), we obtain the following asymptotic formula valid for $\nu \to \infty$:

\[ J_\nu(x) \approx \frac{e^{x(\sinh w_0 - w_0\cosh w_0)}}{(2\pi x\sinh w_0)^{1/2}}\left[1 + \frac{1}{8x\sinh w_0}\left(1 - \frac{5}{3}\coth^2 w_0\right) + \cdots\right], \]

where $\nu$ is related to $w_0$ via $\nu = x\cosh w_0$.

Let us now consider the asymptotic behavior for large $x$. It is convenient to consider the Hankel functions $H_\nu^{(1)}(x)$ and $H_\nu^{(2)}(x)$. The contours $C''$ and $C'''$ involve both the positive and the negative real axis; therefore, it is convenient, assuming that $x > \nu$, to write $\nu = x\cos\beta$ so that

\[ H_\nu^{(1)}(x) = \frac{1}{i\pi}\int_{C''} e^{x(\sinh w - w\cos\beta)}\,dw. \]

The saddle points are given by the solutions to $\cosh w = \cos\beta$, which are $w_0 = \pm i\beta$. Choosing $w_0 = +i\beta$, we note that the contour along which

\[ \mathrm{Im}(\sinh w - w\cos\beta) = \mathrm{Im}(\sinh w_0 - w_0\cos\beta) \]

is given by $\cosh u = [\sin\beta + (v - \beta)\cos\beta]/\sin v$, where $w = u + iv$. This contour is shown in Fig. 16.5. The rest of the procedure is exactly the same as for $J_\nu(x)$ described above. In fact, to obtain the expansion for $H_\nu^{(1)}(x)$, we simply replace $w_0$ by $i\beta$. The result is

\[ H_\nu^{(1)}(x) \approx \left(\frac{2}{i\pi x\sin\beta}\right)^{1/2} e^{i(x\sin\beta - \nu\beta)}\left[1 + \frac{1}{8ix\sin\beta}\left(1 + \frac{5}{3}\cot^2\beta\right) + \cdots\right]. \]

Fig. 16.5 The contour in the w-plane used in evaluating $H_\nu^{(1)}(z)$ in the limit of large values of $x$

When $x$ is much larger than $\nu$, $\beta$ will be close to $\pi/2$, and we have

\[ H_\nu^{(1)}(x) \approx \sqrt{\frac{2}{\pi x}}\; e^{i(x - \nu\pi/2 - \pi/4)}\left(1 + \frac{1}{8ix}\right), \]

which, with $1/x \to 0$, is what we obtained in Example 12.5.2.

The other saddle point, at $-i\beta$, gives the other Hankel function, with the asymptotic limit

\[ H_\nu^{(2)}(x) \approx \sqrt{\frac{2}{\pi x}}\; e^{-i(x - \nu\pi/2 - \pi/4)}\left(1 - \frac{1}{8ix}\right). \]


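We can now use the expressions for the asymptotic forms of the two Hankel functions to write the asymptotic forms of $J_\nu(x)$ and $Y_\nu(x)$ for large $x$:

\[ J_\nu(x) = \tfrac12\left[H_\nu^{(1)}(x) + H_\nu^{(2)}(x)\right] \approx \sqrt{\frac{2}{\pi x}}\left[\cos\left(x - \frac{\nu\pi}{2} - \frac{\pi}{4}\right) + \frac{1}{8x}\sin\left(x - \frac{\nu\pi}{2} - \frac{\pi}{4}\right)\right], \]

\[ Y_\nu(x) = \frac{1}{2i}\left[H_\nu^{(1)}(x) - H_\nu^{(2)}(x)\right] \approx \sqrt{\frac{2}{\pi x}}\left[\sin\left(x - \frac{\nu\pi}{2} - \frac{\pi}{4}\right) - \frac{1}{8x}\cos\left(x - \frac{\nu\pi}{2} - \frac{\pi}{4}\right)\right]. \]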
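Taking one-half the sum of the leading terms of the two Hankel asymptotic forms gives the familiar large-x behavior J_ν(x) ≈ sqrt(2/(πx)) cos(x − νπ/2 − π/4). The sketch below compares this leading term with a direct numerical evaluation for integer order, using the trapezoidal rule on J_n(x) = (1/π)∫_0^π cos(nθ − x sin θ) dθ. The function names, the step count, and the sample values of x are illustrative assumptions; only the leading term is tested, so agreement is expected to improve like 1/x rather than to be exact.

```python
import math

def jn_integral(n, x, steps=4000):
    """Trapezoidal estimate of (1/pi) * integral_0^pi cos(n*theta - x*sin(theta)) d(theta)."""
    h = math.pi / steps
    f = lambda th: math.cos(n * th - x * math.sin(th))
    total = 0.5 * (f(0.0) + f(math.pi)) + sum(f(i * h) for i in range(1, steps))
    return total * h / math.pi

def jn_leading(n, x):
    """Leading large-x term sqrt(2/(pi x)) cos(x - n pi/2 - pi/4)."""
    return math.sqrt(2 / (math.pi * x)) * math.cos(x - n * math.pi / 2 - math.pi / 4)

for x in (10.0, 50.0, 200.0):
    print(x, jn_integral(0, x), jn_leading(0, x))
```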


16.3 Problems 


16.1 Use the change of variables $k = \ln t$ and $ix = \omega - \alpha$ (where $k$ and $x$ are the common variables used in Fourier transform equations) to show that the Fourier transform changes into a Mellin transform,

\[ G(t) = \frac{1}{2\pi i}\int_{-i\infty+\alpha}^{i\infty+\alpha} F(\omega)\,t^{-\omega}\,d\omega, \quad\text{where}\quad F(\omega) = \int_0^{\infty} G(t)\,t^{\omega-1}\,dt. \]


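16.2 The Laplace transform $L[f]$ of a function $f(t)$ is defined as

\[ L[f](s) \equiv \int_0^{\infty} e^{-st} f(t)\,dt. \]

Show that the Laplace transform of

(a) $f(t) = 1$ is $\dfrac{1}{s}$, where $s > 0$.
(b) $f(t) = \cosh\omega t$ is $\dfrac{s}{s^2 - \omega^2}$, where $s^2 > \omega^2$.
(c) $f(t) = \sinh\omega t$ is $\dfrac{\omega}{s^2 - \omega^2}$, where $s^2 > \omega^2$.
(d) $f(t) = \cos\omega t$ is $\dfrac{s}{s^2 + \omega^2}$.
(e) $f(t) = \sin\omega t$ is $\dfrac{\omega}{s^2 + \omega^2}$.
(f) $f(t) = e^{\omega t}$ for $t > 0$, is $\dfrac{1}{s - \omega}$, where $s > \omega$.
(g) $f(t) = t^{n}$ is $\dfrac{\Gamma(n+1)}{s^{n+1}}$, where $s > 0$, $n > -1$.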


16.3 Evaluate the integral

\[ f(t) = \int_0^\infty \frac{\sin\omega t}{\omega}\,d\omega \]

by finding the Laplace transform and changing the order of integration. Express
the result for both $t > 0$ and $t < 0$ in terms of the theta function. (You
will need some results from Problem 16.2.)


16.4 Show that the Laplace transform of the derivative of a function is given
by $L[F'](s) = sL[F](s) - F(0)$. Similarly, show that for the second derivative
the transform is

\[ L[F''](s) = s^2 L[F](s) - sF(0) - F'(0). \]

Use these results to solve the differential equation $u''(t) + \omega^2 u(t) = 0$ subject
to the boundary conditions $u(0) = a$, $u'(0) = 0$.


16.5 Solve the DE of Eq. (16.5). 
16.6 Calculate the surface term for the hypergeometric DE. 


16.7 Determine the constant $C'$ in Eq. (16.6), the solution to the hypergeometric
DE. Hint: Expand $(t - z)^{-\alpha}$ inside the integral, use Eqs. (12.19) and
(12.17), and compare the ensuing series with the hypergeometric series of
Chap. 15.


16.8 Derive the Euler formula [Eq. (16.7)]. 


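16.9 Show that

\[ F(\alpha, \beta; \gamma; 1) = \frac{\Gamma(\gamma)\,\Gamma(\gamma - \alpha - \beta)}{\Gamma(\gamma - \alpha)\,\Gamma(\gamma - \beta)}. \tag{16.19} \]

Hint: Use Eq. (12.19). Equation (16.19) was obtained by Gauss using only
hypergeometric series.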


16.10 We determine the asymptotic behavior of $\Phi(\alpha, \gamma; z)$ for $z \to \infty$ in
this problem. Break up the integral in Eq. (16.11) into two parts, one from
0 to $-\infty$ and the other from $-\infty$ to 1. Substitute $-t/z$ for t in the first
integral, and $1 - t/z$ for t in the second. Assuming that $z \to \infty$ along the
positive real axis, show that the second integral will dominate, and that

\[ \Phi(\alpha, \gamma; z) \to \frac{\Gamma(\gamma)}{\Gamma(\alpha)}\, z^{\alpha-\gamma} e^{z} \quad\text{as } z \to \infty. \]


16.11 In this problem, we determine the constant k of Eq. (16.12).


(a) Write the contour integral of Eq. (16.12) for each of the three pieces of
the contour. Note that $\arg(t) = -\pi$ as t comes from $-\infty$ and $\arg(t) = \pi$
as t goes to $-\infty$. Obtain a real integral from 0 to $\infty$.
(b) Use the relation $\Gamma(z)\Gamma(1 - z) = \pi/\sin\pi z$, obtained in Chap. 12, to
show that

\[ \Gamma(-z) = -\frac{\pi}{\Gamma(z+1)\sin\pi z}. \]

(c) Expand the function $\exp(z^2/4t)$ in the integral of part (a), and show
that the contour integral reduces to

\[ -2i\sin\nu\pi \left(\frac{z}{2}\right)^{\nu} \sum_{n=0}^{\infty} \left(\frac{z}{2}\right)^{2n} \frac{\Gamma(-n-\nu)}{\Gamma(n+1)}. \]

(d) Use the result of part (c) in part (b), and compare the result with the
series expansion of $J_\nu(z)$ in Chap. 15 to arrive finally at $k = 1/(2\pi i)$.


16.12 By integrating along $C_1$, $C_2$, $C_3$, and $C_4$ of Fig. 16.2, derive
Eq. (16.18).


16.13 By substituting $t = \exp(i\theta)$ in Eq. (16.14), show that

\[ e^{iz\sin\theta} = J_0(z) + 2\sum_{n=1}^{\infty} J_{2n}(z)\cos(2n\theta) + 2i\sum_{n=0}^{\infty} J_{2n+1}(z)\sin[(2n+1)\theta]. \]

In particular, show that

\[ J_0(z) = \frac{1}{2\pi}\int_0^{2\pi} e^{iz\sin\theta}\,d\theta. \]


16.14 Derive the integral representations of $H_\nu^{(1)}(x)$ and $H_\nu^{(2)}(x)$ given in
Sect. 16.2.


Part V
Operators on Hilbert Spaces


17 Introductory Operator Theory


The first two parts of the book dealt almost exclusively with algebraic tech- 
niques. The third and fourth part were devoted to analytic methods. In this 
introductory chapter, we shall try to unite these two branches of mathematics 
to gain insight into the nature of some of the important equations in physics 
and their solutions. Let us start with a familiar problem. 


17.1 From Abstract to Integral and Differential Operators


Let us say we want to solve an abstract operator equation $A|u\rangle = |v\rangle$ in an
N-dimensional vector space V. To this end, we select a basis $B = \{|a_i\rangle\}_{i=1}^N$,
write the equation in matrix form, and solve the resulting system of N linear
equations. This produces the components of the solution $|u\rangle$ in B. If
components in another basis B′ are desired, they can be obtained using the
similarity transformation connecting the two bases (see Chap. 5).

There is a standard formal procedure for obtaining the matrix equation.
It is convenient to choose an orthonormal basis $B = \{|e_i\rangle\}_{i=1}^N$ for V and
refer all components to this basis. The procedure involves contracting both
sides of the equation with $\langle e_i|$ and inserting $\mathbf{1} = \sum_{j=1}^N |e_j\rangle\langle e_j|$ between A
and $|u\rangle$:

\[ \sum_{j=1}^N \langle e_i|A|e_j\rangle\langle e_j|u\rangle = \langle e_i|v\rangle \quad\text{for } i = 1, 2, \ldots, N, \]

or

\[ \sum_{j=1}^N A_{ij}u_j = v_i \quad\text{for } i = 1, 2, \ldots, N, \tag{17.1} \]

where $A_{ij} \equiv \langle e_i|A|e_j\rangle$, $u_j \equiv \langle e_j|u\rangle$, and $v_i \equiv \langle e_i|v\rangle$. Equation (17.1) is a
system of N linear equations in N unknowns $\{u_j\}_{j=1}^N$, which can be solved
to obtain the solution(s) of the original equation in B.
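
To make Eq. (17.1) concrete, here is a minimal sketch that solves $\sum_j A_{ij}u_j = v_i$ for a small matrix by Gaussian elimination with partial pivoting. The matrix, the right-hand side, and the function name `solve_linear_system` are illustrative assumptions; any standard linear solver would serve equally well.

```python
def solve_linear_system(A, v):
    """Solve sum_j A[i][j] * u[j] = v[i] by Gaussian elimination with partial pivoting."""
    n = len(v)
    # Augmented matrix [A | v], stored as complex numbers so that complex spaces are covered.
    M = [[complex(x) for x in A[i]] + [complex(v[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    u = [0j] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * u[j] for j in range(i + 1, n))
        u[i] = (M[i][n] - s) / M[i][i]
    return u

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]   # hypothetical matrix elements A_ij
v = [3, 5, 5]                           # hypothetical components v_i
u = solve_linear_system(A, v)
print(u)
```

When A is diagonal the elimination loop does nothing and the back substitution reduces to dividing each component $v_i$ by the corresponding diagonal entry.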

A convenient basis is that in which A is represented by a diagonal matrix
$\operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_N)$. Then the operator equation takes the simple form
$\lambda_i u_i = v_i$, and the solution becomes immediate.

Let us now apply the procedure just described to infinite-dimensional
vector spaces, in particular, for the case of a continuous index. We want to
find the solutions of $K|u\rangle = |f\rangle$. Following the procedure used above, we
obtain


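\[ \langle x|K\underbrace{\left( \int_a^b |y\rangle w(y)\langle y|\,dy \right)}_{=\mathbf{1}}|u\rangle = \int_a^b \langle x|K|y\rangle\, w(y)\,\langle y|u\rangle\,dy = \langle x|f\rangle, \]

where we have used the results obtained in Sect. 7.3. Writing this in functional
notation, we have

\[ \int_a^b K(x, y)\,w(y)\,u(y)\,dy = f(x), \tag{17.2} \]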


which is the continuous analogue of Eq. (17.1). Here (a, b) is the interval 
on which the functions are defined. We note that the indices have turned into 
continuous arguments, and the sum has turned into an integral. The operator 
K that leads to an equation such as (17.2) is called an integral operator 
(IO), and the “matrix element” K (x, y) is said to be its kernel. 
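
The passage from sum to integral can be illustrated numerically: replacing the integral in Eq. (17.2) by a quadrature sum turns the integral operator back into a (large) matrix acting on sampled values of u. The sketch below uses a midpoint rule; the kernel $K(x, y) = xy$, the weight $w = 1$, the function $u(y) = y$, and the interval (0, 1) are arbitrary choices made only for illustration.

```python
def apply_integral_operator(kernel, weight, u, a, b, x, n=2000):
    """Approximate int_a^b K(x, y) w(y) u(y) dy by a midpoint sum over n subintervals."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        y = a + (k + 0.5) * h                      # midpoint of the k-th subinterval
        total += kernel(x, y) * weight(y) * u(y)   # "matrix element" times component
    return total * h

kernel = lambda x, y: x * y     # hypothetical kernel
weight = lambda y: 1.0          # hypothetical weight function
u = lambda y: y                 # hypothetical function acted on

f_half = apply_integral_operator(kernel, weight, u, 0.0, 1.0, 0.5)
print(f_half)   # exact value of int_0^1 (0.5 y) y dy is 1/6
```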

The discussion of the discrete case mentioned the possibility of the oper- 
ator A being diagonal in the given basis B. Let us do the same with (17.2); 
that is, noting that x and y are indices for K, let us assume that K(x, y) = 0
for x ≠ y. Such operators are called local operators. For local operators,
the contribution to the integral comes only at the point where x = y (hence, 
their name). If K(x, y) is finite at this point, and the functions w(y) and 
u(y) are well behaved there, the LHS of (17.2) will vanish, and we will get 
inconsistencies. To avoid this, we need to have 


\[ K(x, y) \to \begin{cases} 0 & \text{if } x \neq y, \\ \infty & \text{if } x = y. \end{cases} \]

Thus, K(x, y) has the behavior of a delta function. Letting $K(x, y) \equiv L(x)\delta(x - y)/w(x)$
and substituting in Eq. (17.2) yields L(x)u(x) = f(x).

In the discrete case, $\lambda_i$ was merely an indexed number; its continuous
analogue, L(x), may represent merely a function. However, the fact that x
is a continuous variable (index) gives rise to other possibilities for L(x) that
do not exist for the discrete case. For instance, L(x) could be a differential
operator. The derivative, although defined by a limiting process involving
neighboring points, is a local operator. Thus, we can speak of the derivative
of a function at a point. For the discrete case, $u_i$ can only “hop” from i
to i + 1 and then back to i. Such a difference (as opposed to differential)
process is not local; it involves not only i but also i + 1. The “point” i does
not have an (infinitesimally close) neighbor.

This essential difference between discrete and continuous operators 
makes the latter far richer in possibilities for applications. In particular, if 
L(x) is considered a differential operator, the equation L(x)u(x) = f(x) 
leads directly to the fruitful area of differential equation theory.


17.2 Bounded Operators in Hilbert Spaces


The concept of an operator on a Hilbert space is extremely subtle. Even the 
elementary characteristics of operators, such as the operation of hermitian 
conjugation, cannot generally be defined on the whole Hilbert space. 

In finite-dimensional vector spaces there is a one-to-one correspondence
between operators and matrices. So, in some sense, the study of operators
reduces to a study of matrices, which are collections of real or complex
numbers. Although we have already noted an analogy between matrices and
kernels, a whole new realm of questions arises when $A_{ij}$ is replaced by
K(x, y): questions about the continuity of K(x, y) in both its arguments,
about the limit of K(x, y) as x and/or y approach the “end points” of the 
interval on which K is defined, about the boundedness and “compactness” of 
K, and so on. Such subtleties are not unexpected. After all, when we tried to 
generalize concepts of finite-dimensional vector spaces to infinite dimen- 
sions in Chap. 7, we encountered difficulties. There we were concerned 
about vectors only; the generalization of operators is even more compli- 
cated. 
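Example 17.2.1 Recall that $\mathbb{C}^\infty$ is the set of sequences $|a\rangle = \{\alpha_j\}_{j=1}^\infty$,
or of ∞-tuples $(\alpha_1, \alpha_2, \ldots)$, that satisfy the convergence requirement
$\sum_{j=1}^\infty |\alpha_j|^2 < \infty$ (see Example 2.1.2). It is a Hilbert space with inner product
defined by $\langle a|b\rangle = \sum_{j=1}^\infty \alpha_j^* \beta_j$. The standard (orthonormal) basis for
$\mathbb{C}^\infty$ is $\{|e_i\rangle\}_{i=1}^\infty$, where $|e_i\rangle$ has all components equal to zero except the ith
one, which is 1. Then one has $|a\rangle = \sum_{j=1}^\infty \alpha_j|e_j\rangle$.

One can introduce an operator $\mathbf{T}_r$, called the right-shift operator, by

\[ \mathbf{T}_r|a\rangle = \mathbf{T}_r\left( \sum_{j=1}^\infty \alpha_j|e_j\rangle \right) = \sum_{j=1}^\infty \alpha_j|e_{j+1}\rangle. \]

In other words, $\mathbf{T}_r$ transforms $(\alpha_1, \alpha_2, \ldots)$ to $(0, \alpha_1, \alpha_2, \ldots)$. It is
straightforward to show that $\mathbf{T}_r$ is indeed a linear operator.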
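
The action of the right-shift operator is easy to mimic on finitely supported sequences, which are stored below as Python lists (all components beyond the end of the list are taken to be zero). This is only a sketch under that assumption; the names `right_shift` and `norm` and the sample vectors are hypothetical.

```python
import math

def right_shift(a):
    """(a1, a2, ...) -> (0, a1, a2, ...) for a finitely supported sequence stored as a list."""
    return [0] + list(a)

def norm(a):
    """Norm induced by the inner product sum_j conj(a_j) b_j."""
    return math.sqrt(sum(abs(x) ** 2 for x in a))

a = [1 + 2j, -3, 0.5]
b = [2, 0, 1j]

shifted = right_shift(a)
# Linearity checked on one sample combination: T(2a + b) = 2 T(a) + T(b).
lhs = right_shift([2 * x + y for x, y in zip(a, b)])
rhs = [2 * x + y for x, y in zip(right_shift(a), right_shift(b))]

print(shifted)
print(lhs == rhs, math.isclose(norm(shifted), norm(a)))
```

The second printed line also shows that the shift does not change the norm of the sample vector, a fact used later when the spectrum of this operator is discussed.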


The first step in our study of vector spaces of infinite dimensions was 
getting a handle on the convergence of infinite sums. This entailed defining 
a norm for vectors and a distance between them. In addition, we noted that 
the set of linear transformations $\mathcal{L}(V, W)$ was a vector space in its own right.
Since operators are “vectors” in this space, the study of operators requires
constructing a norm in $\mathcal{L}(V, W)$ when V and W are infinite-dimensional.


Definition 17.2.2 Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be two Hilbert spaces with norms $\|\cdot\|_1$
and $\|\cdot\|_2$. For any $\mathbf{T} \in \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$, the number

\[ \max\left\{ \frac{\|\mathbf{T}x\|_2}{\|x\|_1} \;\Big|\; |x\rangle \neq 0 \right\} \]

(if it exists) is called¹ the operator norm of T and is denoted by $\|\mathbf{T}\|$. A linear
transformation whose norm is finite is called a bounded linear transformation.
A bounded linear transformation from a Hilbert space to itself is
called a bounded operator. The collection of all bounded linear transformations,
which is a subset of $\mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$, will be denoted by $\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$,
and if $\mathcal{H}_1 = \mathcal{H}_2 \equiv \mathcal{H}$, it will be denoted by $\mathcal{B}(\mathcal{H})$.


Note that $\|\cdot\|_1$ and $\|\cdot\|_2$ are the norms induced by the inner products of
$\mathcal{H}_1$ and $\mathcal{H}_2$. Also note that by dividing by $\|x\|_1$ we eliminate the possibility
of inflating $\|\mathbf{T}\|$ by choosing a “long” vector. By restricting the
length of $|x\rangle$, one can eliminate the necessity for dividing by the length. In
fact, the norm can equivalently be defined as

\[ \|\mathbf{T}\| = \max\{ \|\mathbf{T}x\|_2 \mid \|x\|_1 = 1 \} = \max\{ \|\mathbf{T}x\|_2 \mid \|x\|_1 \le 1 \}. \tag{17.3} \]


It is straightforward to show that the three definitions are equivalent and they 
indeed define a norm. 
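
For a feeling of what the operator norm measures, the following sketch estimates it for real 2 × 2 matrices by scanning unit vectors $(\cos\theta, \sin\theta)$, i.e., by using the unit-sphere form of Eq. (17.3). The brute-force scan, the sample matrices, and all function names are assumptions made for illustration; it is not an efficient way to compute norms.

```python
import math

def apply(T, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(T[i][j] * x[j] for j in range(len(x))) for i in range(len(T))]

def norm(x):
    return math.sqrt(sum(c * c for c in x))

def operator_norm_estimate(T, samples=20000):
    """max ||T x|| over sampled unit vectors x = (cos t, sin t) in R^2."""
    best = 0.0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        best = max(best, norm(apply(T, [math.cos(theta), math.sin(theta)])))
    return best

stretch = [[3.0, 0.0], [0.0, 1.0]]   # stretches the first axis by 3
shear = [[1.0, 1.0], [0.0, 1.0]]     # a shear; its norm is the golden ratio

print(operator_norm_estimate(stretch))
print(operator_norm_estimate(shear), (1 + math.sqrt(5)) / 2)
```

Rescaling a trial vector changes both $\|\mathbf{T}x\|_2$ and $\|x\|_1$ by the same factor, which is why the ratio form of the definition and the unit-vector form give the same number.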


Proposition 17.2.3 An operator T is bounded if and only if it maps vectors 
of finite norm to vectors of finite norm. 


Proof Clearly, if T is bounded, then $\mathbf{T}|x\rangle$ has finite norm. Conversely, if
$\|\mathbf{T}x\|_2$ is finite for all $|x\rangle$ (of unit length), $\max\{\|\mathbf{T}x\|_2 \mid \|x\|_1 = 1\}$ is also
finite, and T is bounded.


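An immediate consequence of the definition is

\[ \|\mathbf{T}x\|_2 \le \|\mathbf{T}\|\,\|x\|_1 \quad \forall\, |x\rangle \in \mathcal{H}_1. \tag{17.4} \]

If we choose $|x\rangle - |y\rangle$ instead of $|x\rangle$, it will follow from (17.4) that as $|x\rangle$
approaches $|y\rangle$, $\mathbf{T}|x\rangle$ approaches $\mathbf{T}|y\rangle$. This is the property that characterizes
continuity:


Proposition 17.2.4 The bounded operator $\mathbf{T} \in \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ is a continuous
linear map from $\mathcal{H}_1$ to $\mathcal{H}_2$.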


Another consequence of the definition is that 


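Box 17.2.5 $\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ is a vector subspace of $\mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$, and for
$\mathcal{H}_1 = \mathcal{H}_2 \equiv \mathcal{H}$, we have $\mathbf{1} \in \mathcal{B}(\mathcal{H})$ and $\|\mathbf{1}\| = 1$.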


¹The precise definition uses “supremum” instead of “maximum”. Rather than spending a
lot of effort explaining the difference between the two concepts, we use the less precise,
but more intuitively familiar, concept of “maximum”.

Example 17.2.6 We have seen that in an inner product space, one can associate
a linear operator (linear functional) to every vector. Thus, associated
with the vector $|x\rangle$ in a Hilbert space $\mathcal{H}$ is the linear operator $\mathbf{f}_x : \mathcal{H} \to \mathbb{C}$
defined by $\mathbf{f}_x(|y\rangle) \equiv \langle x|y\rangle$. We want to compare the operator norm of $\mathbf{f}_x$
with the norm of $|x\rangle$. First note that by using the Schwarz inequality, we get

\[ \|\mathbf{f}_x\| = \max\left\{ \frac{|\mathbf{f}_x(|y\rangle)|}{\|y\|} \;\Big|\; |y\rangle \neq 0 \right\} = \max\left\{ \frac{|\langle x|y\rangle|}{\|y\|} \;\Big|\; |y\rangle \neq 0 \right\} \le \|x\|. \]

On the other hand, from $\|x\|^2 = \mathbf{f}_x(|x\rangle)$, we obtain

\[ \|x\| = \frac{\mathbf{f}_x(|x\rangle)}{\|x\|} \le \max\left\{ \frac{|\mathbf{f}_x(|y\rangle)|}{\|y\|} \;\Big|\; |y\rangle \neq 0 \right\} = \|\mathbf{f}_x\|. \]

These two inequalities imply that $\|\mathbf{f}_x\| = \|x\|$.


Example 17.2.7 The derivative operator D = d/dx is not a bounded operator
on the Hilbert space² $\mathcal{L}^2(a, b)$ of square-integrable functions. With a
function like $f(x) = \sqrt{x - a}$, one gets

\[ \|f\|^2 = \int_a^b (x - a)\,dx = \frac{1}{2}(b - a)^2 \quad\Rightarrow\quad \|f\| = \frac{b - a}{\sqrt{2}}, \]

while $df/dx = 1/(2\sqrt{x - a})$ gives $\|Df\|^2 = \frac{1}{4}\int_a^b dx/(x - a) = \infty$. We
conclude that $\|D\| = \infty$.
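
A complementary way to see that no finite number can serve as the norm of D is to look at a family of perfectly well-behaved functions whose derivatives grow without limit relative to the functions themselves. The sketch below uses $f_m(x) = \sin(m\pi x)$ on (0, 1), a family chosen here only for illustration (it is not the function used in the example), and evaluates the ratio $\|Df_m\|/\|f_m\|$ with a midpoint rule.

```python
import math

def l2_norm(func, a, b, n=4000):
    """Square root of int_a^b func(x)^2 dx, approximated by a midpoint sum."""
    h = (b - a) / n
    return math.sqrt(sum(func(a + (k + 0.5) * h) ** 2 for k in range(n)) * h)

def growth_ratio(m):
    """||D f_m|| / ||f_m|| for f_m(x) = sin(m pi x) on (0, 1); analytically m * pi."""
    f = lambda x: math.sin(m * math.pi * x)
    df = lambda x: m * math.pi * math.cos(m * math.pi * x)
    return l2_norm(df, 0.0, 1.0) / l2_norm(f, 0.0, 1.0)

for m in (1, 5, 25):
    print(m, growth_ratio(m), m * math.pi)
```

Since the ratio equals $m\pi$, it exceeds any proposed bound once m is large enough.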


Since $\mathcal{L}(\mathcal{H})$ is an algebra as well as a vector space, one may be interested
in the relation between the product of operators and their norms. More
specifically, one may want to know how $\|\mathbf{ST}\|$ is related to $\|\mathbf{S}\|$ and $\|\mathbf{T}\|$.


Proposition 17.2.8 If S and T are bounded operators, then

\[ \|\mathbf{ST}\| \le \|\mathbf{S}\|\,\|\mathbf{T}\|. \tag{17.5} \]

In particular, $\|\mathbf{T}^n\| \le \|\mathbf{T}\|^n$.


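Proof Use the definition of operator norm for the product ST:

\[ \|\mathbf{ST}\| = \max\left\{ \frac{\|\mathbf{ST}x\|}{\|x\|} \;\Big|\; |x\rangle \neq 0 \right\} = \max\left\{ \frac{\|\mathbf{ST}x\|}{\|\mathbf{T}x\|}\,\frac{\|\mathbf{T}x\|}{\|x\|} \;\Big|\; |x\rangle \neq 0 \neq \mathbf{T}|x\rangle \right\} \]

\[ \le \max\left\{ \frac{\|\mathbf{S}(\mathbf{T}|x\rangle)\|}{\|\mathbf{T}x\|} \;\Big|\; \mathbf{T}|x\rangle \neq 0 \right\} \underbrace{\max\left\{ \frac{\|\mathbf{T}x\|}{\|x\|} \;\Big|\; |x\rangle \neq 0 \right\}}_{=\|\mathbf{T}\|}. \]

Now note that the first term on the RHS does not scan all the vectors for
maximality: It scans only the vectors in the image of T. If we include all
vectors, we may obtain a larger number. Therefore,

\[ \max\left\{ \frac{\|\mathbf{S}(\mathbf{T}|x\rangle)\|}{\|\mathbf{T}x\|} \;\Big|\; \mathbf{T}|x\rangle \neq 0 \right\} \le \max\left\{ \frac{\|\mathbf{S}x\|}{\|x\|} \;\Big|\; |x\rangle \neq 0 \right\} = \|\mathbf{S}\|, \]

and the desired inequality is established.


²Here the two Hilbert spaces coincide, so that the derivative operator acts on a single
Hilbert space.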


We can put Eq. (17.5) to immediate good use. 


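Proposition 17.2.9 Let $\mathcal{H}$ be a Hilbert space and $\mathbf{T} \in \mathcal{B}(\mathcal{H})$. If $\|\mathbf{T}\| < 1$,
then $\mathbf{1} - \mathbf{T}$ is invertible and $(\mathbf{1} - \mathbf{T})^{-1} = \sum_{n=0}^\infty \mathbf{T}^n$.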


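Proof First note that the series converges, because

\[ \left\| \sum_{n=0}^\infty \mathbf{T}^n \right\| \le \sum_{n=0}^\infty \|\mathbf{T}^n\| \le \sum_{n=0}^\infty \|\mathbf{T}\|^n = \frac{1}{1 - \|\mathbf{T}\|} \]

and the sum has a finite norm. Furthermore,

\[ (\mathbf{1} - \mathbf{T})\sum_{n=0}^\infty \mathbf{T}^n = (\mathbf{1} - \mathbf{T})\lim_{k\to\infty}\sum_{n=0}^k \mathbf{T}^n = \lim_{k\to\infty}(\mathbf{1} - \mathbf{T})\sum_{n=0}^k \mathbf{T}^n \]

\[ = \lim_{k\to\infty}\sum_{n=0}^k \left( \mathbf{T}^n - \mathbf{T}^{n+1} \right) = \lim_{k\to\infty}\left( \mathbf{1} - \mathbf{T}^{k+1} \right) = \mathbf{1}, \]

because

\[ 0 \le \lim_{k\to\infty}\|\mathbf{T}^{k+1}\| \le \lim_{k\to\infty}\|\mathbf{T}\|^{k+1} = 0 \]

for $\|\mathbf{T}\| < 1$, and the vanishing of the norm implies the vanishing of the
operator itself. One can similarly show that $\left( \sum_{n=0}^\infty \mathbf{T}^n \right)(\mathbf{1} - \mathbf{T}) = \mathbf{1}$.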
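
The series in the proposition can be tried out on a small matrix. The sketch below sums the first terms of $\sum_n \mathbf{T}^n$ for a 2 × 2 matrix whose norm is well below 1 and checks that multiplying by $\mathbf{1} - \mathbf{T}$ returns the identity to rounding accuracy. The particular matrix, the number of terms, and the helper names are assumptions chosen for illustration.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def neumann_sum(T, terms=60):
    """Partial sum 1 + T + T^2 + ... + T^terms."""
    total, power = identity(len(T)), identity(len(T))
    for _ in range(terms):
        power = matmul(power, T)
        total = matadd(total, power)
    return total

T = [[0.2, 0.1], [0.0, 0.3]]                 # hypothetical operator with norm < 1
S = neumann_sum(T)
one_minus_T = [[1.0 - T[i][j] if i == j else -T[i][j] for j in range(2)] for i in range(2)]
residual = matadd(matmul(one_minus_T, S), [[-1.0, 0.0], [0.0, -1.0]])

print(S)
print(residual)   # entries should be at rounding level
```

For a matrix whose norm is 1 or larger the partial sums generally fail to settle down, which is the reason for the hypothesis $\|\mathbf{T}\| < 1$.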


A corollary of this proposition is that operators that are “close enough” to 
an invertible operator are invertible (see Problem 17.1). Another corollary, 
whose proof is left as a straightforward exercise, is the following: 


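Corollary 17.2.10 Let $\mathbf{T} \in \mathcal{B}(\mathcal{H})$ and $\lambda$ a complex number such that $\|\mathbf{T}\| < |\lambda|$.
Then $\mathbf{T} - \lambda\mathbf{1}$ is an invertible operator, and

\[ (\mathbf{T} - \lambda\mathbf{1})^{-1} = -\frac{1}{\lambda}\sum_{n=0}^\infty \left( \frac{\mathbf{T}}{\lambda} \right)^n. \]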



17.2.1 Adjoints of Bounded Operators 


Adjoints play an important role in the study of operators. We recall that the 
adjoint of T is defined as 


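\[ \langle y|\mathbf{T}|x\rangle^* = \langle x|\mathbf{T}^\dagger|y\rangle \quad\text{or}\quad \langle \mathbf{T}x|y\rangle = \langle x|\mathbf{T}^\dagger y\rangle. \]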


In the finite-dimensional case, we could calculate the matrix representation 
of the adjoint in a particular basis using this definition and generalize to all 
bases by similarity transformations. That is why we never raised the ques- 
tion of the existence of the adjoint of an operator. In the infinite-dimensional 
case, one must prove such an existence. We state the following theorem 
without proof: 


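Theorem 17.2.11 Let $\mathbf{T} \in \mathcal{B}(\mathcal{H})$. Then the adjoint of T, defined by

\[ \langle \mathbf{T}x|y\rangle = \langle x|\mathbf{T}^\dagger y\rangle, \]

exists. Furthermore, $\|\mathbf{T}\| = \|\mathbf{T}^\dagger\|$.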


17.3 Spectra of Linear Operators 


One of the most important results of the theory of finite-dimensional vec- 
tor spaces is the spectral decomposition theorem developed in Chap. 6. The 
infinite-dimensional analogue of that theorem is far more encompassing and 
difficult to prove. It is beyond the scope of this book to develop all the ma- 
chinery needed for a thorough discussion of the infinite-dimensional spectral 
theory. Instead, we shall present the central results, and occasionally intro- 
duce the reader to the peripheral arguments when they seem to have their 
own merits. 


Definition 17.3.1 Let $\mathbf{T} \in \mathcal{L}(\mathcal{H})$. A complex number $\lambda$ is called a
regular point of T if the operator $(\mathbf{T} - \lambda\mathbf{1})^{-1}$ exists and is bounded.
The set of all regular points of T is called the resolvent set of T, and
is denoted by $\rho(\mathbf{T})$. The complement of $\rho(\mathbf{T})$ in the complex plane is
called the spectrum of T and is denoted by $\sigma(\mathbf{T})$.


Note that if T is bounded, then $\mathbf{T} - \lambda\mathbf{1}$ is automatically bounded.

Corollary 17.2.10 implies that if T is bounded, then $\rho(\mathbf{T})$ is not empty,³
and that the spectrum of a bounded linear operator on a Hilbert space is
a bounded set. In fact, an immediate consequence of the corollary is that
$|\lambda| \le \|\mathbf{T}\|$ for all $\lambda \in \sigma(\mathbf{T})$.


³One can simply choose a $\lambda$ whose absolute value is greater than $\|\mathbf{T}\|$.

It is instructive to contrast the finite-dimensional case against the implications
of the above definition. Recall that because of the dimension theorem,
a linear operator on a finite-dimensional vector space V is invertible if and
only if it is either onto or one-to-one. Now, $\lambda \in \sigma(\mathbf{T})$ if and only if $\mathbf{T} - \lambda\mathbf{1}$
is not invertible. For finite dimensions, this implies that⁴ $\ker(\mathbf{T} - \lambda\mathbf{1}) \neq 0$.
Thus, in finite dimensions, $\lambda \in \sigma(\mathbf{T})$ if and only if there is a nonzero vector $|a\rangle$ in V
such that $(\mathbf{T} - \lambda\mathbf{1})|a\rangle = 0$. This is the combined definition of eigenvalue and
eigenvector, and is the definition we will have to use to define eigenvalues
in infinite dimensions. It follows that in the finite-dimensional case, $\sigma(\mathbf{T})$
coincides with the set of all eigenvalues of T. This is not true for infinite
dimensions, as the following example shows.


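Example 17.3.2 Consider the right-shift operator $\mathbf{T}_r$ acting on $\mathbb{C}^\infty$. It is easy
to see that $\|\mathbf{T}_r a\| = \|a\|$ for all $|a\rangle$. This yields $\|\mathbf{T}_r\| = 1$, so that any $\lambda$ that
belongs to $\sigma(\mathbf{T}_r)$ must be such that $|\lambda| \le 1$. We now show that the converse
is also true, i.e., that if $|\lambda| \le 1$, then $\lambda \in \sigma(\mathbf{T}_r)$. It is sufficient to show that if
$0 < |\lambda| \le 1$, then $\mathbf{T}_r - \lambda\mathbf{1}$ is not invertible. To establish this, we shall show
that $\mathbf{T}_r - \lambda\mathbf{1}$ is not onto.

Suppose that $\mathbf{T}_r - \lambda\mathbf{1}$ is onto. Then there must be a vector $|a\rangle$ such that
$(\mathbf{T}_r - \lambda\mathbf{1})|a\rangle = |e_1\rangle$, where $|e_1\rangle$ is the first standard basis vector of $\mathbb{C}^\infty$.
Equating components on both sides yields the recursion relations $\alpha_1 = -1/\lambda$,
and $\alpha_{j-1} = \lambda\alpha_j$ for all $j \ge 2$. One can readily solve this recursion
relation to obtain $\alpha_j = -1/\lambda^j$ for all j. This is a contradiction, because

\[ \sum_{j=1}^\infty |\alpha_j|^2 = \sum_{j=1}^\infty \frac{1}{|\lambda|^{2j}} \]

will not converge if $0 < |\lambda| \le 1$, i.e., $|a\rangle \notin \mathbb{C}^\infty$, and therefore $\mathbf{T}_r - \lambda\mathbf{1}$ is not
onto.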

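We conclude that $\sigma(\mathbf{T}_r) = \{\lambda \in \mathbb{C} \mid |\lambda| \le 1\}$; the point $\lambda = 0$ belongs to
the spectrum as well, because $\mathbf{T}_r$ itself is not onto ($|e_1\rangle$ is not in its range). If we could generalize
the result of the finite-dimensional case to $\mathbb{C}^\infty$, we would conclude that all
complex numbers whose magnitude is at most 1 are eigenvalues of $\mathbf{T}_r$. Quite
to our surprise, the following argument shows that $\mathbf{T}_r$ has no eigenvalues at
all!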

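Suppose that $\lambda$ is an eigenvalue of $\mathbf{T}_r$. Let $|a\rangle$ be any eigenvector for $\lambda$.
Since $\mathbf{T}_r$ preserves the length of a vector, we have

\[ \langle a|a\rangle = \langle \mathbf{T}_r a|\mathbf{T}_r a\rangle = \langle \lambda a|\lambda a\rangle = |\lambda|^2\langle a|a\rangle. \]

It follows that $|\lambda| = 1$. Now write $|a\rangle = \{\alpha_j\}_{j=1}^\infty$ and let $\alpha_m$ be the first
nonzero term of this sequence. Then $0 = \langle \mathbf{T}_r a|e_m\rangle = \langle \lambda a|e_m\rangle = \lambda^*\alpha_m^*$. The
first equality comes about because $\mathbf{T}_r|a\rangle$ has its first nonzero term in the
(m + 1)st position. Since $\lambda \neq 0$, we must have $\alpha_m = 0$, which contradicts
the choice of this number.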
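
The "not onto" part of the example can be watched numerically. The sketch below builds the first J components $\alpha_j = -1/\lambda^j$ of the would-be preimage of $|e_1\rangle$, confirms that they do satisfy $(\mathbf{T}_r - \lambda\mathbf{1})|a\rangle = |e_1\rangle$ component by component, and prints the partial sums of $\sum_j |\alpha_j|^2$, which keep growing. The value $\lambda = 0.5$, the truncation lengths, and the function names are illustrative assumptions.

```python
def candidate_preimage(lam, J):
    """First J components alpha_j = -1/lam**j of the would-be solution of (T_r - lam 1) a = e_1."""
    return [-1 / lam ** j for j in range(1, J + 1)]

def shift_minus_lambda(a, lam):
    """First len(a) components of (T_r - lam 1) a, where (T_r a)_1 = 0 and (T_r a)_j = a_{j-1}."""
    shifted = [0] + a[:-1]
    return [s - lam * x for s, x in zip(shifted, a)]

def norm_squared(a):
    return sum(abs(x) ** 2 for x in a)

lam = 0.5
for J in (5, 10, 20):
    a = candidate_preimage(lam, J)
    print(J, shift_minus_lambda(a, lam)[:3], norm_squared(a))
```

The components reproduce (1, 0, 0, ...) exactly, yet the squared norm of the truncated sequence increases without bound as J grows, so the full sequence is not a vector of the Hilbert space.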


⁴Note how critical finite-dimensionality is for this implication. In infinite dimensions, an
operator can be one-to-one (thus having a zero kernel) without being onto.


17.4 Compact Sets


This section deals with some technical concepts, and as such will be rather 
formal. The central concept of this section is compactness. Although we 
shall be using compactness sparingly in the sequel, the notion has sufficient 
application in higher analysis and algebra that it warrants an introductory 
exposure. 

Let us start with the familiar case of the real line, and the intuitive notion 
of “compactness”. Clearly, we do not want to call the entire real line “compact”,
because intuitively, it is not. The next candidate seems to be a “finite”
interval. So, first consider the open interval (a, b). Can we call it compact? 
Intuition says “yes”, but the following argument shows that it would not be 
appropriate to call the open interval compact. 

Consider the map $g : \mathbb{R} \to (a, b)$ given by

\[ g(t) = \frac{b - a}{2}\tanh t + \frac{b + a}{2}. \]

The reader may check that this map is continuous and bijective. Thus, we
can continuously map all of $\mathbb{R}$ in a one-to-one manner onto (a, b). This
makes (a, b) “look” very much⁵ like $\mathbb{R}$. How can we modify the interval
to make it compact? We do not want to alter its finiteness. So, the obvious
thing to do is to add the end points. Thus, the interval [a, b] seems to be a
good candidate; and indeed it is.
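
A quick numerical look at the map g may help. The sketch below evaluates g on a range of real numbers, checks that the images stay strictly between a and b, and inverts the map with the inverse hyperbolic tangent. The end points a = 2, b = 5 and the sampled values of t are arbitrary choices; for very large |t| floating-point rounding would make tanh t equal to ±1, so the sample is kept moderate.

```python
import math

def g(t, a, b):
    """The map R -> (a, b) from the text: ((b - a)/2) tanh t + (b + a)/2."""
    return 0.5 * (b - a) * math.tanh(t) + 0.5 * (b + a)

def g_inverse(y, a, b):
    """Inverse map (a, b) -> R, obtained by solving y = g(t) for t."""
    return math.atanh((2 * y - a - b) / (b - a))

a, b = 2.0, 5.0
samples = [k / 10 for k in range(-50, 51)]          # t from -5 to 5
images = [g(t, a, b) for t in samples]

print(min(images), max(images))                      # both strictly inside (2, 5)
print(g_inverse(g(1.3, a, b), a, b))                 # recovers 1.3 up to rounding
```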

The next step is to generalize the notion of a closed, finite interval and 
eventually come up with a definition that can be applied to all spaces. First 
we need some terminology. 


Definition 17.4.1 An open ball $B_r(x)$ of radius r and center $|x\rangle$ in a
normed vector space V is the set of all vectors in V whose distance from
$|x\rangle$ is strictly less than r:

\[ B_r(x) \equiv \{ |y\rangle \in V \mid \|y - x\| < r \}. \]

We call $B_r(x)$ an open round neighborhood of $|x\rangle$.


This is a generalization of open interval because

\[ (a, b) = \left\{ y \in \mathbb{R} \;\Big|\; \left| y - \frac{a + b}{2} \right| < \frac{b - a}{2} \right\}. \]


Example 17.4.2 A prototype of finite-dimensional normed spaces is $\mathbb{R}^n$.
An open ball of radius r centered at x is

\[ B_r(x) = \{ y \in \mathbb{R}^n \mid (y_1 - x_1)^2 + (y_2 - x_2)^2 + \cdots + (y_n - x_n)^2 < r^2 \}. \]


Thus, all points inside a circle form an open ball in the xy-plane, and all 
interior points of a solid sphere form an open ball in space. 


⁵In topological jargon one says that (a, b) and $\mathbb{R}$ are homeomorphic.

Definition 17.4.3 A bounded subset of a normed vector space is a subset
that can be enclosed in an open ball of finite radius.


For example, any region drawn on a piece of paper is a bounded subset
of $\mathbb{R}^2$, and any “visible” part of our environment is a bounded subset of $\mathbb{R}^3$
because we can always find a big enough circle or sphere to enclose these
subsets.


Definition 17.4.4 A subset $\mathcal{O}$ of a normed vector space V is called open if
each of its points (vectors) has an open round neighborhood lying entirely
in $\mathcal{O}$. A boundary point of $\mathcal{O}$ is a point (vector) in V all of whose open round
neighborhoods contain points inside and outside $\mathcal{O}$. A closed subset $\mathcal{C}$ of V
is a subset that contains all of its boundary points. The closure of a subset S
is the union of S and all of its boundary points, and is denoted by $\bar{S}$.


For example, the boundary of a region drawn on paper consists of all its 
boundary points. A curve drawn on paper has nothing but boundary points. 
Every point is also its own boundary. A boundary is always a closed set. 
In particular, a point is a closed set. In general, an open set cannot contain 
any boundary points. A frequently used property of a closed set $\mathcal{C}$ is that a
convergent sequence of points of $\mathcal{C}$ converges to a point in $\mathcal{C}$.


Definition 17.4.5 A subset W of a normed vector space V is dense in V if 
the closure of W is the entire space V. Equivalently, W is dense if, given any 
$|u\rangle \in V$ and any $\epsilon > 0$, there is a $|w\rangle \in W$ such that $\|u - w\| < \epsilon$, i.e., any
vector in V can be approximated, with arbitrary accuracy, by a vector in W.


A paradigm of dense spaces is the set of rational numbers in the normed 
vector space of real numbers. It is a well-known fact that any real number 
can be approximated by a rational number with arbitrary accuracy: The dec- 
imal (or binary) representation of real numbers is precisely such an approx- 
imation. An intuitive way of imagining denseness is that the (necessarily) 
infinite subset is equal to almost all of the set, and its members are scattered 
“densely” everywhere in the set. The embedding of the rational numbers in 
the set of real numbers, and how they densely populate that set, is a good 
mental picture of all dense subsets. 
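
The denseness of the rationals can be made explicit in a few lines: given a real number u and a tolerance ε, choose a denominator q of at least 1/ε and round uq to the nearest integer p; then p/q lies within 1/(2q) of u. The sketch below does exactly this with Python's exact `Fraction` type. The function name `rational_within` and the sample numbers are illustrative assumptions, and the "real number" is of course itself a floating-point value.

```python
import math
from fractions import Fraction

def rational_within(u, eps):
    """Return a rational p/q with |p/q - u| <= 1/(2q) < eps, using q = ceil(1/eps)."""
    q = math.ceil(1 / eps)
    p = round(u * q)
    return Fraction(p, q)

u = math.sqrt(2)
for eps in (1e-1, 1e-3, 1e-6):
    w = rational_within(u, eps)
    print(eps, w, abs(float(w) - u))
```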

A useful property involving the concept of closure and openness has to
do with continuous maps between normed vector spaces. Let $f : \mathcal{H}_1 \to \mathcal{H}_2$
be a continuous map. Let $\mathcal{O}_2$ be an open set in $\mathcal{H}_2$. Let $f^{-1}(\mathcal{O}_2)$ denote the
inverse image of $\mathcal{O}_2$, i.e., all points of $\mathcal{H}_1$ that are mapped to $\mathcal{O}_2$. Let $|x_1\rangle$
be a vector in $f^{-1}(\mathcal{O}_2)$, $|x_2\rangle = f(|x_1\rangle)$, and let $B_\epsilon(x_2)$ be a ball contained
entirely in $\mathcal{O}_2$. Then $f^{-1}(B_\epsilon(x_2))$ contains $|x_1\rangle$ and lies entirely in $f^{-1}(\mathcal{O}_2)$.
Because of the continuity of f, one can now construct an open ball centered
at $|x_1\rangle$ lying entirely in $f^{-1}(B_\epsilon(x_2))$, and by inclusion, in $f^{-1}(\mathcal{O}_2)$. This
shows that every point of $f^{-1}(\mathcal{O}_2)$ has a round open neighborhood lying
entirely in $f^{-1}(\mathcal{O}_2)$. Thus, $f^{-1}(\mathcal{O}_2)$ is an open subset. One can similarly
show the corresponding property for closed subsets. We can summarize this
in the following:


Proposition 17.4.6 Let $f : \mathcal{H}_1 \to \mathcal{H}_2$ be continuous. Then the inverse image
of an open (closed) subset of $\mathcal{H}_2$ is an open (closed) subset of $\mathcal{H}_1$.


Consider the resolvent set of a bounded operator T. We claim that this set
is open in $\mathbb{C}$. To see this, note that if $\lambda \in \rho(\mathbf{T})$, then $\mathbf{T} - \lambda\mathbf{1}$ is invertible. On
the other hand, Problem 17.1 shows that operators close to an invertible operator
are invertible. Thus, if we choose a sufficiently small positive number
$\epsilon$ and consider all complex numbers $\mu$ within a distance $\epsilon$ from $\lambda$, then all
operators of the form $\mathbf{T} - \mu\mathbf{1}$ are invertible, i.e., $\mu \in \rho(\mathbf{T})$. Therefore, any
$\lambda \in \rho(\mathbf{T})$ has an open round neighborhood in the complex plane all points
of which are in the resolvent set. This shows that the resolvent set is open. In
particular, it cannot contain any boundary points. However, $\rho(\mathbf{T})$ and $\sigma(\mathbf{T})$
have to be separated by a common boundary.⁶ Since $\rho(\mathbf{T})$ cannot contain
any boundary point, $\sigma(\mathbf{T})$ must carry the entire boundary. This shows that
$\sigma(\mathbf{T})$ is a closed subset of $\mathbb{C}$. Recalling that $\sigma(\mathbf{T})$ is also bounded, we have
the following result.


Proposition 17.4.7 For any $\mathbf{T} \in \mathcal{B}(\mathcal{H})$ the set $\rho(\mathbf{T})$ is an open subset
of $\mathbb{C}$ and $\sigma(\mathbf{T})$ is a closed, bounded subset of $\mathbb{C}$.


17.4.1 Compactness and Infinite Sequences 


Let us go back to the notion of compactness. It turns out that the feature of
the closed interval [a, b] most appropriate for generalization is the behavior
of infinite sequences of numbers lying in the interval. More specifically,
let $\{\alpha_i\}_{i=1}^\infty$ be a sequence of infinitely many real numbers all lying in the
interval [a, b]. It is intuitively clear that since there is not enough room for
these points to stay away from each other, they will have to crowd around a
number of points in the interval. For example, the sequence

\[ \left\{ (-1)^n\,\frac{2n + 1}{4n} \right\}_{n=1}^\infty = \left\{ -\frac{3}{4}, +\frac{5}{8}, -\frac{7}{12}, +\frac{9}{16}, \ldots \right\} \]

in the interval [−1, +1] crowds around the two points $-\frac{1}{2}$ and $+\frac{1}{2}$, i.e., the
sequence has two limits, both in the interval. In fact, the points with even
n accumulate around $+\frac{1}{2}$ and those with odd n crowd around $-\frac{1}{2}$. It turns
out that all closed intervals of $\mathbb{R}$ have this property, namely, all sequences
crowd around some (limit) points of the interval. To see that open intervals
do not share this property consider the open interval (0, 1). The sequence
$\{\frac{1}{2n+1}\}_{n=1}^\infty = \{\frac{1}{3}, \frac{1}{5}, \ldots\}$ clearly has the limit point zero, which is not a point
of the interval. But we already know that open intervals are not compact.
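
The crowding of the alternating sequence $(-1)^n(2n+1)/(4n)$ around two points can be seen by splitting it into its even-n and odd-n parts, since $(2n+1)/(4n) = \frac{1}{2} + \frac{1}{4n}$. The short sketch below prints a few terms of each part; the function name `term` and the chosen indices are illustrative.

```python
def term(n):
    """n-th term of the sequence (-1)^n (2n + 1)/(4n), n = 1, 2, 3, ..."""
    return (-1) ** n * (2 * n + 1) / (4 * n)

first_terms = [term(n) for n in range(1, 5)]
even_part = [term(n) for n in (2, 20, 200, 2000)]   # approaches +1/2
odd_part = [term(n) for n in (1, 21, 201, 2001)]    # approaches -1/2

print(first_terms)
print(even_part)
print(odd_part)
```

Each of the two parts is itself an infinite sequence in [−1, +1] with a single limit, which is the kind of object the notion of a convergent subsequence captures.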


⁶The spectrum of a bounded operator need not occupy any “area” in the complex plane.
It may consist of isolated points or line segments, etc., in which case the spectrum will
constitute the entire boundary.

Definition 17.4.8 (Bolzano-Weierstrass Property) A subset K of a normed
vector space is called compact if every (infinite) sequence in K has a convergent
subsequence.


The reason for the introduction of a subsequence in the definition is that 
a sequence may have many points to which it converges. But no matter how 
many of these points there may exist, one can always obtain a convergent 
subsequence by choosing from among the points in the sequence. For in- 
stance, in the example above, one can choose the subsequence consisting 
of elements for which n is even. This subsequence converges to the single
point $+\frac{1}{2}$.

An important theorem in real analysis characterizes all compact sets in
$\mathbb{R}^n$:⁷


Theorem 17.4.9 (BWHB) A subset of $\mathbb{R}^n$ is compact if and only if it is
closed and bounded.


We showed earlier that the spectrum of a bounded linear operator is
closed and bounded. Identifying $\mathbb{C}$ with $\mathbb{R}^2$, the BWHB theorem implies
that


Box 17.4.10 The spectrum of a bounded linear operator is a compact 
subset of C. 


An immediate consequence of the BWHB Theorem is that every bounded
subset of $\mathbb{R}^n$ has a compact closure. Since $\mathbb{R}^n$ is a prototype of all finite-dimensional
(normed) vector spaces, the same statement is true for all such
vector spaces. What is interesting is that the statement indeed characterizes
the normed space:


Theorem 17.4.11 A normed vector space is finite-dimensional if and only 
if every bounded subset has a compact closure. 


This result can also be applied to subspaces of a normed vector space: 
A subspace W of a normed vector space V is finite-dimensional if and only 
if every bounded subset of W has a compact closure in W. A useful version 
of this property is stated in terms of sequences of points (vectors): 


⁷BWHB stands for Bolzano, Weierstrass, Heine, and Borel. Bolzano and Weierstrass
proved that any closed and bounded subset of $\mathbb{R}$ has the Bolzano-Weierstrass property.
Heine and Borel abstracted the notion of compactness in terms of open sets, and
showed that a closed bounded subset of $\mathbb{R}$ is compact. The BWHB theorem as applied
to $\mathbb{R}$ is usually called the Heine-Borel theorem (although some authors call it the
Bolzano-Weierstrass theorem). Since the Bolzano-Weierstrass property and compactness
are equivalent, we have decided to choose BWHB as the name of our theorem.



Theorem 17.4.12 A subspace W of a normed vector space V is finite-dimensional
if and only if every bounded sequence in W has a
convergent subsequence in W.


Historical Notes 

Karl Theodor Wilhelm Weierstrass (1815-1897) was both the greatest analyst and the 
world’s foremost teacher of advanced mathematics of the last third of the nineteenth 
century. His career was also remarkable in another way (and a consolation to all “late
starters”), for he began the solid part of his professional life at the age of almost 40, when
most mathematicians are long past their creative years. 

His father sent him to the University of Bonn to qualify for the higher ranks of the Prussian 
civil service by studying law and commerce. But Karl had no interest in these subjects. 
He infuriated his father by rarely attending lectures, getting poor grades, and instead, 
becoming a champion beer drinker. He did manage to become a superb fencer, but when 
he returned home, he had no degree. 

In order to earn his living, he made a fresh start by teaching mathematics, physics, botany, 
German, penmanship, and gymnastics to the children of several small Prussian towns dur- 
ing the day. During the nights, however, he mingled with the intellectuals of the past, par- 
ticularly the great Norwegian mathematician Abel. His remarkable research on Abelian 
functions was carried on for years without the knowledge of another living soul; he didn’t 
discuss it with anyone at all, or submit it for publication in the mathematical journals of 
the day. 

All this changed in 1854 when Weierstrass at last published an account of his research on 
Abelian functions. This paper caught the attention of an alert professor at the University 
of Königsberg who persuaded his university to award Weierstrass an honorary doctor’s
degree. The Ministry of Education granted Weierstrass a year’s leave of absence with pay 
to continue his research, and the next year he was appointed to the University of Berlin, 
where he remained the rest of his life. 

Weierstrass’s great creative talents were evenly divided between his thinking and his 
teaching. The student notes of his lectures, and copies of these notes, and copies of copies, 
were passed from hand to hand throughout Europe and even America. Like Gauss he was 
indifferent to fame, but unlike Gauss he endeared himself to generations of students by 
the generosity with which he encouraged them to develop and publish, and receive credit 
for, ideas and theorems that he essentially originated himself. Among Weierstrass’s
students and followers were Cantor, Schwarz, Hölder, Mittag-Leffler, Sonja Kovalevskaya
(Weierstrass’s favorite student), Hilbert, Max Planck, Willard Gibbs, and many others.

In 1885 he published the famous theorem now called the Weierstrass approximation 
theorem (see Theorems 7.2.3 and 9.1.1), which was given a far-reaching generalization, 
with many applications, by the modern American mathematician M. H. Stone. 

The quality that came to be known as “Weierstrassian rigor” was particularly visible in 
his contributions to the foundations of real analysis. He refused to accept any statement 
as “intuitively obvious,” but instead demanded ironclad proof based on explicit properties 
of the real numbers. The careful reasoning required for these proofs was founded on a 
crucial property of the real numbers now known as the BWHB theorem. 


17.5 Compact Operators 


It is straightforward to show that if $K$ is a compact set in $\mathcal{H}_1$, and
$f : \mathcal{H}_1 \to \mathcal{H}_2$ is continuous, then $f(K)$ (the image of $K$) is
compact in $\mathcal{H}_2$. Since all bounded operators are continuous, we conclude that
all bounded operators map compact subsets onto compact subsets. There is a special
subset of $\mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ that deserves particular attention.


Definition 17.5.1 An operator $\mathbf{K} \in \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ is called a compact
operator if it maps a bounded subset of $\mathcal{H}_1$ onto a subset of $\mathcal{H}_2$ with
compact closure.


Since we will be dealing with function spaces, and since it is easier to 
deal with sequences of functions than with subsets of the space of functions, 
we find it more useful to have a definition of compact operators in terms of 
sequences rather than subsets. Thus, instead of a bounded subset, we take 
a subset of it consisting of a (necessarily) bounded sequence. The image of 
this sequence will be a sequence in a compact set, which, by definition, must 
have a convergent subsequence. We therefore have the following: 


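Theorem 17.5.2 An operator $\mathbf{K} \in \mathcal{B}(\mathcal{H}_1, \mathcal{H}_2)$ is compact if and only if for
any bounded sequence $\{|x_n\rangle\}$ in $\mathcal{H}_1$, the sequence $\{\mathbf{K}|x_n\rangle\}$ has a convergent
subsequence in $\mathcal{H}_2$.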


Example 17.5.3 Consider $\mathcal{B}(\mathcal{H})$, the set of bounded operators on the
Hilbert space $\mathcal{H}$. If $\mathbf{K}$ is a compact operator and $\mathbf{T}$ a bounded operator, then
$\mathbf{KT}$ and $\mathbf{TK}$ are compact. This is because $\{\mathbf{T}|x_n\rangle \equiv |y_n\rangle\}$ is a bounded
sequence if $\{|x_n\rangle\}$ is, and $\{\mathbf{K}|y_n\rangle = \mathbf{KT}|x_n\rangle\}$ has a convergent subsequence,
because $\mathbf{K}$ is compact. For the second part, use the first definition of the
compact operator and note that $\mathbf{K}$ maps bounded sets onto compact sets,
which $\mathbf{T}$ (being continuous) maps onto a compact set. As a special case of
this property we note that the product of two compact operators is compact.
Similarly, one can show that any linear combination of compact operators is
compact. Thus, any polynomial of a compact operator is compact. In particular,

$$(\mathbf{1}-\mathbf{K})^n = \sum_{j=0}^{n} \frac{n!}{j!\,(n-j)!}(-\mathbf{K})^j = \mathbf{1} + \sum_{j=1}^{n} \frac{n!}{j!\,(n-j)!}(-\mathbf{K})^j \equiv \mathbf{1} - \mathbf{K}_n,$$

where $\mathbf{K}_n$ is a compact operator.


Definition 17.5.4 An operator $\mathbf{T} \in \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$ is called a finite rank
operator if its range is finite-dimensional.


The following is clear from Theorem 17.4.12. 


Proposition 17.5.5 A finite rank operator is compact. In particular, 
every linear transformation of a finite-dimensional vector space is 
compact. 


Theorem 17.5.6 If $\{\mathbf{K}_n\} \subset \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$ are compact and $\mathbf{K} \in \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$ is
such that $\|\mathbf{K} - \mathbf{K}_n\| \to 0$, then $\mathbf{K}$ is compact.


Proof See [DeVi 90].


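Recall that given an orthonormal basis $\{|e_i\rangle\}_{i=1}^{\infty}$, any operator $\mathbf{T}$ on a
Hilbert space $\mathcal{H}$ can be written as $\sum_{i,j=1}^{\infty} c_{ij}|e_i\rangle\langle e_j|$, where $c_{ij} = \langle e_i|\mathbf{T}|e_j\rangle$.
Now let $\mathbf{K}$ be a compact operator and consider the finite rank operators

$$\mathbf{K}_n \equiv \sum_{i,j=1}^{n} c_{ij}|e_i\rangle\langle e_j|, \qquad c_{ij} = \langle e_i|\mathbf{K}|e_j\rangle.$$

Clearly, $\|\mathbf{K} - \mathbf{K}_n\| \to 0$. The hermitian adjoints $\{\mathbf{K}_n^\dagger\}$ are also of finite rank
(therefore, compact). Barring some convergence technicality, we see that
$\mathbf{K}^\dagger$, which is the limit of the sequence of these compact operators, is also
compact.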
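The norm convergence of the finite rank truncations can be illustrated numerically. The following is a minimal sketch, not part of the text: it assumes a hypothetical model operator that is diagonal in the basis, $\mathbf{K}|e_j\rangle = (1/j)|e_j\rangle$, and cuts it off at dimension 200 so that it fits in a matrix. The function name `truncation_errors` is ours.

```python
import numpy as np

def truncation_errors(diagonal, ns):
    """Operator-norm error ||K - K_n|| for a diagonal K and its n x n truncations."""
    K = np.diag(diagonal)
    errors = []
    for n in ns:
        K_n = np.zeros_like(K)
        K_n[:n, :n] = K[:n, :n]                     # keep only c_ij with i, j <= n
        errors.append(np.linalg.norm(K - K_n, 2))   # spectral (operator) norm
    return errors

# Assumed model operator: K|e_j> = (1/j)|e_j>, cut off at dimension 200.
diagonal = 1.0 / np.arange(1, 201)
ns = [1, 5, 20, 100]
errs = truncation_errors(diagonal, ns)

for n, err in zip(ns, errs):
    print(f"n = {n:3d}   ||K - K_n|| = {err:.6f}")
```

For this particular operator the error is the largest discarded diagonal entry, $1/(n+1)$, so it decreases to zero as $n$ grows. In the finite cutoff every matrix is trivially compact; the sketch only shows the behavior of the truncation error.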


Theorem 17.5.7 $\mathbf{K}$ is a compact operator if and only if $\mathbf{K}^\dagger$ is.


A particular type of operator occurs frequently in integral equation the- 
ory. These are called Hilbert-Schmidt operators and defined as follows: 


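Definition 17.5.8 Let $\mathcal{H}$ be a Hilbert space, and $\{|e_i\rangle\}_{i=1}^{\infty}$ an orthonormal
basis. An operator $\mathbf{T} \in \mathcal{L}(\mathcal{H})$ is called Hilbert-Schmidt if

$$\operatorname{tr}(\mathbf{T}^\dagger\mathbf{T}) \equiv \sum_{i=1}^{\infty} \langle e_i|\mathbf{T}^\dagger\mathbf{T}|e_i\rangle = \sum_{i=1}^{\infty} \langle \mathbf{T}e_i|\mathbf{T}e_i\rangle = \sum_{i=1}^{\infty} \|\mathbf{T}e_i\|^2 < \infty.$$


Theorem 17.5.9 Hilbert-Schmidt operators are compact.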


For a proof, see [Rich 78, pp. 242-246]. 


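Example 17.5.10 It is time to give a concrete example of a compact
(Hilbert-Schmidt) operator. For this, we return to Eq. (17.2) with $w(y) = 1$,
and assume that $|u\rangle \in \mathcal{L}^2(a, b)$. Suppose further that the function $K(x, y)$
is continuous on the closed rectangle $[a,b] \times [a,b]$ in the $xy$-plane (or
$\mathbb{R}^2$). Under such conditions, $K(x, y)$ is called a Hilbert-Schmidt kernel.
We now show that $\mathbf{K}$ is compact. First note that due to the continuity of
$K(x, y)$, $\int_a^b\int_a^b |K(x, y)|^2\,dx\,dy < \infty$. Next, we calculate the trace of $\mathbf{K}^\dagger\mathbf{K}$.
Let $\{|e_i\rangle\}_{i=1}^{\infty}$ be any orthonormal basis of $\mathcal{L}^2(a, b)$. Then

$$\begin{aligned}
\operatorname{tr}\mathbf{K}^\dagger\mathbf{K} &= \sum_{i=1}^{\infty} \langle e_i|\mathbf{K}^\dagger\mathbf{K}|e_i\rangle \\
&= \sum_{i=1}^{\infty} \iiint \langle e_i|x\rangle\langle x|\mathbf{K}^\dagger|y\rangle\langle y|\mathbf{K}|z\rangle\langle z|e_i\rangle\,dx\,dy\,dz \\
&= \iiint \langle y|\mathbf{K}|x\rangle^{*}\langle y|\mathbf{K}|z\rangle \sum_{i=1}^{\infty}\langle z|e_i\rangle\langle e_i|x\rangle\,dx\,dy\,dz \\
&= \iiint \langle y|\mathbf{K}|x\rangle^{*}\langle y|\mathbf{K}|z\rangle\, \underbrace{\langle z|\Bigl(\underbrace{\sum_{i=1}^{\infty}|e_i\rangle\langle e_i|}_{=\mathbf{1}}\Bigr)|x\rangle}_{=\delta(x-z)}\,dx\,dy\,dz \\
&= \int_a^b\!\!\int_a^b |K(x, y)|^2\,dx\,dy < \infty.
\end{aligned}$$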
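The identity $\operatorname{tr}\mathbf{K}^\dagger\mathbf{K} = \iint |K(x,y)|^2\,dx\,dy$ can be checked on a grid. The sketch below is an illustration under stated assumptions, not the book's construction: it picks the continuous kernel $K(x,y) = \min(x,y)$ on $[0,1]\times[0,1]$ as a test case and discretizes the integral operator with a midpoint rule. The names `discretize_kernel` and `min_kernel` are ours.

```python
import numpy as np

def discretize_kernel(kernel, a, b, m=400):
    """Midpoint-rule matrix h*K(x_i, x_j) approximating the integral operator on [a, b]."""
    h = (b - a) / m
    x = a + h * (np.arange(m) + 0.5)            # midpoints of the m cells
    X, Y = np.meshgrid(x, x, indexing="ij")
    return h * kernel(X, Y), h

def min_kernel(x, y):
    # Assumed test kernel: continuous on the closed unit square.
    return np.minimum(x, y)

K_h, h = discretize_kernel(min_kernel, 0.0, 1.0)

# Singular values of the discretized operator; their squares sum to tr(K^dagger K).
sing = np.linalg.svd(K_h, compute_uv=False)
trace_KdagK = np.sum(sing**2)

# Midpoint-rule value of the double integral of |K(x, y)|^2.
double_integral = np.sum(np.abs(K_h / h) ** 2) * h * h

print(trace_KdagK, double_integral)
print(sing[:5])
```

For this kernel the double integral is $1/6$ analytically, and the eigenvalues of the exact operator are $1/[(n-\tfrac12)^2\pi^2]$, so the singular values decay quickly toward zero, as expected of a compact operator.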


Historical Notes 

Bernard Bolzano (1781-1848) was a Czech philosopher, mathematician, and theologian 
who made significant contributions to both mathematics and the theory of knowledge. He 
entered the Philosophy Faculty of the University of Prague in 1796, studying philosophy 
and mathematics. He wrote “My special pleasure in mathematics rested therefore particu- 
larly on its purely speculative parts, in other words I prized only that part of mathematics 
which was at the same time philosophy.” 

In the autumn of 1800 he began three years of theological study while he was preparing 
a doctoral thesis on geometry. He received his doctorate in 1804 for a thesis in which he 
gave his view of mathematics and what constitutes a correct mathematical proof. In the 
preface he wrote: 


I could not be satisfied with a completely strict proof if it were not derived from 
concepts which the thesis to be proved contained, but rather made use of some 
fortuitous, alien, intermediate concept, which is always an erroneous transition to 
another kind. 


Two days after receiving his doctorate Bolzano was ordained a Roman Catholic priest. 
However, he came to realize that teaching and not ministering defined his true vocation. 
In the same year, Bolzano was appointed to the chair of philosophy and religion at the 
University of Prague. Because of his pacifist beliefs and his concern for economic justice, 
he was suspended from his position in 1819 after pressure from the Austrian government. 
Bolzano had not given up without a fight but once he was suspended on a charge of heresy 
he was put under house arrest and forbidden to publish. 

Although some of his books had to be published outside Austria because of government 
censorship, he continued to write and to play an important role in the intellectual life of his 
country. Bolzano intended to write a series of papers on the foundations of mathematics. 
He wrote two, the first of which was published. Instead of publishing the second one he 
decided to “... make myself better known to the learned world by publishing some papers 
which, by their titles, would be more suited to arouse attention.” 

Pursuing this strategy he published Der binomische Lehrsatz ... (1816) and Rein ana- 
lytischer Beweis ... (1817), which contain an attempt to free calculus from the concept 
of the infinitesimal. He is clear in his intention stating in the preface of the first that the 
work is “a sample of a new way of developing analysis.” The paper gives a proof of the 
intermediate value theorem with Bolzano’s new approach and in the work he defined what 
is now called a Cauchy sequence. The concept appears in Cauchy’s work four years later 
but it is unlikely that Cauchy had read Bolzano’s work. 

After 1817, Bolzano published no further mathematical works for many years. Between 
the late 1820s and the 1840s, he worked on a major work Grössenlehre. This attempt
to put the whole of mathematics on a logical foundation was published in parts, while 
Bolzano hoped that his students would finish and publish the complete work. 

His work Paradoxien des Unendlichen, a study of paradoxes of the infinite, was published 
in 1851, three years after his death, by one of his students. The word “set” appears here 
for the first time. In this work Bolzano gives examples of 1—1 correspondences between 
the elements of an infinite set and the elements of a proper subset. 

Bolzano’s theories of mathematical infinity anticipated Georg Cantor’s theory of infinite 
sets. It is also remarkable that he gave a function which is nowhere differentiable yet 
everywhere continuous. 




17.5.1 Spectrum of Compact Operators 


Our next task is to investigate the spectrum $\sigma(\mathbf{K})$ of a compact operator $\mathbf{K}$
on a Hilbert space $\mathcal{H}$. We are particularly interested in the set of eigenvalues
and eigenvectors of compact operators. Recall that every eigenvalue of an 
operator on a vector space of finite dimension is in its spectrum, and that 
every point of the spectrum is an eigenvalue (see p. 518). In general, the 
second statement is not true. In fact, we saw that the right-shift operator had 
no eigenvalue at all, yet its spectrum was the entire unit disk of the complex 
plane. 

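We first observe that $0 \in \sigma(\mathbf{K})$, because otherwise $0 \in \rho(\mathbf{K})$, which implies
that $\mathbf{K} = \mathbf{K} - 0\mathbf{1}$ is invertible with bounded inverse $\mathbf{K}^{-1}$. The product of two
compact operators (in fact, the product of a compact and a bounded operator)
is compact (see Example 17.5.3), so $\mathbf{1} = \mathbf{K}\mathbf{K}^{-1}$ would be compact. This yields a
contradiction⁸ because the unit operator cannot be compact: it maps a bounded
sequence to itself, and in infinite dimensions a bounded sequence (an orthonormal
sequence, for instance) need not have a convergent subsequence.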

The next theorem, whose proof can be found in [DeVi 90], characterizes 
the spectrum of a compact operator completely. 


Theorem 17.5.11 Let $\mathbf{K}$ be a compact operator on an infinite-dimensional
Hilbert space $\mathcal{H}$. Then


1. $0 \in \sigma(\mathbf{K})$.

2. Each nonzero point of $\sigma(\mathbf{K})$ is an eigenvalue of $\mathbf{K}$ whose eigenspace is
finite-dimensional.

3. $\sigma(\mathbf{K})$ is either a finite set or it is a sequence that converges to zero.


17.6 Spectral Theorem for Compact Operators 


The finite-dimensional spectral decomposition theorem of Chap. 6 was 
based on the existence of eigenvalues, eigenspaces, and projection opera- 
tors. Such existence was guaranteed by the existence of an inner product for 
any finite-dimensional vector space. The task of establishing spectral de- 
composition for infinite-dimensional vector spaces is complicated not only 
by the possibility of the absence of an inner product, but also by the ques- 
tions of completeness, closure, and convergence. One can eliminate the first 
two hindrances by restricting oneself to a Hilbert space. However, even so, 
one has to deal with other complications of infinite dimensions. 

As an example, consider the relation $\mathcal{V} = \mathcal{W} \oplus \mathcal{W}^\perp$, which is trivially
true for any subspace $\mathcal{W}$ in finite dimensions once an orthonormal basis is
chosen. Recall that the procedure for establishing this relation is to complement
a basis of $\mathcal{W}$ to produce a basis for the whole space. In an infinite-dimensional
Hilbert space, we do not know a priori how to complement the
basis of a subspace (which may be infinite-dimensional). Thus, one has to
prove the existence of the orthogonal complement of a subspace. Without
going into details, we sketch the proof. First a definition:


Definition 17.6.1 A convex subset $E$ of a vector space is a collection of
vectors such that if $|u\rangle$ and $|v\rangle$ are in $E$, then $|u\rangle - t(|u\rangle - |v\rangle)$ is also in $E$
for all $0 \le t \le 1$.


Intuitively, any two points of a convex subset can be connected by a
straight line segment lying entirely in the subset.


8 Our conclusion is valid only in infinite dimensions. In finite dimensions, all operators,
including 1, are compact.


Fig. 17.1 The shaded area represents a convex subset of the vector space. It consists
of vectors whose tips lie in the shaded region. It is clear that there is a (unique) vector
belonging to the subset whose length is minimum


Fig. 17.2 The shaded area represents the subspace $\mathcal{M}$ of the vector space. The convex
subset $E$ consists of all vectors connecting points of $\mathcal{M}$ to the tip of $|u\rangle$. It is clear that
there is a (unique) vector belonging to $E$ whose length is minimum. The figure shows
that this vector is orthogonal to $\mathcal{M}$

Let $E$ be a closed convex subset (not a subspace) of a Hilbert space $\mathcal{H}$.
One can show that there exists a unique vector in $E$ with minimal norm (see
Fig. 17.1). Now let $\mathcal{M}$ be a subspace of $\mathcal{H}$. For an arbitrary vector $|u\rangle$ in
$\mathcal{H}$, consider the subset $E = |u\rangle - \mathcal{M}$, i.e., all vectors of the form $|u\rangle - |m\rangle$
with $|m\rangle \in \mathcal{M}$. It is easily shown that $E$ is a closed convex set. Denote the
unique vector of minimal norm of $|u\rangle - \mathcal{M}$ by $|u\rangle - |Pu\rangle$ with $|Pu\rangle \in \mathcal{M}$.
One can show that $|u\rangle - |Pu\rangle$ is orthogonal to $|m\rangle$ for all $|m\rangle \in \mathcal{M}$, i.e.,
$(|u\rangle - |Pu\rangle) \in \mathcal{M}^\perp$ (see Fig. 17.2). Obviously, only the zero vector can
be simultaneously in $\mathcal{M}$ and $\mathcal{M}^\perp$. Furthermore, any vector $|u\rangle$ in $\mathcal{H}$ can be
written as $|u\rangle = |Pu\rangle + (|u\rangle - |Pu\rangle)$ with $|Pu\rangle \in \mathcal{M}$ and $(|u\rangle - |Pu\rangle) \in \mathcal{M}^\perp$.
This shows that $\mathcal{H} = \mathcal{M} \oplus \mathcal{M}^\perp$. In words, a Hilbert space is the direct sum
of any one of its subspaces and the orthogonal complement of that subspace.
The vector $|Pu\rangle$ so constructed is the projection of $|u\rangle$ in $\mathcal{M}$.
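The minimal-norm construction of $|Pu\rangle$ can be tried out in a finite-dimensional setting. This is a sketch under assumptions of ours: the subspace $\mathcal{M}$ is taken to be the span of two random complex vectors in $\mathbb{C}^6$, and the minimizer of $\||u\rangle - |m\rangle\|$ is found with a least-squares solve. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed setup: M is spanned by the two columns of B inside C^6.
B = rng.normal(size=(6, 2)) + 1j * rng.normal(size=(6, 2))
u = rng.normal(size=6) + 1j * rng.normal(size=6)

# |Pu> is the vector of M closest to |u>, i.e. |u> - |Pu> has minimal norm.
coeffs, *_ = np.linalg.lstsq(B, u, rcond=None)
Pu = B @ coeffs
residual = u - Pu

# <m|u - Pu> for the spanning vectors of M; these should vanish.
overlaps = B.conj().T @ residual

# Any other vector of M leaves a longer difference vector.
other = B @ (coeffs + np.array([0.1, -0.2j]))
print(np.abs(overlaps).max(), np.linalg.norm(residual), np.linalg.norm(u - other))
```

The vanishing overlaps show that $|u\rangle - |Pu\rangle$ lies in $\mathcal{M}^\perp$, so $|u\rangle = |Pu\rangle + (|u\rangle - |Pu\rangle)$ is the decomposition along $\mathcal{M} \oplus \mathcal{M}^\perp$.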

A projection operator $\mathbf{P}$ can be defined as a linear operator with the
property that $\mathbf{P}^2 = \mathbf{P}$. One can then show the following.


Theorem 17.6.2 The kernel $\ker\mathbf{P}$ of a projection operator is the orthogonal
complement of the range $\mathbf{P}(\mathcal{H})$ of $\mathbf{P}$ in $\mathcal{H}$ iff $\mathbf{P}$ is hermitian.


17.6.1 Compact Hermitian Operator 


We now concentrate on the compact operators, and first look at hermitian 
compact operators. We need two lemmas: 


Lemma 17.6.3 Let $\mathbf{H} \in \mathcal{B}(\mathcal{H})$ be a bounded hermitian operator on the
Hilbert space $\mathcal{H}$. Then $\|\mathbf{H}\| = \max\{|\langle \mathbf{H}x|x\rangle| \;:\; \|x\| = 1\}$.


Proof Let $M$ denote the positive number on the RHS. From the definition
of the norm of an operator, we easily obtain $|\langle \mathbf{H}x|x\rangle| \le \|\mathbf{H}\|\,\|x\|^2 = \|\mathbf{H}\|$, or
$M \le \|\mathbf{H}\|$. For the reverse inequality, see Problem 17.6.
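A quick numerical check of Lemma 17.6.3 for matrices is sketched below. The random hermitian matrix, the sample size, and the helper `quad_form_abs` are assumptions of this illustration. The second half uses the nilpotent matrix with rows $(0,1)$ and $(0,0)$ to show why hermiticity matters: its norm is 1, while $|\langle \mathbf{N}x|x\rangle| = |x_1||x_2| \le \tfrac12$ for unit vectors.

```python
import numpy as np

def quad_form_abs(A, x):
    """|<Ax|x>| for the normalized vector x."""
    x = x / np.linalg.norm(x)
    return abs(np.vdot(A @ x, x))

rng = np.random.default_rng(2)
Z = rng.normal(size=(5, 5)) + 1j * rng.normal(size=(5, 5))
H = (Z + Z.conj().T) / 2                      # hermitian by construction

w, V = np.linalg.eigh(H)
k = np.argmax(np.abs(w))
norm_H = np.linalg.norm(H, 2)                 # operator norm
attained = quad_form_abs(H, V[:, k])          # value at the extreme eigenvector

# Random unit vectors never exceed the norm.
samples = rng.normal(size=(2000, 5)) + 1j * rng.normal(size=(2000, 5))
sampled_max = max(quad_form_abs(H, x) for x in samples)

# Non-hermitian counterexample: norm 1, but the quadratic form stays below 1/2.
Nil = np.array([[0.0, 1.0], [0.0, 0.0]])
norm_N = np.linalg.norm(Nil, 2)
best_N = quad_form_abs(Nil, np.array([1.0, 1.0]))

print(norm_H, attained, sampled_max)
print(norm_N, best_N)
```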


Lemma 17.6.4 Let $\mathbf{K} \in \mathcal{B}(\mathcal{H})$ be a hermitian compact operator. Then there
is an eigenvalue $\lambda$ of $\mathbf{K}$ such that $|\lambda| = \|\mathbf{K}\|$.


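Proof Let $\{|x_n\rangle\}$ be a sequence of unit vectors such that

$$\|\mathbf{K}\| = \lim |\langle \mathbf{K}x_n|x_n\rangle|.$$

This is always possible, as the following argument shows. Let $\epsilon$ be a small
positive number. There must exist a unit vector $|x_1\rangle \in \mathcal{H}$ such that

$$\|\mathbf{K}\| - \epsilon = |\langle \mathbf{K}x_1|x_1\rangle|,$$

because otherwise, $\|\mathbf{K}\| - \epsilon$ would be greater than or equal to the norm of
the operator (see Lemma 17.6.3). Similarly, there must exist another (different)
unit vector $|x_2\rangle \in \mathcal{H}$ such that $\|\mathbf{K}\| - \epsilon/2 = |\langle \mathbf{K}x_2|x_2\rangle|$. Continuing this
way, we construct an infinite sequence of unit vectors $\{|x_n\rangle\}$ with the property
$\|\mathbf{K}\| - \epsilon/n = |\langle \mathbf{K}x_n|x_n\rangle|$. This construction clearly produces the desired
sequence. Note that the argument holds for any hermitian bounded operator;
compactness is not necessary.

Now define $\lambda_n \equiv \langle \mathbf{K}x_n|x_n\rangle$ and let $\lambda = \lim \lambda_n$, so that $|\lambda| = \|\mathbf{K}\|$.
Compactness of $\mathbf{K}$ implies that $\{|\mathbf{K}x_n\rangle\}$ converges (after passing to a
subsequence if necessary). Let $|y\rangle \in \mathcal{H}$ be the limit of
$\{|\mathbf{K}x_n\rangle\}$. Then $\|y\| = \lim \|\mathbf{K}x_n\| \le \|\mathbf{K}\|\,\|x_n\| = \|\mathbf{K}\|$. On the other hand,

$$0 \le \|\mathbf{K}x_n - \lambda x_n\|^2 = \|\mathbf{K}x_n\|^2 - 2\lambda\langle \mathbf{K}x_n|x_n\rangle + |\lambda|^2.$$

Taking the limit and noting that $\lambda_n$ and $\lambda$ are real, we get

$$0 \le \lim \|\mathbf{K}x_n\|^2 - 2\lambda \lim\langle \mathbf{K}x_n|x_n\rangle + |\lambda|^2 = \|y\|^2 - 2\lambda^2 + \lambda^2 \;\Rightarrow\; \|y\|^2 \ge \|\mathbf{K}\|^2.$$

It follows from these two inequalities that $\|y\| = \|\mathbf{K}\|$ and that $\lim |x_n\rangle = |y\rangle/\lambda$.
Furthermore,

$$(\mathbf{K} - \lambda\mathbf{1})(|y\rangle/\lambda) = (\mathbf{K} - \lambda\mathbf{1})(\lim |x_n\rangle) = \lim (\mathbf{K} - \lambda\mathbf{1})|x_n\rangle = 0.$$

Therefore, $\lambda$ is an eigenvalue of $\mathbf{K}$ with eigenvector $|y\rangle/\lambda$ and $|\lambda| = \|\mathbf{K}\|$.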


Arrange all the eigenvalues of Theorem 17.5.11 in the order of decreasing
absolute value. Let $\mathcal{M}_n$ denote the (finite-dimensional) eigenspace
corresponding to eigenvalue $\lambda_n$, and $\mathbf{P}_n$ the projection to $\mathcal{M}_n$. The eigenspaces
are pairwise orthogonal and $\mathbf{P}_m\mathbf{P}_n = 0$ for $m \ne n$. This follows in exact
analogy with the finite-dimensional case.

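First assume that $\mathbf{K}$ has only finitely many eigenvalues,

$$|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_r| > 0.$$

Let

$$\mathcal{M} \equiv \mathcal{M}_1 \oplus \mathcal{M}_2 \oplus \cdots \oplus \mathcal{M}_r = \sum_{j=1}^{r} \oplus\,\mathcal{M}_j \equiv \bigoplus_{j=1}^{r} \mathcal{M}_j,$$

and let $\mathcal{M}_0$ be the orthogonal complement of $\mathcal{M}$. Since each eigenspace is
invariant under $\mathbf{K}$, so is $\mathcal{M}$. Therefore, by Theorem 6.1.6 (which holds for
infinite-dimensional vector spaces as well) and the fact that $\mathbf{K}$ is hermitian,
$\mathcal{M}_0$ is also invariant. Let $\mathbf{K}_0$ be the restriction of $\mathbf{K}$ to $\mathcal{M}_0$. By Lemma 17.6.4,
$\mathbf{K}_0$ has an eigenvalue $\lambda$ such that $|\lambda| = \|\mathbf{K}_0\|$. If $\lambda \ne 0$, it must be one of the
eigenvalues already accounted for, because any eigenvalue of $\mathbf{K}_0$ is also an
eigenvalue of $\mathbf{K}$. This is impossible, because $\mathcal{M}_0$ is orthogonal to all the
eigenspaces. So, $\lambda = 0$, or $|\lambda| = \|\mathbf{K}_0\| = 0$, or $\mathbf{K}_0 = \mathbf{0}$, i.e., $\mathbf{K}$ acts as the zero
operator on $\mathcal{M}_0$.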

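Let $\mathbf{P}_0$ be the orthogonal projection on $\mathcal{M}_0$. Then $\mathcal{H} = \sum_{j=0}^{r} \oplus\,\mathcal{M}_j$, and
we have $\mathbf{1} = \sum_{j=0}^{r} \mathbf{P}_j$, and for an arbitrary $|x\rangle \in \mathcal{H}$, we have

$$\mathbf{K}|x\rangle = \mathbf{K}\Bigl(\sum_{j=0}^{r} \mathbf{P}_j|x\rangle\Bigr) = \sum_{j=0}^{r} \mathbf{K}\bigl(\mathbf{P}_j|x\rangle\bigr) = \sum_{j=1}^{r} \lambda_j\bigl(\mathbf{P}_j|x\rangle\bigr).$$

It follows that $\mathbf{K} = \sum_{j=1}^{r} \lambda_j\mathbf{P}_j$. Notice that the range of $\mathbf{K}$ is $\sum_{j=1}^{r} \oplus\,\mathcal{M}_j$,
which is finite-dimensional. Thus, $\mathbf{K}$ has finite rank. Barring some technical
details, which we shall not reproduce here, the case of a compact hermitian
operator with infinitely many eigenvalues goes through in the same way (see
[DeVi 90, pp. 179-180]):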


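Theorem 17.6.5 (Spectral Theorem: Compact Hermitian Operators) Let
$\mathbf{K}$ be a compact hermitian operator on a Hilbert space $\mathcal{H}$. Let $\{\lambda_j\}_{j=1}^{N}$ be
the distinct nonzero eigenvalues of $\mathbf{K}$ arranged in decreasing order of
absolute values. For each $j$ let $\mathcal{M}_j$ be the eigenspace of $\mathbf{K}$ corresponding to
eigenvalue $\lambda_j$ and $\mathbf{P}_j$ its projection operator with the property $\mathbf{P}_i\mathbf{P}_j = 0$ for
$i \ne j$. Then:


1. If $N < \infty$, then $\mathbf{K}$ is an operator of finite rank, $\mathbf{K} = \sum_{j=1}^{N} \lambda_j\mathbf{P}_j$, and
$\mathcal{H} = \mathcal{M}_0 \oplus \mathcal{M}_1 \oplus \cdots \oplus \mathcal{M}_N$, or $\mathbf{1} = \sum_{j=0}^{N} \mathbf{P}_j$, where $\mathcal{M}_0$ is
infinite-dimensional.

2. If $N = \infty$, then $\lambda_j \to 0$ as $j \to \infty$, $\mathbf{K} = \sum_{j=1}^{\infty} \lambda_j\mathbf{P}_j$, and
$\mathcal{H} = \mathcal{M}_0 \oplus \sum_{j=1}^{\infty} \oplus\,\mathcal{M}_j$, or $\mathbf{1} = \sum_{j=0}^{\infty} \mathbf{P}_j$, where $\mathcal{M}_0$ could be finite- or
infinite-dimensional. Furthermore,

$$\Bigl\|\mathbf{K} - \sum_{j=1}^{m} \lambda_j\mathbf{P}_j\Bigr\| = |\lambda_{m+1}| \quad \forall m,$$

which shows that the infinite series above converges for an operator
norm.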
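The decomposition $\mathbf{K} = \sum_j \lambda_j\mathbf{P}_j$ and the tail estimate $\|\mathbf{K} - \sum_{j\le m}\lambda_j\mathbf{P}_j\| = |\lambda_{m+1}|$ can be checked on a small hermitian matrix. The following sketch assumes a hypothetical $5\times5$ real symmetric matrix with eigenvalues $3, -2, -2, 1, 0.5$ (one degenerate pair); the helper `spectral_decomposition` is ours, not a library routine.

```python
import numpy as np

def spectral_decomposition(K, tol=1e-8):
    """Distinct eigenvalues (decreasing |lambda|) and their orthogonal projectors."""
    w, V = np.linalg.eigh(K)
    order = np.lexsort((w, -np.abs(w)))       # primary key: decreasing |w|
    lams, projs = [], []
    for idx in order:
        v = V[:, idx]
        P = np.outer(v, v.conj())             # rank-one projector |v><v|
        if lams and abs(w[idx] - lams[-1]) < tol:
            projs[-1] = projs[-1] + P         # same eigenvalue: enlarge eigenspace
        else:
            lams.append(w[idx])
            projs.append(P)
    return np.array(lams), projs

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))  # random orthogonal matrix
K = Q @ np.diag([3.0, -2.0, -2.0, 1.0, 0.5]) @ Q.T
K = (K + K.T) / 2                             # remove rounding asymmetry

lams, projs = spectral_decomposition(K)

# Norm of the remainder after keeping the first m terms of the expansion.
tail_norms = []
for m in range(1, len(lams)):
    partial = sum(l * P for l, P in zip(lams[:m], projs[:m]))
    tail_norms.append(np.linalg.norm(K - partial, 2))

print(lams)
print(tail_norms)
```

In this finite-dimensional example all eigenvalues are nonzero, so $\mathcal{M}_0$ is trivial and the projectors add up to the identity.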


The eigenspaces of a compact hermitian operator are orthogonal and, by 
(2) of Theorem 17.6.5, span the entire space. By the Gram—Schmidt process, 
one can select an orthonormal basis for each eigenspace. We therefore have 
the following corollary. 


Corollary 17.6.6 If $\mathbf{K}$ is a compact hermitian operator on a Hilbert space
$\mathcal{H}$, then the eigenvectors of $\mathbf{K}$ constitute an orthonormal basis for $\mathcal{H}$.


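Theorem 17.6.7 Let $\mathbf{K}$ be a compact hermitian operator on a Hilbert space
$\mathcal{H}$ and let $\mathbf{K} = \sum_{j=1}^{N} \lambda_j\mathbf{P}_j$, where $N$ could be infinite. A bounded linear
operator on $\mathcal{H}$ commutes with $\mathbf{K}$ if and only if it commutes with every $\mathbf{P}_j$.


Proof The “if” part is straightforward. So assume that the bounded operator
$\mathbf{T}$ commutes with $\mathbf{K}$. For $|x\rangle \in \mathcal{M}_j$, we have $(\mathbf{K} - \lambda_j)\mathbf{T}|x\rangle = \mathbf{T}(\mathbf{K} - \lambda_j)|x\rangle = 0$.
Similarly, $(\mathbf{K} - \lambda_j)\mathbf{T}^\dagger|x\rangle = \mathbf{T}^\dagger(\mathbf{K} - \lambda_j)|x\rangle = 0$, because
$0 = [\mathbf{T}, \mathbf{K}]^\dagger = [\mathbf{T}^\dagger, \mathbf{K}]$. These equations show that both $\mathbf{T}$ and $\mathbf{T}^\dagger$ leave $\mathcal{M}_j$
invariant. This means that $\mathcal{M}_j$ reduces $\mathbf{T}$, and by Theorem 6.1.8, $\mathbf{T}\mathbf{P}_j = \mathbf{P}_j\mathbf{T}$.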


17.6.2 Compact Normal Operator


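Next we prove the spectral theorem for a normal operator. Recall that
any operator $\mathbf{T}$ can be written as $\mathbf{T} = \mathbf{X} + i\mathbf{Y}$ where $\mathbf{X} = \frac{1}{2}(\mathbf{T} + \mathbf{T}^\dagger)$ and
$\mathbf{Y} = \frac{1}{2i}(\mathbf{T} - \mathbf{T}^\dagger)$ are hermitian, and since both $\mathbf{T}$ and $\mathbf{T}^\dagger$ are compact, $\mathbf{X}$
and $\mathbf{Y}$ are compact as well. For normal operators, we have the extra
condition that $[\mathbf{X}, \mathbf{Y}] = \frac{i}{2}[\mathbf{T}, \mathbf{T}^\dagger] = 0$. Let $\mathbf{X} = \sum_{j=1}^{N} \lambda_j\mathbf{P}_j$ and $\mathbf{Y} = \sum_{k=1}^{N} \mu_k\mathbf{Q}_k$
be the spectral decompositions of $\mathbf{X}$ and $\mathbf{Y}$. Using Theorem 17.6.7, it is
straightforward to show that if $[\mathbf{X}, \mathbf{Y}] = 0$ then $[\mathbf{P}_j, \mathbf{Q}_k] = 0$. Now, since
$\mathcal{H} = \sum_{j=0}^{N} \oplus\,\mathcal{M}_j = \sum_{k=0}^{N} \oplus\,\mathcal{N}_k$, where $\mathcal{M}_j$ are the eigenspaces of $\mathbf{X}$ and
$\mathcal{N}_k$ those of $\mathbf{Y}$, we have, for any $|x\rangle \in \mathcal{H}$,

$$\mathbf{X}|x\rangle = \Bigl(\sum_{j=1}^{N} \lambda_j\mathbf{P}_j\Bigr)\Bigl(\sum_{k=0}^{N} \mathbf{Q}_k|x\rangle\Bigr) = \sum_{j=1}^{N}\sum_{k=0}^{N} \lambda_j\mathbf{P}_j\mathbf{Q}_k|x\rangle.$$

Similarly,

$$\mathbf{Y}|x\rangle = \mathbf{Y}\Bigl(\sum_{j=0}^{N} \mathbf{P}_j|x\rangle\Bigr) = \sum_{k=1}^{N}\sum_{j=0}^{N} \mu_k\mathbf{Q}_k\mathbf{P}_j|x\rangle.$$

Combining these two relations and noting that $\mathbf{Q}_k\mathbf{P}_j = \mathbf{P}_j\mathbf{Q}_k$ gives

$$\mathbf{T}|x\rangle = (\mathbf{X} + i\mathbf{Y})|x\rangle = \sum_{j=0}^{N}\sum_{k=0}^{N} (\lambda_j + i\mu_k)\mathbf{P}_j\mathbf{Q}_k|x\rangle,$$

where $\lambda_0 = \mu_0 = 0$ because $\mathbf{X}$ vanishes on $\mathcal{M}_0$ and $\mathbf{Y}$ vanishes on $\mathcal{N}_0$.

The projection operators $\mathbf{P}_j\mathbf{Q}_k$ project onto the intersection of $\mathcal{M}_j$ and $\mathcal{N}_k$.
Therefore, $\mathcal{M}_j \cap \mathcal{N}_k$ are the eigenspaces of $\mathbf{T}$. Only those terms in the sum
for which $\mathcal{M}_j \cap \mathcal{N}_k \ne \{0\}$ contribute. As before, we can order the eigenvalues
according to their absolute values.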


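Theorem 17.6.8 (Spectral Theorem: Compact Normal Operators) Let $\mathbf{T}$
be a compact normal operator on a Hilbert space $\mathcal{H}$. Let $\{\lambda_j\}_{j=1}^{N}$ (where $N$
can be $\infty$) be the distinct nonzero eigenvalues of $\mathbf{T}$ arranged in decreasing
order of absolute values. For each $n$ let $\mathcal{M}_n$ be the eigenspace of $\mathbf{T}$
corresponding to eigenvalue $\lambda_n$ and $\mathbf{P}_n$ its projection operator with the property
$\mathbf{P}_m\mathbf{P}_n = 0$ for $m \ne n$. Then:


1. If $N < \infty$, then $\mathbf{T}$ is an operator of finite rank, $\mathbf{T} = \sum_{j=1}^{N} \lambda_j\mathbf{P}_j$, and
$\mathcal{H} = \mathcal{M}_0 \oplus \mathcal{M}_1 \oplus \cdots \oplus \mathcal{M}_N$, or $\mathbf{1} = \sum_{j=0}^{N} \mathbf{P}_j$, where $\mathcal{M}_0$ is
infinite-dimensional.

2. If $N = \infty$, then $\lambda_n \to 0$ as $n \to \infty$, $\mathbf{T} = \sum_{n=1}^{\infty} \lambda_n\mathbf{P}_n$, and
$\mathcal{H} = \mathcal{M}_0 \oplus \sum_{n=1}^{\infty} \oplus\,\mathcal{M}_n$, or $\mathbf{1} = \sum_{j=0}^{\infty} \mathbf{P}_j$, where $\mathcal{M}_0$ could be finite- or
infinite-dimensional.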


As in the case of a compact hermitian operator, by the Gram-Schmidt 
process, one can select an orthonormal basis for each eigenspace of a normal 
operator, in which case we have the following: 


Corollary 17.6.9 If $\mathbf{T}$ is a compact normal operator on a Hilbert space $\mathcal{H}$,
then the eigenvectors of $\mathbf{T}$ constitute an orthonormal basis for $\mathcal{H}$.


One can use Theorem 17.6.8 to write any function of a normal operator $\mathbf{T}$
as an expansion in terms of the projection operators of $\mathbf{T}$. First we note that
$\mathbf{T}^k$ has $\lambda_n^k$ as its expansion coefficients. Next, we add various powers of $\mathbf{T}$ in
the form of a polynomial and conclude that the expansion coefficients for a
polynomial $p(\mathbf{T})$ are $p(\lambda_n)$. Finally, for any function $f(\mathbf{T})$ we have

$$f(\mathbf{T}) = \sum_{n=1}^{\infty} f(\lambda_n)\mathbf{P}_n. \qquad (17.6)$$
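Equation (17.6) can be tried on a small normal matrix. The sketch below rests on assumptions of ours: a $4\times4$ normal matrix is built as $\mathbf{U}\,\mathrm{diag}(\lambda)\,\mathbf{U}^\dagger$ from a random unitary $\mathbf{U}$ and four arbitrarily chosen complex eigenvalues, and the expansion is compared with a truncated power series for the exponential. The helpers `f_of_T` and `exp_series` are illustrative names.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random unitary from the QR factorization of a complex Gaussian matrix.
Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
U, _ = np.linalg.qr(Z)

# Assumed eigenvalues (distinct and nonzero); T is normal by construction.
lam = np.array([1.0 + 0.5j, -0.3j, 0.7, -1.2 + 0.2j])
T = U @ np.diag(lam) @ U.conj().T

# Projection operators P_n = |u_n><u_n| onto the one-dimensional eigenspaces.
P = [np.outer(U[:, k], U[:, k].conj()) for k in range(n)]

def f_of_T(f):
    """Expansion f(T) = sum_n f(lambda_n) P_n."""
    return sum(f(l) * Pk for l, Pk in zip(lam, P))

def exp_series(M, terms=40):
    """Truncated power series for exp(M), used only as an independent check."""
    result = np.zeros_like(M)
    term = np.eye(M.shape[0], dtype=M.dtype)
    for k in range(terms):
        result = result + term
        term = term @ M / (k + 1)
    return result

exp_via_projectors = f_of_T(np.exp)
cube_via_projectors = f_of_T(lambda z: z**3)
print(np.abs(exp_via_projectors - exp_series(T)).max())
```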


Historical Notes 

Johann (John) von Neumann, (1903-1957), the eldest of three sons of Max von Neu- 
mann, a well-to-do Jewish banker, was privately educated until he entered the gymnasium 
in 1914. His unusual mathematical abilities soon came to the attention of his teachers,
who pointed out to his father that teaching him conventional school mathematics would
be a waste of time; he was therefore tutored in mathematics under the guidance of uni- 
versity professors, and by the age of nineteen he was already recognized as a professional 
mathematician and had published his first paper. 

Von Neumann was Privatdozent at Berlin from 1927 to 1929 and at Hamburg in 1929- 
1930, then went to Princeton University for three years; in 1933 he was invited to join 
the newly opened Institute for Advanced Study, of which he was the youngest permanent 
member at that time. At the outbreak of World War II, von Neumann was called upon to 
participate in various scientific projects related to the war effort: In particular, from 1943 
he was a consultant on the construction of the atomic bomb at Los Alamos. After the 
war he retained his membership on numerous government boards and committees, and in 
1954 he became a member of the Atomic Energy Commission. His health began to fail in 
1955, and he died of cancer two years later. 

It is only in comparison with the greatest mathematical geniuses of history that von Neu- 
mann’s scope in pure mathematics may appear somewhat restricted; it was far beyond the 
range of most of his contemporaries, and his extraordinary work in applied mathematics, 
in which he certainly equals Gauss, Cauchy, or Poincaré, more than compensates for its 
limitations. Von Neumann’s work in pure mathematics was accomplished between 1925 
and 1940, when he seemed to be advancing at a breathless speed on all fronts of logic 
and analysis at once, not to speak of mathematical physics. The dominant theme in von 
Neumann’s work is by far his work on the spectral theory of operators in Hilbert spaces. 
For twenty years he was the undisputed master in this area, which contains what is now 
considered his most profound and most original creation, the theory of rings of operators. 
The first papers (1927) in which Hilbert space theory appears are those on the foundations 
of quantum mechanics. These investigations later led von Neumann to a systematic study 
of unbounded hermitian operators. 

Von Neumann’s most famous work in theoretical physics is his axiomatization of quantum 
mechanics. When he began work in that field in 1927, the methods used by its founders 
were hard to formulate in precise mathematical terms: “Operators” on “functions” were 
handled without much consideration of their domain of definition or their topological 
properties, and it was blithely assumed that such “operators”, when self-adjoint, could 
always be “diagonalized” (as in the finite dimensional case), at the expense of introducing 
Dirac delta functions as “eigenvectors”. Von Neumann showed that mathematical rigor 
could be restored by taking as basic axioms the assumptions that the states of a physical 
system were points of a Hilbert space and that the measurable quantities were Hermitian 
(generally unbounded) operators densely defined in that space. 

After 1927 von Neumann also devoted much effort to more specific problems of quantum 
mechanics, such as the problem of measurement and the foundation of quantum statis- 
tics and quantum thermodynamics, proving in particular an ergodic theorem for quan- 
tum systems. All this work was developed and expanded in Mathematische Grundlagen 
der Quantenmechanik (1932), in which he also discussed the much-debated question of 
“causality” versus “indeterminacy” and concluded that no introduction of “hidden param- 
eters” could keep the basic structure of quantum theory and restore “causality”. 

Von Neumann’s uncommon grasp of applied mathematics, treated as a whole without 
divorcing theory from experimental realization, was nowhere more apparent than in his 
work on computers. He became interested in numerical computations in connection with 
the need for quick estimates and approximate results that developed with the technology 
used for the war effort—particularly the complex problems of hydrodynamics—and the 
completely new problems presented by the harnessing of nuclear energy, for which no 
ready-made theoretical solutions were available. Von Neumann’s extraordinary ability for 
rapid mental calculation was legendary. The story is told of a friend who brought him a 
simple kinematics problem. Two trains, a certain given distance apart, move toward each 
other at a given speed. A fly, initially on the windshield of one of the trains, flies back 
and forth between them, again at a known constant speed. When the trains collide, how 
far has the fly traveled? One way to solve the problem is to add up all the successively 
smaller distances in each individual flight. (The easy way is to multiply the fly’s speed by 
the time elapsed until the crash.) After a few seconds of thought, von Neumann quickly 
gave the correct answer. 

“That’s strange,” remarked his friend, “Most people try to sum the infinite series.” 
“What’s strange about that?” von Neumann replied. “That’s what I did.”


In closing this section, let us remark that the paradigm of compact
operators, namely the Hilbert-Schmidt operator, is such because it is defined
on the finite rectangle $[a, b] \times [a, b]$. If this rectangle grows beyond limit,
or equivalently, if the Hilbert space is $\mathcal{L}^2(R_\infty)$, where $R_\infty$ is some infinite
region of the real line, then the compactness property breaks down, as the
following example illustrates.


Example 17.6.10 Consider the two kernels

$$K_1(x,t) = e^{-|x-t|} \quad\text{and}\quad K_2(x,t) = \sin xt,$$

where the first one acts on $\mathcal{L}^2(-\infty, \infty)$ and the second one on $\mathcal{L}^2(0, \infty)$.
One can show (see Problem 17.7) that these two kernels have, respectively,
the two eigenfunctions

$$e^{i\alpha t},\ \alpha \in \mathbb{R}, \quad\text{and}\quad \sqrt{\frac{\pi}{2}}\,e^{-at} + \frac{t}{a^2+t^2},\ a > 0,$$

corresponding to the two eigenvalues

$$\lambda = \frac{2}{1+\alpha^2},\ \alpha \in \mathbb{R}, \quad\text{and}\quad \lambda = \sqrt{\frac{\pi}{2}}.$$

We see that in the first case, all real numbers between 0 and 2 are eigenvalues,
rendering this set uncountable. In the second case, there are infinitely
(in fact, uncountably) many eigenvectors (one for each $a$) corresponding to
the single eigenvalue $\sqrt{\pi/2}$. Note, however, that in the first case the
eigenfunctions and in the second case the kernel have infinite norms.
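The eigenvalue relation for the first kernel can be checked by direct quadrature, since $e^{-|x-t|}$ decays exponentially. This is a rough sketch under assumptions of ours: the integral over the real line is cut off at a half-width of 40 around $x$ and evaluated with a trapezoidal rule, and the function name `apply_K1` is hypothetical.

```python
import numpy as np

def apply_K1(alpha, x, half_width=40.0, m=200001):
    """Approximate the integral of exp(-|x - t|) * exp(i*alpha*t) over t."""
    t = np.linspace(x - half_width, x + half_width, m)   # grid centered on the kink t = x
    integrand = np.exp(-np.abs(x - t)) * np.exp(1j * alpha * t)
    h = t[1] - t[0]
    # Composite trapezoidal rule written out explicitly.
    return h * (np.sum(integrand) - 0.5 * (integrand[0] + integrand[-1]))

x0 = 0.7
checks = {}
for alpha in (0.0, 1.3, -2.5):
    numeric = apply_K1(alpha, x0)
    expected = 2.0 / (1.0 + alpha**2) * np.exp(1j * alpha * x0)   # lambda * eigenfunction
    checks[alpha] = (numeric, expected)
    print(alpha, abs(numeric - expected))
```

Every real $\alpha$ passes the same check, which is the numerical face of the statement that the eigenvalues $2/(1+\alpha^2)$ fill the interval between 0 and 2.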


17.7 Resolvents


The discussion of the preceding section showed that the spectrum of a nor- 
mal compact operator is countable. Removing the compactness property in 
general will remove countability, as shown in Example 17.6.10. We have 
also seen that the right-shift operator, a bounded operator, has uncountably 
many points in its spectrum. We therefore expect that the sums in Theo- 
rem 17.6.8 should be replaced by integrals in the spectral decomposition 
theorem for (noncompact) bounded operators. We shall not discuss the spec- 
tral theorem for general operators. However, one special class of noncom- 
pact operators is essential for the treatment of Sturm-Liouville theory (to be 
studied in Chap. 19). For these operators, the concept of resolvent will be 
used, which we develop in this section. This concept also makes a connec- 
tion between the countable (algebraic) and the uncountable (analytic) cases. 


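Definition 17.7.1 Let $\mathbf{T}$ be an operator and $\lambda \in \rho(\mathbf{T})$. The operator
$\mathbf{R}_\lambda(\mathbf{T}) \equiv (\mathbf{T} - \lambda\mathbf{1})^{-1}$ is called the resolvent of $\mathbf{T}$ at $\lambda$.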




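Two important properties of the resolvent are useful in analyzing the
spectrum of operators. Let us assume that $\lambda, \mu \in \rho(\mathbf{T})$, $\lambda \ne \mu$, and take
the difference between their resolvents. Problem 17.8 shows how to obtain
the following relation:

$$\mathbf{R}_\lambda(\mathbf{T}) - \mathbf{R}_\mu(\mathbf{T}) = (\lambda - \mu)\,\mathbf{R}_\lambda(\mathbf{T})\,\mathbf{R}_\mu(\mathbf{T}). \qquad (17.7)$$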


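To obtain the second property of the resolvent, we formally (and indefinitely)
differentiate $\mathbf{R}_\lambda(\mathbf{T})$ with respect to $\lambda$ and evaluate the result at $\lambda = \mu$:

$$\frac{d}{d\lambda}\mathbf{R}_\lambda(\mathbf{T}) = \frac{d}{d\lambda}\bigl[(\mathbf{T} - \lambda\mathbf{1})^{-1}\bigr] = (\mathbf{T} - \lambda\mathbf{1})^{-2} = \mathbf{R}_\lambda^2(\mathbf{T}).$$

Differentiating both sides of this equation, we get $2\mathbf{R}_\lambda^3(\mathbf{T})$, and in general,

$$\frac{d^n}{d\lambda^n}\mathbf{R}_\lambda(\mathbf{T}) = n!\,\mathbf{R}_\lambda^{n+1}(\mathbf{T}) \;\Rightarrow\; \frac{d^n}{d\lambda^n}\mathbf{R}_\lambda(\mathbf{T})\Big|_{\lambda=\mu} = n!\,\mathbf{R}_\mu^{n+1}(\mathbf{T}).$$

Assuming that the Taylor series expansion exists, we may write

$$\mathbf{R}_\lambda(\mathbf{T}) = \sum_{n=0}^{\infty} \frac{(\lambda-\mu)^n}{n!}\,\frac{d^n}{d\lambda^n}\mathbf{R}_\lambda(\mathbf{T})\Big|_{\lambda=\mu} = \sum_{n=0}^{\infty} (\lambda-\mu)^n\,\mathbf{R}_\mu^{n+1}(\mathbf{T}), \qquad (17.8)$$

which is the second property of the resolvent.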
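Both resolvent properties, Eqs. (17.7) and (17.8), are easy to test on a matrix. The sketch below assumes a random complex $4\times4$ matrix and picks $\lambda$ and $\mu$ with modulus larger than $\|\mathbf{T}\|$, so that both certainly lie in $\rho(\mathbf{T})$; the series in (17.8) is truncated at 60 terms, and the point `lam2` is chosen close enough to $\mu$ for the series to converge. The helper `resolvent` is ours.

```python
import numpy as np

def resolvent(T, lam):
    """R_lam(T) = (T - lam*1)^{-1}."""
    return np.linalg.inv(T - lam * np.eye(T.shape[0]))

rng = np.random.default_rng(4)
T = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
size = np.linalg.norm(T, 2)

# |lam| and |mu| exceed ||T||, so neither can be in the spectrum.
lam, mu = 2.0 * size, -3.0j * size
R_lam, R_mu = resolvent(T, lam), resolvent(T, mu)

# Eq. (17.7): difference of resolvents.
lhs_177 = R_lam - R_mu
rhs_177 = (lam - mu) * R_lam @ R_mu

# Eq. (17.8): Taylor expansion about mu, truncated at 60 terms.
lam2 = mu + 0.1 * size
series = sum((lam2 - mu) ** k * np.linalg.matrix_power(R_mu, k + 1) for k in range(60))
direct = resolvent(T, lam2)

print(np.abs(lhs_177 - rhs_177).max(), np.abs(series - direct).max())
```

The series converges here because $|\lambda - \mu|\,\|\mathbf{R}_\mu\| < 1$ for the chosen points; for $\lambda$ farther from $\mu$ the truncated sum need not approximate the resolvent.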

We now look into the spectral decomposition from an analytical viewpoint.
For convenience, we concentrate on the finite-dimensional case and
let $\mathbf{A}$ be an arbitrary (not necessarily hermitian) $N \times N$ matrix. Let $\lambda$ be a
complex number that is larger (in absolute value) than any of the eigenvalues
of $\mathbf{A}$. Since all operators on finite-dimensional vector spaces are compact
(by Proposition 17.5.5), Lemma 17.6.4 assures us that $|\lambda| > \|\mathbf{A}\|$, and it is
then possible to expand $\mathbf{R}_\lambda(\mathbf{A})$ in a convergent power series as follows:

\[
\mathbf{R}_\lambda(\mathbf{A}) = (\mathbf{A} - \lambda\mathbf{1})^{-1}
= -\frac{1}{\lambda}\sum_{n=0}^{\infty}\left(\frac{\mathbf{A}}{\lambda}\right)^{n}. \tag{17.9}
\]


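This is the Laurent expansion of $\mathbf{R}_\lambda(\mathbf{A})$. We can immediately read off the
residue of $\mathbf{R}_\lambda(\mathbf{A})$ (the coefficient of $1/\lambda$):

\[
\operatorname{Res}\bigl[\mathbf{R}_\lambda(\mathbf{A})\bigr] = -\mathbf{1}
\quad\Rightarrow\quad
-\frac{1}{2\pi i}\oint_\Gamma \mathbf{R}_\lambda(\mathbf{A})\,d\lambda = \mathbf{1},
\]

where $\Gamma$ is a circle with its center at the origin and a radius large enough to
encompass all the eigenvalues of $\mathbf{A}$ [see Fig. 17.3(a)]. A similar argument
shows that

\[
-\frac{1}{2\pi i}\oint_\Gamma \lambda\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda = \mathbf{A},
\]

and in general,

\[
-\frac{1}{2\pi i}\oint_\Gamma \lambda^n\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda = \mathbf{A}^n
\quad\text{for } n = 0, 1, \ldots
\]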


Fig. 17.3 (a) The large circle encompassing all eigenvalues. (b) the deformed contour 
consisting of small circles orbiting the eigenvalues 


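Using this and assuming that we can expand the function $f(\mathbf{A})$ in a power
series, we get

\[
-\frac{1}{2\pi i}\oint_\Gamma f(\lambda)\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda = f(\mathbf{A}). \tag{17.10}
\]

Writing this equation in the form

\[
\frac{1}{2\pi i}\oint_\Gamma \frac{f(\lambda)}{\lambda\mathbf{1} - \mathbf{A}}\,d\lambda = f(\mathbf{A})
\]

makes it recognizable as the generalization of the Cauchy integral formula
to operator-valued functions.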

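To use any of the above integral formulas, we must know the analytic
behavior of $\mathbf{R}_\lambda(\mathbf{A})$. From the formula of the inverse of a matrix given in
Chap. 5, we have

\[
\bigl[\mathbf{R}_\lambda(\mathbf{A})\bigr]_{ij} = \bigl[(\mathbf{A} - \lambda\mathbf{1})^{-1}\bigr]_{ij}
= \frac{C_{ji}(\lambda)}{\det(\mathbf{A} - \lambda\mathbf{1})} = \frac{C_{ji}(\lambda)}{p(\lambda)},
\]

where $C_{ji}(\lambda)$ is the cofactor of the $ij$th element of the matrix $\mathbf{A} - \lambda\mathbf{1}$ and
$p(\lambda)$ is the characteristic polynomial of $\mathbf{A}$. Clearly, $C_{ji}(\lambda)$ is also a polynomial.
Thus, $[\mathbf{R}_\lambda(\mathbf{A})]_{ij}$ is a rational function of $\lambda$. It follows that $\mathbf{R}_\lambda(\mathbf{A})$ has
only poles as singularities (see Proposition 11.2.2). The poles are simply
the zeros of the denominator, i.e., the eigenvalues of $\mathbf{A}$. We can deform the
contour $\Gamma$ in such a way that it consists of small circles $\gamma_j$ that encircle the
isolated eigenvalues $\lambda_j$ [see Fig. 17.3(b)]. Then, with $f(\mathbf{A}) = \mathbf{1}$, Eq. (17.10)
yields

\[
\mathbf{1} = -\frac{1}{2\pi i}\sum_{j=1}^{r}\oint_{\gamma_j}\mathbf{R}_\lambda(\mathbf{A})\,d\lambda \equiv \sum_{j=1}^{r}\mathbf{P}_j,
\qquad
\mathbf{P}_j \equiv -\frac{1}{2\pi i}\oint_{\gamma_j}\mathbf{R}_\lambda(\mathbf{A})\,d\lambda. \tag{17.11}
\]
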
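It can be shown (see Example 17.7.2 below) that $\{\mathbf{P}_j\}$ is a set of orthogonal
projection operators. Thus, Eq. (17.11) is a resolution of identity, as
specified in the spectral decomposition theorem in Chap. 6.


Example 17.7.2 We want to show that the $\mathbf{P}_j$ are projection operators. First
let $i = j$. Then⁹

\[
\mathbf{P}_j^2 = \left(-\frac{1}{2\pi i}\right)^{2}\oint_{\gamma_j}\mathbf{R}_\lambda(\mathbf{A})\,d\lambda\oint_{\gamma_j}\mathbf{R}_\mu(\mathbf{A})\,d\mu.
\]

Note that $\lambda$ need not be equal to $\mu$. In fact, we are free to choose
$|\lambda - \lambda_j| > |\mu - \lambda_j|$, i.e., let the circle corresponding to $\lambda$ integration be outside that of
$\mu$ integration.¹⁰ We can then rewrite the above double integral as

\[
\begin{aligned}
\mathbf{P}_j^2 &= \left(-\frac{1}{2\pi i}\right)^{2}\oint_{\gamma_j^{(\lambda)}}\oint_{\gamma_j^{(\mu)}}\mathbf{R}_\lambda(\mathbf{A})\,\mathbf{R}_\mu(\mathbf{A})\,d\lambda\,d\mu \\
&= \left(-\frac{1}{2\pi i}\right)^{2}\oint_{\gamma_j^{(\lambda)}}\oint_{\gamma_j^{(\mu)}}\left[\frac{\mathbf{R}_\lambda(\mathbf{A})}{\lambda-\mu} - \frac{\mathbf{R}_\mu(\mathbf{A})}{\lambda-\mu}\right]d\lambda\,d\mu \\
&= \left(-\frac{1}{2\pi i}\right)^{2}\left\{\oint_{\gamma_j^{(\lambda)}}\mathbf{R}_\lambda(\mathbf{A})\,d\lambda\oint_{\gamma_j^{(\mu)}}\frac{d\mu}{\lambda-\mu}
- \oint_{\gamma_j^{(\mu)}}\mathbf{R}_\mu(\mathbf{A})\,d\mu\oint_{\gamma_j^{(\lambda)}}\frac{d\lambda}{\lambda-\mu}\right\},
\end{aligned}
\]

where we used Eq. (17.7) to go to the second line. Now note that

\[
\oint_{\gamma_j^{(\mu)}}\frac{d\mu}{\lambda-\mu} = 0
\qquad\text{and}\qquad
\oint_{\gamma_j^{(\lambda)}}\frac{d\lambda}{\lambda-\mu} = 2\pi i
\]

because $\lambda$ lies outside $\gamma_j^{(\mu)}$ and $\mu$ lies inside $\gamma_j^{(\lambda)}$. Hence,

\[
\mathbf{P}_j^2 = \left(-\frac{1}{2\pi i}\right)^{2}\left\{0 - 2\pi i\oint_{\gamma_j^{(\mu)}}\mathbf{R}_\mu(\mathbf{A})\,d\mu\right\}
= -\frac{1}{2\pi i}\oint_{\gamma_j^{(\mu)}}\mathbf{R}_\mu(\mathbf{A})\,d\mu = \mathbf{P}_j.
\]

The remaining part, namely $\mathbf{P}_j\mathbf{P}_k = \mathbf{0}$ for $k \neq j$, can be done similarly (see
Problem 17.9).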
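
The projection property can also be checked numerically by evaluating the contour integral of Eq. (17.11) with the trapezoidal rule on a small circle around each eigenvalue. This is only a sketch under stated assumptions: the 2 x 2 hermitian matrix, the circle radius, and the helper name `contour_projector` are illustrative and not taken from the text.

```python
import numpy as np

def contour_projector(A, center, radius, n_points=400):
    """Approximate P = -(1/(2*pi*i)) * contour integral of R_lam(A) on a circle."""
    n = A.shape[0]
    total = np.zeros((n, n), dtype=complex)
    for t in 2 * np.pi * np.arange(n_points) / n_points:
        lam = center + radius * np.exp(1j * t)
        dlam = 1j * radius * np.exp(1j * t) * (2 * np.pi / n_points)
        total += np.linalg.inv(A - lam * np.eye(n)) * dlam
    return -total / (2j * np.pi)

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                         # hermitian, eigenvalues 1 and 3
P1 = contour_projector(A, center=1.0, radius=0.5)  # circle around eigenvalue 1
P2 = contour_projector(A, center=3.0, radius=0.5)  # circle around eigenvalue 3

print(np.round(P1.real, 6))
print(np.round(P2.real, 6))
```

For this matrix one can then confirm that `P1 + P2` is the identity, that `P1 @ P1` reproduces `P1`, that `P1 @ P2` vanishes, and that `1*P1 + 3*P2` reproduces `A`.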


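⁹We have not discussed multiple integrals of complex functions. A rigorous study of such
integrals involves the theory of functions of several complex variables, a subject we
have to avoid due to lack of space. However, in the simple case at hand, the theory of real
multiple integrals is an honest guide.

¹⁰This is possible because the poles are isolated.


Now we let $f(\mathbf{A}) = \mathbf{A}$ in Eq. (17.10), deform the contour as above, and
write

\[
\begin{aligned}
\mathbf{A} &= -\frac{1}{2\pi i}\sum_{j=1}^{r}\oint_{\gamma_j}\lambda\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda \\
&= -\frac{1}{2\pi i}\sum_{j=1}^{r}\left[\lambda_j\oint_{\gamma_j}\mathbf{R}_\lambda(\mathbf{A})\,d\lambda + \oint_{\gamma_j}(\lambda-\lambda_j)\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda\right] \\
&\equiv \sum_{j=1}^{r}(\lambda_j\mathbf{P}_j + \mathbf{D}_j),
\qquad
\mathbf{D}_j \equiv -\frac{1}{2\pi i}\oint_{\gamma_j}(\lambda-\lambda_j)\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda.
\end{aligned} \tag{17.12}
\]

It can be shown (see Problem 17.10) that

\[
\mathbf{D}_j^{n} = -\frac{1}{2\pi i}\oint_{\gamma_j}(\lambda-\lambda_j)^{n}\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda.
\]

In particular, since $\mathbf{R}_\lambda(\mathbf{A})$ has only poles as singularities, there exists a positive
integer $m$ such that $\mathbf{D}_j^{m} = \mathbf{0}$. We have not yet made any assumptions
about $\mathbf{A}$. If we assume that $\mathbf{A}$ is hermitian, for example, then $\mathbf{R}_\lambda(\mathbf{A})$ will
have simple poles (see Problem 17.11). It follows that $(\lambda - \lambda_j)\mathbf{R}_\lambda(\mathbf{A})$ will
be analytic at $\lambda_j$ for all $j = 1, 2, \ldots, r$, and $\mathbf{D}_j = \mathbf{0}$ in Eq. (17.12). We thus
have

\[
\mathbf{A} = \sum_{j=1}^{r}\lambda_j\mathbf{P}_j,
\]

which is the spectral decomposition discussed in Chap. 6. Problem 17.12
shows that the $\mathbf{P}_j$ are hermitian.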


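Example 17.7.3 The most general $2 \times 2$ hermitian matrix is of the form

\[
\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} \\ a_{12}^{*} & a_{22} \end{pmatrix},
\]

where $a_{11}$ and $a_{22}$ are real numbers. Thus,

\[
\det(\mathbf{A} - \lambda\mathbf{1}) = \lambda^2 - (a_{11} + a_{22})\lambda + a_{11}a_{22} - |a_{12}|^2,
\]

which has roots

\[
\lambda_1 = \tfrac{1}{2}\Bigl[a_{11} + a_{22} - \sqrt{(a_{11} - a_{22})^2 + 4|a_{12}|^2}\,\Bigr],
\qquad
\lambda_2 = \tfrac{1}{2}\Bigl[a_{11} + a_{22} + \sqrt{(a_{11} - a_{22})^2 + 4|a_{12}|^2}\,\Bigr].
\]

The inverse of $\mathbf{A} - \lambda\mathbf{1}$ can immediately be written:

\[
\mathbf{R}_\lambda(\mathbf{A}) = (\mathbf{A} - \lambda\mathbf{1})^{-1}
= \frac{1}{\det(\mathbf{A} - \lambda\mathbf{1})}\begin{pmatrix} a_{22} - \lambda & -a_{12} \\ -a_{12}^{*} & a_{11} - \lambda \end{pmatrix}
= \frac{1}{(\lambda - \lambda_1)(\lambda - \lambda_2)}\begin{pmatrix} a_{22} - \lambda & -a_{12} \\ -a_{12}^{*} & a_{11} - \lambda \end{pmatrix}.
\]

We want to verify that $\mathbf{R}_\lambda(\mathbf{A})$ has only simple poles. Two cases arise:

1. If $\lambda_1 \neq \lambda_2$, then it is clear that $\mathbf{R}_\lambda(\mathbf{A})$ has simple poles.

2. If $\lambda_1 = \lambda_2$, it appears that $\mathbf{R}_\lambda(\mathbf{A})$ may have a pole of order 2. However,
   note that if $\lambda_1 = \lambda_2$, then the square roots in the above equations must
   vanish. This happens iff $a_{11} = a_{22} \equiv a$ and $a_{12} = 0$. It then follows that
   $\lambda_1 = \lambda_2 = a$, and

\[
\mathbf{R}_\lambda(\mathbf{A}) = \frac{1}{(\lambda - a)^2}\begin{pmatrix} a - \lambda & 0 \\ 0 & a - \lambda \end{pmatrix}.
\]

   This clearly shows that $\mathbf{R}_\lambda(\mathbf{A})$ has only simple poles in this case.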


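If $\mathbf{A}$ is not hermitian, $\mathbf{D}_j \neq \mathbf{0}$; however, $\mathbf{D}_j$ is nevertheless nilpotent (see
Definition 3.5.1). This property and Eq. (17.12) can be used to show that
$\mathbf{A}$ can be cast into a Jordan canonical form via a similarity transformation.
That is, there exists an $N \times N$ matrix $\mathbf{S}$ such that

\[
\mathbf{S}\mathbf{A}\mathbf{S}^{-1} = \mathbf{J} =
\begin{pmatrix}
\mathbf{J}_1 & 0 & 0 & \cdots & 0 \\
0 & \mathbf{J}_2 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \mathbf{J}_m
\end{pmatrix},
\]

where $\mathbf{J}_k$ is a matrix of the form

\[
\mathbf{J}_k =
\begin{pmatrix}
\lambda & 1 & 0 & 0 & \cdots & 0 & 0 \\
0 & \lambda & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & \lambda & 1 & \cdots & 0 & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & \lambda & 1 \\
0 & 0 & 0 & 0 & \cdots & 0 & \lambda
\end{pmatrix},
\]

in which $\lambda$ is one of the eigenvalues of $\mathbf{A}$. Different $\mathbf{J}_k$ may contain the same
eigenvalues of $\mathbf{A}$. For a discussion of the Jordan canonical form of a matrix,
see [Birk 77], [Denn 67], or [Halm 58].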
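
As an illustration of the nilpotent part, the contour integrals defining $\mathbf{P}_j$ and $\mathbf{D}_j$ in Eq. (17.12) can be evaluated numerically for a small non-hermitian matrix. The sketch below assumes NumPy; the matrix (already containing one 2 x 2 Jordan block), the radius, and the helper name `contour_integral` are hypothetical choices made only for this check.

```python
import numpy as np

def contour_integral(A, center, radius, weight, n_points=200):
    """Approximate -(1/(2*pi*i)) * contour integral of weight(lam) * R_lam(A) on a circle."""
    n = A.shape[0]
    total = np.zeros((n, n), dtype=complex)
    for t in 2 * np.pi * np.arange(n_points) / n_points:
        lam = center + radius * np.exp(1j * t)
        dlam = 1j * radius * np.exp(1j * t) * (2 * np.pi / n_points)
        total += weight(lam) * np.linalg.inv(A - lam * np.eye(n)) * dlam
    return -total / (2j * np.pi)

# Non-hermitian example: a 2x2 Jordan block for eigenvalue 2 plus a simple eigenvalue 5.
A = np.array([[2.0, 1.0, 0.0],
              [0.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])

P2 = contour_integral(A, 2.0, 1.0, lambda lam: 1.0)        # projector for eigenvalue 2
D2 = contour_integral(A, 2.0, 1.0, lambda lam: lam - 2.0)  # nilpotent part for eigenvalue 2
P5 = contour_integral(A, 5.0, 1.0, lambda lam: 1.0)        # projector for eigenvalue 5
D5 = contour_integral(A, 5.0, 1.0, lambda lam: lam - 5.0)  # vanishes: simple pole

print(np.round(D2.real, 6))
```

Here `D2` is nonzero but squares to zero, `D5` vanishes, and `2*P2 + D2 + 5*P5 + D5` reproduces `A`, in line with Eq. (17.12).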


17.8 Problems 


17.1 Suppose that $\mathbf{S}$ is a bounded operator, $\mathbf{T}$ an invertible operator, and that

\[
\|\mathbf{T} - \mathbf{S}\| < \frac{1}{\|\mathbf{T}^{-1}\|}.
\]

Show that $\mathbf{S}$ is invertible. Hint: Show that $\mathbf{T}^{-1}\mathbf{S}$ is invertible. Thus, an operator
that is "sufficiently close" to an invertible operator is invertible.


17.2 Let V and W be finite-dimensional vector spaces. Show that T € 
L(V, W) is necessarily bounded. 


17.3 Let $\mathcal{H}$ be a Hilbert space, and $\mathbf{T} \in \mathcal{L}(\mathcal{H})$ an isometry, i.e., a linear
operator that does not change the norm of any vector. Show that $\|\mathbf{T}\| = 1$.


17.4 Show that 


(a) the unit operator is not compact, and that
(b) the inverse of a compact operator cannot be bounded.


Hint: For (b) use the results of Example 17.5.3. 


17.5 Let $|u\rangle \in \mathcal{H}$ and let $\mathcal{M}$ be a subspace of $\mathcal{H}$. Show that the subset
$E = |u\rangle - \mathcal{M}$ is convex. Show that $E$ is not necessarily a subspace of $\mathcal{H}$.


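17.6 Show that for any hermitian operator $\mathbf{H}$, we have

\[
\begin{aligned}
4\langle \mathbf{H}x|y\rangle = {} & \langle \mathbf{H}(x+y)|x+y\rangle - \langle \mathbf{H}(x-y)|x-y\rangle \\
& - i\bigl[\langle \mathbf{H}(x+iy)|x+iy\rangle - \langle \mathbf{H}(x-iy)|x-iy\rangle\bigr].
\end{aligned}
\]

Now let $|x\rangle = \lambda|z\rangle$ and $|y\rangle = |\mathbf{H}z\rangle/\lambda$, where $\lambda = (\|\mathbf{H}z\|/\|z\|)^{1/2}$, and show
that

\[
\|\mathbf{H}z\|^2 = \langle \mathbf{H}x|y\rangle \le M\,\|z\|\,\|\mathbf{H}z\|,
\]

where $M = \max\{|\langle \mathbf{H}z|z\rangle|/\|z\|^2\}$. Now conclude that $\|\mathbf{H}\| \le M$.


17.7 Show that the two kernels $K_1(x,t) = e^{-|x-t|}$ and $K_2(x,t) = \sin xt$,
where the first one acts on $\mathcal{L}^2(-\infty, \infty)$ and the second one on $\mathcal{L}^2(0, \infty)$,
have the two eigenfunctions

\[
e^{i\alpha t},\ \alpha \in \mathbb{R},
\qquad\text{and}\qquad
\sqrt{\frac{\pi}{2}}\,e^{-at} + \frac{t}{a^2 + t^2},\ a > 0,
\]

respectively, corresponding to the two eigenvalues

\[
\lambda = \frac{2}{1 + \alpha^2},\ \alpha \in \mathbb{R},
\qquad\text{and}\qquad
\lambda = \sqrt{\frac{\pi}{2}}.
\]


17.8 Derive Eq. (17.7). Hint: Multiply $\mathbf{R}_\lambda(\mathbf{T})$ by $\mathbf{1} = \mathbf{R}_\mu(\mathbf{T})(\mathbf{T} - \mu\mathbf{1})$ and
$\mathbf{R}_\mu(\mathbf{T})$ by $\mathbf{1} = \mathbf{R}_\lambda(\mathbf{T})(\mathbf{T} - \lambda\mathbf{1})$.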


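17.9 Finish Example 17.7.2 by showing that $\mathbf{P}_j\mathbf{P}_k = \mathbf{0}$ for $k \neq j$.


17.10 Show that $\mathbf{D}_j^{n} = -\frac{1}{2\pi i}\oint_{\gamma_j}(\lambda - \lambda_j)^{n}\,\mathbf{R}_\lambda(\mathbf{A})\,d\lambda$. Hint: Use mathematical
induction and the technique used in Example 17.7.2.


17.11 (a) Take the inner product of $|u\rangle = (\mathbf{A} - \lambda\mathbf{1})|v\rangle$ with $|v\rangle$ and show that
for a hermitian $\mathbf{A}$, $\operatorname{Im}\langle v|u\rangle = -(\operatorname{Im}\lambda)\|v\|^2$. Now use the Schwarz inequality
to obtain

\[
\|v\| \le \frac{\|u\|}{|\operatorname{Im}\lambda|}
\quad\Rightarrow\quad
\bigl\|\mathbf{R}_\lambda(\mathbf{A})|u\rangle\bigr\| \le \frac{\|u\|}{|\operatorname{Im}\lambda|}.
\]

(b) Use this result to show that

\[
\bigl\|(\lambda - \lambda_j)\,\mathbf{R}_\lambda(\mathbf{A})|u\rangle\bigr\|
\le \left(1 + \left|\frac{\operatorname{Re}(\lambda - \lambda_j)}{\operatorname{Im}\lambda}\right|\right)\|u\|
= (1 + |\cot\theta|)\,\|u\|,
\]

where $\theta$ is the angle that $\lambda - \lambda_j$ makes with the real axis and $\lambda$ is chosen to
have an imaginary part. From this result conclude that $\mathbf{R}_\lambda(\mathbf{A})$ has a simple
pole when $\mathbf{A}$ is hermitian.


17.12 (a) Show that when $\mathbf{A}$ is hermitian, $[\mathbf{R}_\lambda(\mathbf{A})]^{\dagger} = \mathbf{R}_{\lambda^*}(\mathbf{A})$.

(b) Write $\lambda - \lambda_j = r_j e^{i\theta}$ in the definition of $\mathbf{P}_j$ in Eq. (17.11). Take the
hermitian conjugate of both sides and use (a) to show that $\mathbf{P}_j$ is hermitian.
Hint: You will have to change the variable of integration a number of times.


The beginning of Chap. 17 showed that to solve a vector-operator equation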
one transforms it into an equation involving a sum over a discrete index [the 
matrix equation of Eq. (17.1)], or an equation involving an integral over 
a continuous index [Eq. (17.2)]. The latter is called an integral equation, 
which we shall investigate here using the machinery of Chap. 17. 


18.1. Classification 


Integral equations can be divided into two major groups. Those that have 
a variable limit of integration are called Volterra equations; those that 
have constant limits of integration are called Fredholm equations. If the 
unknown function appears only inside the integral, the integral equation is 
said to be of the first kind. Integral equations having the unknown function 
outside the integral as well as inside are said to be of the second kind. The 
four kinds of equations can be written as follows. 


\[
\begin{aligned}
&\int_a^x K(x,t)\,u(t)\,dt = v(x), && \text{Volterra equation of the 1st kind,} \\
&\int_a^b K(x,t)\,u(t)\,dt = v(x), && \text{Fredholm equation of the 1st kind,} \\
&u(x) = v(x) + \int_a^x K(x,t)\,u(t)\,dt, && \text{Volterra equation of the 2nd kind,} \\
&u(x) = v(x) + \int_a^b K(x,t)\,u(t)\,dt, && \text{Fredholm equation of the 2nd kind.}
\end{aligned}
\]

In all these equations, $K(x,t)$ is called the kernel of the integral equation.
In the theory of integral equations of the second kind, one usually multiplies
the integral by a nonzero complex number $\lambda$. Thus, the Fredholm
equation of the second kind becomes

\[
u(x) = v(x) + \lambda\int_a^b K(x,t)\,u(t)\,dt, \tag{18.1}
\]

and for the Volterra equation of the second kind one obtains

\[
u(x) = v(x) + \lambda\int_a^x K(x,t)\,u(t)\,dt. \tag{18.2}
\]

A $\lambda$ that satisfies (18.1) or (18.2) with $v(x) = 0$ is called a characteristic
value of the integral equation. In the abstract operator language both equations
are written as

\[
|u\rangle = |v\rangle + \lambda\mathbf{K}|u\rangle
\quad\Rightarrow\quad
(\mathbf{K} - \lambda^{-1}\mathbf{1})|u\rangle = -\lambda^{-1}|v\rangle. \tag{18.3}
\]

Thus $\lambda$ is a characteristic value if and only if $\lambda^{-1}$ is an eigenvalue of $\mathbf{K}$.
Recall that when the interval of integration $(a, b)$ is finite, $K(x,t)$ is called
a Hilbert-Schmidt kernel. Example 17.5.10 showed that $\mathbf{K}$ is a compact operator,
and by Theorem 17.5.11, the eigenvalues of $\mathbf{K}$ either form a finite set
or a sequence that converges to zero.


Theorem 18.1.1 The characteristic values of a Fredholm equation of the 
second kind either form a finite set or a sequence of complex numbers in- 
creasing beyond limit in absolute value. 


Our main task in this chapter is to study methods of solving integral equa- 
tions of the second kind. We treat the Volterra equation first because it is 
easier to solve. Let us introduce the notation 


\[
K[u](x) \equiv \int_a^x K(x,t)\,u(t)\,dt
\qquad\text{and}\qquad
K^{n}[u](x) = K\bigl[K^{n-1}[u]\bigr](x), \tag{18.4}
\]

whereby $K[u]$ denotes a function whose value at $x$ is given by the integral
on the RHS of the first equation in (18.4). One can show with little difficulty
that the associated operator $\mathbf{K}$ is compact. Let $M = \max\{|K(x,t)| \mid a \le t \le x \le b\}$
and note that

\[
\bigl|\lambda K[u](x)\bigr| = \left|\lambda\int_a^x K(x,t)\,u(t)\,dt\right| \le |\lambda|\,M\,\|u\|_\infty\,(x - a),
\]

where $\|u\|_\infty \equiv \max\{|u(x)| \mid x \in (a, b)\}$.
Using mathematical induction, one can show that (see Problem 18.1)

\[
\bigl|(\lambda K)^{n}[u](x)\bigr| \le |\lambda|^{n}\,|M|^{n}\,\|u\|_\infty\,\frac{(x - a)^{n}}{n!}. \tag{18.5}
\]

Since $b \ge x$, we can replace $x$ with $b$ and still satisfy the inequality. Then
the inequality of Eq. (18.5) will hold for all $x$, and, dividing by $\|u\|_\infty$, we can write
it as an operator norm inequality: $\|(\lambda\mathbf{K})^{n}\| \le |\lambda|^{n}|M|^{n}(b - a)^{n}/n!$.
Therefore,

\[
\left\|\sum_{n=0}^{\infty}(\lambda\mathbf{K})^{n}\right\|
\le \sum_{n=0}^{\infty}\bigl\|(\lambda\mathbf{K})^{n}\bigr\|
\le \sum_{n=0}^{\infty}\frac{|\lambda|^{n}|M|^{n}(b - a)^{n}}{n!}
= e^{M|\lambda|(b - a)},
\]

and the series $\sum_{n=0}^{\infty}(\lambda\mathbf{K})^{n}$ converges for all $\lambda$. In fact, a direct calculation
shows that the series converges to the inverse of $\mathbf{1} - \lambda\mathbf{K}$. Thus, the latter is
invertible and the spectrum of $\mathbf{K}$ has no nonzero points. We have just shown
the following.


Theorem 18.1.2 The Volterra equation of the second kind has no nonzero
characteristic value. In particular, the operator $\mathbf{1} - \lambda\mathbf{K}$ is invertible, and the
equation always has a unique solution given by the convergent infinite series

\[
u(x) = \sum_{j=0}^{\infty}\lambda^{j}\int_a^x K^{j}(x,t)\,v(t)\,dt,
\]

where $K^{j}(x,t)$ is defined inductively in Eq. (18.4).


Historical Notes 

Vito Volterra (1860-1940) was only 11 when he became interested in mathematics while 
reading Legendre’s Geometry. At the age of 13 he began to study the three body problem 
and made some progress. 

His family were extremely poor (his father had died when Vito was two years old) but 
after attending lectures at Florence he was able to proceed to Pisa in 1878. At Pisa he stud- 
ied under Betti, graduating as a doctor of physics in 1882. His thesis on hydrodynamics 
included some results of Stokes, discovered later but independently by Volterra. 

He became Professor of Mechanics at Pisa in 1883, and upon Betti’s death, he occupied 
the chair of mathematical physics. After spending some time at Turin as the chair of 
mechanics, he was awarded the chair of mathematical physics at the University of Rome 
in 1900. 

Volterra conceived the idea of a theory of functions that depend on a continuous set of val- 
ues of another function in 1883. Hadamard was later to introduce the word “functional”, 
which replaced Volterra’s original terminology. In 1890 Volterra used his functional cal- 
culus to show that the theory of Hamilton and Jacobi for the integration of the differential 
equations of dynamics could be extended to other problems of mathematical physics. 
His most famous work was done on integral equations. He began this study in 1884, and 
in 1896 he published several papers on what is now called the Volterra integral equation. 
He continued to study functional analysis applications to integral equations producing a 
large number of papers on composition and permutable functions. 

During the First World War Volterra joined the Air Force. He made many journeys to 
France and England to promote scientific collaboration. After the war he returned to the 
University of Rome, and his interests moved to mathematical biology. He studied the 
Verhulst equation and the logistic curve. He also wrote on predator—prey equations. 

In 1922 Fascism seized Italy, and Volterra fought against it in the Italian Parliament. 
However, by 1930 the Parliament was abolished, and when Volterra refused to take an 
oath of allegiance to the Fascist government in 1931, he was forced to leave the University 
of Rome. From the following year he lived mostly abroad, mainly in Paris, but also in 
Spain and other countries. 


Differential equations can be transformed into integral equations. For in- 
stance, consider the SOLDE 


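\[
\frac{d^2u}{dx^2} + p_1(x)\frac{du}{dx} + p_0(x)\,u = r(x),
\qquad u(a) = c_1,\quad u'(a) = c_2. \tag{18.6}
\]

By integrating the DE once, we obtain

\[
\frac{du}{dx} = -\int_a^x p_1(t)\,u'(t)\,dt - \int_a^x p_0(t)\,u(t)\,dt + \int_a^x r(t)\,dt + c_2.
\]

Integrating the first integral by parts gives

\[
u'(x) = -p_1(x)\,u(x)
+ \underbrace{\int_a^x \bigl[p_1'(t) - p_0(t)\bigr]u(t)\,dt}_{\equiv f(x)}
+ \underbrace{\int_a^x r(t)\,dt}_{\equiv g(x)}
+ p_1(a)c_1 + c_2.
\]

Integrating once more yields

\[
\begin{aligned}
u(x) &= -\int_a^x p_1(t)\,u(t)\,dt + \int_a^x f(s)\,ds + \int_a^x g(s)\,ds + (x - a)\bigl[p_1(a)c_1 + c_2\bigr] + c_1 \\
&= -\int_a^x p_1(t)\,u(t)\,dt + \int_a^x ds\int_a^s \bigl[p_1'(t) - p_0(t)\bigr]u(t)\,dt \\
&\quad + \int_a^x ds\int_a^s r(t)\,dt + (x - a)\bigl[p_1(a)c_1 + c_2\bigr] + c_1 \\
&= \int_a^x \bigl\{(x - t)\bigl[p_1'(t) - p_0(t)\bigr] - p_1(t)\bigr\}\,u(t)\,dt \\
&\quad + \int_a^x (x - t)\,r(t)\,dt + (x - a)\bigl[p_1(a)c_1 + c_2\bigr] + c_1,
\end{aligned} \tag{18.7}
\]

where we have used the formula

\[
\int_a^x ds\int_a^s f(t)\,dt = \int_a^x (x - t)\,f(t)\,dt,
\]

which the reader may verify by interchanging the order of integration on the
LHS.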


Proposition 18.1.3 A SOLDE of the form (18.6) is equivalent to a Volterra
equation of the second kind with kernel

\[
K(x,t) \equiv (x - t)\bigl[p_1'(t) - p_0(t)\bigr] - p_1(t)
\]

and

\[
v(x) \equiv \int_a^x (x - t)\,r(t)\,dt + (x - a)\bigl[p_1(a)c_1 + c_2\bigr] + c_1.
\]
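
As a concrete check of this proposition, take the SOLDE $u'' + u = 0$ with $u(0) = 0$, $u'(0) = 1$ (an illustrative choice, not from the text), so that $p_1 = 0$, $p_0 = 1$, $r = 0$, $K(x,t) = -(x - t)$, and $v(x) = x$. The sketch below, assuming NumPy and a simple trapezoidal discretization (the helper name `volterra_neumann` is hypothetical), iterates the resulting Volterra equation and compares the result with $\sin x$.

```python
import numpy as np

def volterra_neumann(kernel, v, x, n_iter=40):
    """Iterate u <- v + int_a^x K(x,t) u(t) dt on a uniform grid (trapezoidal rule)."""
    n = x.size
    h = x[1] - x[0]
    W = np.tril(np.ones((n, n)))   # keep only t_j <= x_i
    W[:, 0] = 0.5                  # trapezoid weight at the lower limit t = a
    np.fill_diagonal(W, 0.5)       # trapezoid weight at the upper limit t = x_i
    W[0, 0] = 0.0                  # empty integration interval at x = a
    M = h * W * kernel(x[:, None], x[None, :])
    u = v.copy()
    for _ in range(n_iter):
        u = v + M @ u
    return u

# Coefficients of u'' + p1 u' + p0 u = r with u(a) = c1, u'(a) = c2 (illustrative choice)
a, c1, c2 = 0.0, 0.0, 1.0
p1 = lambda t: 0.0 * t
dp1 = lambda t: 0.0 * t            # derivative of p1
p0 = lambda t: 1.0 + 0.0 * t

x = np.linspace(a, 2.0, 201)
kernel = lambda x_, t_: (x_ - t_) * (dp1(t_) - p0(t_)) - p1(t_)   # Proposition 18.1.3
v = (x - a) * (p1(a) * c1 + c2) + c1                              # r = 0 here

u = volterra_neumann(kernel, v, x)
max_error = np.abs(u - np.sin(x)).max()
print(max_error)
```

The remaining discrepancy comes from the trapezoidal rule and shrinks as the grid is refined.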

We now outline a systematic approach to obtaining the infinite series of
Theorem 18.1.2, which also works for the Fredholm equation of the second
kind. In the latter case, the series is guaranteed to converge only if
$|\lambda|\|\mathbf{K}\| < 1$. This approach has the advantage that in each successive step,
we obtain a better approximation to the solution. Writing the equation as

\[
|u\rangle = |v\rangle + \lambda\mathbf{K}|u\rangle, \tag{18.8}
\]

we can interpret it as follows. The difference between $|u\rangle$ and $|v\rangle$ is $\lambda\mathbf{K}|u\rangle$.
If $\lambda\mathbf{K}$ were absent, the two vectors $|u\rangle$ and $|v\rangle$ would be equal. The effect
of $\lambda\mathbf{K}$ is to change $|u\rangle$ in such a way that when the result is added to $|v\rangle$,
it gives $|u\rangle$. As our initial approximation, therefore, we take $|u\rangle$ to be equal
to $|v\rangle$ and write $|u_0\rangle = |v\rangle$, where the index reminds us of the order (in
this case zeroth, because $\lambda\mathbf{K} = \mathbf{0}$) of the approximation. To find a better
approximation, we always substitute the latest approximation for $|u\rangle$ in the
RHS of Eq. (18.8). At this stage, we have $|u_1\rangle = |v\rangle + \lambda\mathbf{K}|u_0\rangle = |v\rangle + \lambda\mathbf{K}|v\rangle$.
Still a better approximation is achieved if we substitute this expression in
(18.8):

\[
|u_2\rangle = |v\rangle + \lambda\mathbf{K}|u_1\rangle = |v\rangle + \lambda\mathbf{K}\bigl(|v\rangle + \lambda\mathbf{K}|v\rangle\bigr) = |v\rangle + \lambda\mathbf{K}|v\rangle + \lambda^2\mathbf{K}^2|v\rangle.
\]

The procedure is now clear. Once $|u_n\rangle$, the $n$th approximation, is obtained,
we can get $|u_{n+1}\rangle$ by substituting in the RHS of (18.8).

Before continuing, let us write the above equations in integral form. In
what follows, we shall concentrate on the Fredholm equation. To obtain the
result for the Volterra equation, one simply replaces $b$, the upper limit of
integration, with $x$. The first approximation can be obtained by substituting
$v(t)$ for $u(t)$ on the RHS of Eq. (18.1). This yields

\[
u_1(x) = v(x) + \lambda\int_a^b K(x,t)\,v(t)\,dt.
\]

Substituting this back in Eq. (18.1) gives

\[
\begin{aligned}
u_2(x) &= v(x) + \lambda\int_a^b ds\,K(x,s)\,u_1(s) \\
&= v(x) + \lambda\int_a^b ds\,K(x,s)\,v(s) + \lambda^2\int_a^b dt\left[\int_a^b K(x,s)\,K(s,t)\,ds\right]v(t) \\
&= v(x) + \lambda\int_a^b dt\,K(x,t)\,v(t) + \lambda^2\int_a^b dt\,K^2(x,t)\,v(t),
\end{aligned}
\]

where $K^2(x,t) \equiv \int_a^b K(x,s)\,K(s,t)\,ds$. Similar expressions can be derived
for $u_3(x)$, $u_4(x)$, and so forth. The integrals expressing various "powers" of
$K$ can be obtained using Dirac notation and vectors with continuous indices,
as discussed in Sect. 7.3. Thus, for instance,

\[
\begin{aligned}
K^3(x,t) &\equiv \langle x|\,\mathbf{K}\underbrace{\left(\int_a^b |s_1\rangle\langle s_1|\,ds_1\right)}_{=\mathbf{1}}\mathbf{K}\underbrace{\left(\int_a^b |s_2\rangle\langle s_2|\,ds_2\right)}_{=\mathbf{1}}\mathbf{K}\,|t\rangle \\
&= \int_a^b ds_1\int_a^b ds_2\,\langle x|\mathbf{K}|s_1\rangle\langle s_1|\mathbf{K}|s_2\rangle\langle s_2|\mathbf{K}|t\rangle \\
&= \int_a^b ds_1\int_a^b ds_2\,K(x,s_1)\,K(s_1,s_2)\,K(s_2,t).
\end{aligned}
\]

We can always use this technique to convert an equation in kets into an
equation in functions and integrals. Therefore, we can concentrate on the
abstract operator equation and its various approximations.

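Continuing to the $n$th-order approximation, we easily obtain

\[
|u_n\rangle = |v\rangle + \lambda\mathbf{K}|v\rangle + \cdots + \lambda^{n}\mathbf{K}^{n}|v\rangle = \sum_{j=0}^{n}(\lambda\mathbf{K})^{j}|v\rangle, \tag{18.9}
\]

whose integral form is

\[
u_n(x) = \sum_{j=0}^{n}\lambda^{j}\int_a^b K^{j}(x,t)\,v(t)\,dt. \tag{18.10}
\]

Here $K^{j}(x,t)$ is defined inductively by

\[
\begin{aligned}
K^{0}(x,t) &= \langle x|\mathbf{K}^{0}|t\rangle = \langle x|\mathbf{1}|t\rangle = \langle x|t\rangle = \delta(x - t), \\
K^{j}(x,t) &= \langle x|\mathbf{K}\mathbf{K}^{j-1}|t\rangle = \langle x|\mathbf{K}\left(\int_a^b |s\rangle\langle s|\,ds\right)\mathbf{K}^{j-1}|t\rangle
= \int_a^b K(x,s)\,K^{j-1}(s,t)\,ds.
\end{aligned}
\]

The limit of $u_n(x)$ as $n \to \infty$ gives

\[
u(x) = \sum_{j=0}^{\infty}\lambda^{j}\int_a^b K^{j}(x,t)\,v(t)\,dt. \tag{18.11}
\]

The convergence of this series, called the Neumann series, is always guaranteed
for the Volterra equation. For the Fredholm equation, we need to
impose the extra condition $|\lambda|\|\mathbf{K}\| < 1$.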


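Example 18.1.4 As an example, let us find the solution of
$u(x) = 1 + \lambda\int_0^x u(t)\,dt$, a Volterra equation of the second kind. Here, $v(x) = 1$ and
$K(x,t) = 1$, and it is straightforward to calculate approximations to $u(x)$:

\[
\begin{aligned}
u_0(x) &= v(x) = 1, \qquad u_1(x) = 1 + \lambda\int_0^x K(x,t)\,u_0(t)\,dt = 1 + \lambda x, \\
u_2(x) &= 1 + \lambda\int_0^x K(x,t)\,u_1(t)\,dt = 1 + \lambda\int_0^x (1 + \lambda t)\,dt = 1 + \lambda x + \frac{\lambda^2x^2}{2}.
\end{aligned}
\]

It is clear that the $n$th term will look like

\[
u_n(x) = 1 + \lambda x + \frac{\lambda^2x^2}{2!} + \cdots + \frac{\lambda^{n}x^{n}}{n!} = \sum_{j=0}^{n}\frac{\lambda^{j}x^{j}}{j!}.
\]

As $n \to \infty$, we obtain $u(x) = e^{\lambda x}$. By direct substitution, it is readily
checked that this is indeed a solution of the original integral equation.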
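
The successive approximations of this example can be reproduced exactly with polynomial arithmetic. The following sketch assumes NumPy's `numpy.polynomial.Polynomial` class; the value of `lam` and the number of iterations are arbitrary illustrative choices.

```python
import numpy as np
from math import factorial
from numpy.polynomial import Polynomial

lam = 0.7                          # sample value of lambda (any value works for Volterra)
n_steps = 10

u = Polynomial([1.0])              # u_0(x) = v(x) = 1
for _ in range(n_steps):
    # u_{n+1}(x) = 1 + lam * int_0^x u_n(t) dt ; integ() integrates from 0 by default
    u = Polynomial([1.0]) + lam * u.integ()

# Coefficients should be lam^j / j!, the Taylor coefficients of exp(lam * x)
expected = np.array([lam**j / factorial(j) for j in range(n_steps + 1)])
print(u.coef)
print(abs(u(1.0) - np.exp(lam)))   # truncation error of the 10th approximation at x = 1
```

Each pass through the loop adds exactly one more term of the exponential series, which is the pattern observed in the example.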


18.2 Fredholm Integral Equations 


We can use our knowledge of compact operators gained in the previous
chapter to study Fredholm equations of the second kind. With $\lambda \neq 0$ a complex
number, we consider the characteristic equation

\[
(\mathbf{1} - \lambda\mathbf{K})|u\rangle = |v\rangle, \qquad\text{or}\qquad u(x) - \lambda K[u](x) = v(x), \tag{18.12}
\]

where all functions are square-integrable on $[a, b]$, and $K(x,t)$, the Hilbert-Schmidt
kernel, is square-integrable on the rectangle $[a, b] \times [a, b]$.

Using Proposition 17.2.9, we immediately see that Eq. (18.12) has a
unique solution if $|\lambda|\|\mathbf{K}\| < 1$, and the solution is of the form

\[
|u\rangle = (\mathbf{1} - \lambda\mathbf{K})^{-1}|v\rangle = \sum_{n=0}^{\infty}\lambda^{n}\mathbf{K}^{n}|v\rangle, \tag{18.13}
\]

or $u(x) = \sum_{n=0}^{\infty}\lambda^{n}K^{n}[v](x)$, where $K^{n}[v](x)$ is defined as in Eq. (18.4)
except that now $b$ replaces $x$ as the upper limit of integration.


Example 18.2.1 Consider the integral equation

\[
u(x) - \int_0^1 K(x,t)\,u(t)\,dt = x,
\qquad\text{where}\qquad
K(x,t) = \begin{cases} x & \text{if } 0 \le x < t, \\ t & \text{if } t < x \le 1. \end{cases}
\]

Here $\lambda = 1$; therefore, a Neumann series solution exists if $\|\mathbf{K}\| < 1$. It is
convenient to write $K$ in terms of the theta function:¹

\[
K(x,t) = x\,\theta(t - x) + t\,\theta(x - t). \tag{18.14}
\]

This gives $|K(x,t)|^2 = x^2\theta(t - x) + t^2\theta(x - t)$ because $\theta^2(x - t) = \theta(x - t)$
and $\theta(x - t)\theta(t - x) = 0$. Thus, we have

\[
\begin{aligned}
\|\mathbf{K}\|^2 &= \int_0^1 dx\int_0^1 dt\,|K(x,t)|^2 \\
&= \int_0^1 dx\int_0^1 x^2\theta(t - x)\,dt + \int_0^1 dx\int_0^1 t^2\theta(x - t)\,dt \\
&= \int_0^1 dt\int_0^t x^2\,dx + \int_0^1 dx\int_0^x t^2\,dt \\
&= \int_0^1 \frac{t^3}{3}\,dt + \int_0^1 \frac{x^3}{3}\,dx = \frac{1}{6}.
\end{aligned}
\]

Since this is less than 1, the Neumann series converges, and we have²

\[
u(x) = \sum_{j=0}^{\infty}\lambda^{j}\int_a^b K^{j}(x,t)\,v(t)\,dt = \sum_{j=0}^{\infty}\int_0^1 K^{j}(x,t)\,t\,dt \equiv \sum_{j=0}^{\infty}f_j(x).
\]

¹Recall that the theta function is defined to be 1 if its argument is positive, and 0 if it is
negative.

²Note that in this case (Fredholm equation), we can calculate the $j$th term in isolation. In
the Volterra case, it was more natural to calculate the solution up to a given order.

The first few terms are evaluated as follows:

\[
\begin{aligned}
f_0(x) &= \int_0^1 K^{0}(x,t)\,t\,dt = \int_0^1 \delta(x - t)\,t\,dt = x, \\
f_1(x) &= \int_0^1 K(x,t)\,t\,dt = \int_0^1 \bigl[x\,\theta(t - x) + t\,\theta(x - t)\bigr]t\,dt \\
&= x\int_x^1 t\,dt + \int_0^x t^2\,dt = \frac{x}{2} - \frac{x^3}{6}.
\end{aligned}
\]

The next term is trickier than the first two because of the product of the
theta functions. We first substitute Eq. (18.14) in the integral for the second-order
term, and simplify:

\[
\begin{aligned}
f_2(x) &= \int_0^1 K^2(x,t)\,t\,dt = \int_0^1 t\,dt\int_0^1 K(x,s)\,K(s,t)\,ds \\
&= \int_0^1 t\,dt\int_0^1 \bigl[x\,\theta(s - x) + s\,\theta(x - s)\bigr]\bigl[s\,\theta(t - s) + t\,\theta(s - t)\bigr]\,ds \\
&= x\int_0^1 t\,dt\int_0^1 s\,\theta(s - x)\,\theta(t - s)\,ds + x\int_0^1 t^2\,dt\int_0^1 \theta(s - x)\,\theta(s - t)\,ds \\
&\quad + \int_0^1 t\,dt\int_0^1 s^2\,\theta(x - s)\,\theta(t - s)\,ds + \int_0^1 t^2\,dt\int_0^1 s\,\theta(x - s)\,\theta(s - t)\,ds.
\end{aligned}
\]

It is convenient to switch the order of integration at this point. This is because
of the presence of $\theta(x - s)$ and $\theta(s - x)$, which do not involve $t$ and
are best integrated last. Thus, we have

\[
\begin{aligned}
f_2(x) &= x\int_0^1 s\,\theta(s - x)\,ds\int_s^1 t\,dt + x\int_0^1 \theta(s - x)\,ds\int_0^s t^2\,dt \\
&\quad + \int_0^1 s^2\,\theta(x - s)\,ds\int_s^1 t\,dt + \int_0^1 s\,\theta(x - s)\,ds\int_0^s t^2\,dt \\
&= x\int_x^1 s\,ds\left(\frac{1}{2} - \frac{s^2}{2}\right) + x\int_x^1 \frac{s^3}{3}\,ds + \int_0^x s^2\,ds\left(\frac{1}{2} - \frac{s^2}{2}\right) + \int_0^x \frac{s^4}{3}\,ds \\
&= \frac{5}{24}x - \frac{1}{12}x^3 + \frac{1}{120}x^5.
\end{aligned}
\]

As a test of his/her knowledge of $\theta$-function manipulation, the reader is
urged to perform the integration in reverse order. Adding all the terms, we
obtain an approximation for $u(x)$ that is valid for $0 \le x \le 1$:

\[
u(x) \approx f_0(x) + f_1(x) + f_2(x) = \frac{41}{24}x - \frac{1}{4}x^3 + \frac{1}{120}x^5.
\]
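
The terms $f_0$, $f_1$, $f_2$ and the norm $\|\mathbf{K}\|^2 = 1/6$ can be cross-checked by replacing the integrals with trapezoidal sums on a grid. This is a rough numerical sketch assuming NumPy; the grid size and variable names are illustrative, and the kernel is written as $\min(x, t)$, which coincides with Eq. (18.14).

```python
import numpy as np

x = np.linspace(0.0, 1.0, 1001)
h = x[1] - x[0]
w = np.full(x.size, h)
w[0] = w[-1] = h / 2                      # trapezoid weights on [0, 1]

K = np.minimum(x[:, None], x[None, :])    # K(x, t) = x for x < t, t for t < x

f0 = x.copy()                             # f_0(x) = x
f1 = K @ (w * f0)                         # f_1(x) = int_0^1 K(x,t) f_0(t) dt
f2 = K @ (w * f1)                         # f_2(x) = int_0^1 K(x,t) f_1(t) dt

u_numeric = f0 + f1 + f2
u_closed = 41 * x / 24 - x**3 / 4 + x**5 / 120
hs_norm_sq = w @ (K**2) @ w               # double integral of |K|^2

print(np.abs(u_numeric - u_closed).max(), hs_norm_sq)
```

Applying `K @ (w * ...)` once more would give $f_3$, so the same few lines extend the approximation to any order without further $\theta$-function bookkeeping.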


We have seen that the Volterra equation of the second kind has a unique 
solution which can be written as an infinite series (see Theorem 18.1.2). 
The case of the Fredholm equation of the second kind is more complicated 
because of the existence of eigenvalues. The general solution of Eq. (18.12) 
is discussed in the following: 


Theorem 18.2.2 (Fredholm Alternative) Let $\mathbf{K}$ be a Hilbert-Schmidt operator
and $\lambda$ a complex number. Then either


1. $\lambda$ is a regular value of Eq. (18.12), or $\lambda^{-1}$ is a regular point of the
   operator $\mathbf{K}$, in which case the equation has the unique solution
   $|u\rangle = (\mathbf{1} - \lambda\mathbf{K})^{-1}|v\rangle$, or

2. $\lambda$ is a characteristic value of Eq. (18.12) ($\lambda^{-1}$ is an eigenvalue of the
   operator $\mathbf{K}$), in which case the equation has a solution if and only if $|v\rangle$
   is in the orthogonal complement of the (finite-dimensional) null space
   of $\mathbf{1} - \lambda^{*}\mathbf{K}^{\dagger}$.


Proof The first part is trivial if we recall that by definition, regular points of
$\mathbf{K}$ are those complex numbers $\mu$ which make the operator $\mathbf{K} - \mu\mathbf{1}$ invertible.

For part (2), we first show that the null space of $\mathbf{1} - \lambda^{*}\mathbf{K}^{\dagger}$ is finite-dimensional.
We note that $\mathbf{1} - \lambda\mathbf{K}$ is invertible if and only if its adjoint
$\mathbf{1} - \lambda^{*}\mathbf{K}^{\dagger}$ is invertible, and $\lambda \in \rho(\mathbf{K})$ iff $\lambda^{*} \in \rho(\mathbf{K}^{\dagger})$. Since the spectrum
of an operator is composed of all points that are not regular, we conclude
that $\lambda$ is in the spectrum of $\mathbf{K}$ if and only if $\lambda^{*}$ is in the spectrum of $\mathbf{K}^{\dagger}$.
For compact operators, all nonzero points of the spectrum are eigenvalues.
Therefore, the nonzero points of the spectrum of $\mathbf{K}^{\dagger}$, a compact operator by
Theorem 17.5.7, are all eigenvalues of $\mathbf{K}^{\dagger}$, and the null space of $\mathbf{1} - \lambda^{*}\mathbf{K}^{\dagger}$ is
finite-dimensional (Theorem 17.5.11). Next, we note that the equation itself
requires that $|v\rangle$ be in the range of the operator $\mathbf{1} - \lambda\mathbf{K}$, which, by Theorem 17.6.5,
is the orthogonal complement of the null space of $\mathbf{1} - \lambda^{*}\mathbf{K}^{\dagger}$.


Historical Notes 

Erik Ivar Fredholm (1866-1927) was born in Stockholm, the son of a well-to-do mer- 
chant family. He received the best education possible and soon showed great promise in 
mathematics, leaning especially toward the applied mathematics of practical mechanics 
in a year of study at Stockholm’s Polytechnic Institute. Fredholm finished his education 
at the University of Uppsala, obtaining his doctorate in 1898. He also studied at the Uni- 
versity of Stockholm during this same period and eventually received an appointment to 
the faculty there. Fredholm remained there the rest of his professional life. 

His first contribution to mathematics was contained in his doctoral thesis, in which he 
studied a first-order partial differential equation in three variables, a problem that arises 
in the deformation of anisotropic media. Several years later he completed this work by 
finding the fundamental solution to a general elliptic partial differential equation with 
constant coefficients. 


Fredholm is perhaps best known for his studies of the integral equation that bears his
name. Such equations occur frequently in physics. Fredholm’s genius led him to note the 
similarity between his equation and a relatively familiar matrix-vector equation, resulting 
in his identification of a quantity that plays the same role in his equation as the deter- 
minant plays in the matrix-vector equation. He thus obtained a method for determining 
the existence of a solution and later used an analogous expression to derive a solution to 
his equation akin to the Cramer’s rule solution to the matrix-vector equation. He further 
showed that the solution could be expressed as a power series in a complex variable. This 
latter result was considered important enough that Poincaré assumed it without proof (in 
fact he was unable to prove it) in a study of related partial differential equations. 
Fredholm then considered the homogeneous form of his equation. He showed that under 
certain conditions, the vector space of solutions is finite-dimensional. David Hilbert later 
extended Fredholm’s work to a complete eigenvalue theory of the Fredholm equation, 
which ultimately led to the discovery of Hilbert spaces. 


18.2.1 Hermitian Kernel 


Of special interest are integral equations in which the kernel is hermitian,
which occurs when the operator is hermitian. Such a kernel has the property
that³ $\langle x|\mathbf{K}|t\rangle^{*} = \langle t|\mathbf{K}|x\rangle$ or $[K(x,t)]^{*} = K(t,x)$. For such kernels we can
use the spectral theorem for compact hermitian operators to find a series
solution for the integral equation. First we recall that

\[
\mathbf{K} = \sum_{j=1}^{N}\lambda_j^{-1}\mathbf{P}_j = \sum_{j=1}^{N}\lambda_j^{-1}\sum_{k=1}^{m_j}|e_k^{(j)}\rangle\langle e_k^{(j)}|,
\]

where we have used $\lambda_j^{-1}$ to denote the eigenvalue of the operator⁴ and expanded
the projection operator in terms of orthonormal basis vectors of the
corresponding finite-dimensional eigenspace. Recall that $N$ can be infinity.
Instead of the double sum, we can sum once over all the basis vectors and
write $\mathbf{K} = \sum_{n=1}^{\infty}\lambda_n^{-1}|u_n\rangle\langle u_n|$. Here $n$ counts all the orthonormal eigenvectors
of the Hilbert space, and $\lambda_n^{-1}$ is the eigenvalue corresponding to the
eigenvector $|u_n\rangle$. Therefore, $\lambda_n^{-1}$ may be repeated in the sum. The action of
$\mathbf{K}$ on a vector $|u\rangle$ is given by

\[
\mathbf{K}|u\rangle = \sum_{n=1}^{\infty}\lambda_n^{-1}\langle u_n|u\rangle\,|u_n\rangle. \tag{18.15}
\]

If the Hilbert space is $\mathcal{L}^2[a, b]$, we may be interested in the functional form
of this equation. We obtain such a form by multiplying both sides by $\langle x|$:

\[
K[u](x) \equiv \langle x|\mathbf{K}|u\rangle = \sum_{n=1}^{\infty}\lambda_n^{-1}\langle u_n|u\rangle\langle x|u_n\rangle = \sum_{n=1}^{\infty}\lambda_n^{-1}\langle u_n|u\rangle\,u_n(x).
\]

That this series converges uniformly in the interval $[a, b]$ is known as the
Hilbert-Schmidt theorem.


³Since we are dealing mainly with real functions, hermiticity of $\mathbf{K}$ implies the symmetry
of $K$, i.e., $K(x,t) = K(t,x)$.


⁴$\lambda_j$ is the characteristic value of the integral equation, or the inverse of the eigenvalue of
the corresponding operator.


Example 18.2.3 Let us solve $u(x) = x + \lambda\int_a^b K(x,t)\,u(t)\,dt$, where
$K(x,t) \equiv xt$ is a symmetric (hermitian) kernel, by the Neumann series
method. We note that


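\[
\begin{aligned}
\|\mathbf{K}\|^2 &= \int_a^b\int_a^b |K(x,t)|^2\,dx\,dt = \int_a^b\int_a^b x^2t^2\,dx\,dt \\
&= \int_a^b x^2\,dx\int_a^b t^2\,dt = \left(\int_a^b x^2\,dx\right)^{2} = \frac{1}{9}(b^3 - a^3)^2,
\end{aligned}
\]

or

\[
\|\mathbf{K}\| = \int_a^b x^2\,dx = \tfrac{1}{3}(b^3 - a^3),
\]

and the Neumann series converges if $|\lambda|(b^3 - a^3) < 3$. Assuming that this
condition holds, we have

\[
u(x) = x + \sum_{j=1}^{\infty}\lambda^{j}\int_a^b K^{j}(x,t)\,t\,dt.
\]

The special form of the kernel allows us to calculate $K^{j}(x,t)$ directly:

\[
\begin{aligned}
K^{j}(x,t) &= \int_a^b\int_a^b\cdots\int_a^b K(x,s_1)\,K(s_1,s_2)\cdots K(s_{j-1},t)\,ds_1\,ds_2\cdots ds_{j-1} \\
&= \int_a^b\int_a^b\cdots\int_a^b x\,s_1^2s_2^2\cdots s_{j-1}^2\,t\,ds_1\,ds_2\cdots ds_{j-1} \\
&= xt\left(\int_a^b s^2\,ds\right)^{j-1} = xt\,\|\mathbf{K}\|^{j-1}.
\end{aligned}
\]

It follows that $\int_a^b K^{j}(x,t)\,t\,dt = x\|\mathbf{K}\|^{j-1}\,\tfrac{1}{3}(b^3 - a^3) = x\|\mathbf{K}\|^{j}$. Substituting
this in the expression for $u(x)$ yields

\[
\begin{aligned}
u(x) &= x + \sum_{j=1}^{\infty}\lambda^{j}x\,\|\mathbf{K}\|^{j} = x + x\lambda\|\mathbf{K}\|\sum_{j=1}^{\infty}\lambda^{j-1}\|\mathbf{K}\|^{j-1} \\
&= x\left(1 + \lambda\|\mathbf{K}\|\frac{1}{1 - \lambda\|\mathbf{K}\|}\right) = \frac{x}{1 - \lambda\|\mathbf{K}\|} = \frac{3x}{3 - \lambda(b^3 - a^3)}.
\end{aligned}
\]

Because of the simplicity of the kernel, we can solve the integral equation
exactly. First we write

\[
u(x) = x + \lambda\int_a^b xt\,u(t)\,dt = x + \lambda x\int_a^b t\,u(t)\,dt \equiv x(1 + \lambda A), \tag{18.16}
\]

where $A = \int_a^b t\,u(t)\,dt$. Multiplying both sides by $x$ and integrating, we obtain

\[
A = \int_a^b x\,u(x)\,dx = (1 + \lambda A)\int_a^b x^2\,dx = (1 + \lambda A)\|\mathbf{K}\|
\quad\Rightarrow\quad
A = \frac{\|\mathbf{K}\|}{1 - \lambda\|\mathbf{K}\|}.
\]

Substituting $A$ in Eq. (18.16) gives

\[
u(x) = x\left(1 + \lambda\frac{\|\mathbf{K}\|}{1 - \lambda\|\mathbf{K}\|}\right) = \frac{x}{1 - \lambda\|\mathbf{K}\|}.
\]

This solution is the same as the first one we obtained. However, no series was
involved here, and therefore no assumption is necessary concerning $|\lambda|\|\mathbf{K}\|$.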
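
The closed form $u(x) = 3x/[3 - \lambda(b^3 - a^3)]$ can be compared with a direct numerical solution in which the integral is replaced by a trapezoidal sum and the resulting linear system is solved. The sketch below assumes NumPy; the function name `solve_fredholm_xt`, the interval $[0, 1]$, and the two values of $\lambda$ are illustrative choices. The second value lies outside the convergence region of the Neumann series, where the exact solution is nevertheless still valid.

```python
import numpy as np

def solve_fredholm_xt(lam, a=0.0, b=1.0, n=401):
    """Solve u(x) = x + lam * int_a^b x t u(t) dt by discretizing the integral."""
    x = np.linspace(a, b, n)
    h = x[1] - x[0]
    w = np.full(n, h)
    w[0] = w[-1] = h / 2                     # trapezoid weights
    K = np.outer(x, x)                       # K(x, t) = x t
    u = np.linalg.solve(np.eye(n) - lam * K * w, x)
    return x, u

def closed_form(x, lam, a=0.0, b=1.0):
    """Closed-form solution from the example."""
    return 3 * x / (3 - lam * (b**3 - a**3))

x, u_small = solve_fredholm_xt(1.5)          # |lam| (b^3 - a^3) < 3: Neumann series converges
_, u_large = solve_fredholm_xt(6.0)          # |lam| (b^3 - a^3) > 3: series diverges

err_small = np.abs(u_small - closed_form(x, 1.5)).max()
err_large = np.abs(u_large - closed_form(x, 6.0)).max()
print(err_small, err_large)
```

The only value of $\lambda$ for which this linear solve fails on $[0, 1]$ is (up to discretization error) $\lambda = 3$, the single characteristic value of the kernel.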


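If one can calculate the eigenvectors $|u_n\rangle$ and the eigenvalues $\lambda_n^{-1}$,
then one can obtain a solution for the integral equation in terms of these
eigenfunctions as follows: Substitute (18.15) in the Fredholm equation
[Eq. (18.3)] to get

\[
|u\rangle = |v\rangle + \lambda\sum_{n=1}^{\infty}\lambda_n^{-1}\langle u_n|u\rangle\,|u_n\rangle. \tag{18.17}
\]

Multiply both sides by $\langle u_m|$:

\[
\langle u_m|u\rangle = \langle u_m|v\rangle + \lambda\sum_{n=1}^{\infty}\lambda_n^{-1}\langle u_n|u\rangle\underbrace{\langle u_m|u_n\rangle}_{=\delta_{mn}}
= \langle u_m|v\rangle + \lambda\lambda_m^{-1}\langle u_m|u\rangle, \tag{18.18}
\]

or, if $\lambda$ is not one of the eigenvalues,

\[
\left(1 - \frac{\lambda}{\lambda_m}\right)\langle u_m|u\rangle = \langle u_m|v\rangle
\quad\Rightarrow\quad
\langle u_m|u\rangle = \frac{\lambda_m\langle u_m|v\rangle}{\lambda_m - \lambda}.
\]

Substituting this in Eq. (18.17) gives

\[
|u\rangle = |v\rangle + \lambda\sum_{n=1}^{\infty}\frac{\langle u_n|v\rangle}{\lambda_n - \lambda}\,|u_n\rangle, \tag{18.19}
\]

and in the functional form,

\[
u(x) = v(x) + \lambda\sum_{n=1}^{\infty}\frac{\langle u_n|v\rangle}{\lambda_n - \lambda}\,u_n(x),
\qquad \lambda \neq \lambda_n\ \ \forall n. \tag{18.20}
\]
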


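In case $\lambda = \lambda_m$ for some $m$, the Fredholm alternative (Theorem 18.2.2)
says that we will have a solution only if $|v\rangle$ is in the orthogonal complement
of the null space of $\mathbf{1} - \lambda\mathbf{K}$. Moreover, Eq. (18.18) shows that $\langle u_m|u\rangle$, the
expansion coefficients of the basis vectors of the eigenspace $\mathcal{M}_m$, cannot be
specified. However, Eq. (18.18) does determine the rest of the coefficients
as before. In this case, the solution can be written as

$$
|u\rangle = |v\rangle + \sum_{k=1}^{r} c_k\,|u_m^{(k)}\rangle + \lambda \sum_{\substack{n=1 \\ n \neq m}}^{\infty} \frac{\langle u_n|v\rangle}{\lambda_n - \lambda}\,|u_n\rangle, \tag{18.21}
$$

where $r$ is the (finite) dimension of $\mathcal{M}_m$, $k$ labels the orthonormal basis
$\{|u_m^{(k)}\rangle\}$ of $\mathcal{M}_m$, and $\{c_k\}_{k=1}^{r}$ are arbitrary constants. In functional form, this
equation becomes

$$
u(x) = v(x) + \sum_{k=1}^{r} c_k\,u_m^{(k)}(x) + \lambda \sum_{\substack{n=1 \\ n \neq m}}^{\infty} \frac{\int_a^b u_n^*(t)\,v(t)\,dt}{\lambda_n - \lambda}\,u_n(x). \tag{18.22}
$$

⁵Remember that $\mathbf{K}$ is hermitian; therefore, $\lambda_n$ is real.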


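Example 18.2.4 We now give an example of the application of Eq. (18.20).
We want to solve $u(x) = 3\int_{-1}^{1} K(x,t)\,u(t)\,dt + x^2$, where

$$
K(x,t) = \sum_{k=0}^{\infty} \frac{u_k(x)\,u_k(t)}{2^{k/2}}, \qquad u_k(x) = \sqrt{\frac{2k+1}{2}}\,P_k(x),
$$

and $P_k(x)$ is a Legendre polynomial.
We first note that $\{u_k\}$ is an orthonormal set of functions, that $K(x,t)$ is
real and symmetric (therefore, hermitian), and that

$$
\begin{aligned}
\int_{-1}^{1} dt \int_{-1}^{1} dx\,|K(x,t)|^2
 &= \int_{-1}^{1} dt \int_{-1}^{1} dx \sum_{k,l=0}^{\infty} \frac{u_k(x)u_k(t)}{2^{k/2}}\,\frac{u_l(x)u_l(t)}{2^{l/2}} \\
 &= \sum_{k,l=0}^{\infty} \frac{1}{2^{(k+l)/2}} \underbrace{\int_{-1}^{1} u_k(x)u_l(x)\,dx}_{=\delta_{kl}} \underbrace{\int_{-1}^{1} u_k(t)u_l(t)\,dt}_{=\delta_{kl}} \\
 &= \sum_{k=0}^{\infty} \frac{1}{2^k} = 2 < \infty.
\end{aligned}
$$

Thus, $K(x,t)$ is a Hilbert-Schmidt kernel.
Now note that

$$
\int_{-1}^{1} K(x,t)\,u_k(t)\,dt = \sum_{l=0}^{\infty} \frac{1}{2^{l/2}}\,u_l(x) \underbrace{\int_{-1}^{1} u_l(t)u_k(t)\,dt}_{=\delta_{kl}} = \frac{1}{2^{k/2}}\,u_k(x).
$$

This shows that $u_k$ is an eigenfunction of $K(x,t)$ with eigenvalue $1/2^{k/2}$,
i.e., with characteristic value $\lambda_k = 2^{k/2}$. Since $3 \neq 2^{k/2}$ for any integer $k$,
we can use Eq. (18.20) to write

$$
u(x) = x^2 + 3 \sum_{k=0}^{\infty} \frac{\int_{-1}^{1} u_k(s)\,s^2\,ds}{2^{k/2} - 3}\,u_k(x).
$$

But $\int_{-1}^{1} u_k(s)\,s^2\,ds = 0$ for $k \geq 3$. For $k \leq 2$, we use the first three Legendre
polynomials to get

$$
\int_{-1}^{1} u_0(s)\,s^2\,ds = \frac{\sqrt{2}}{3}, \qquad
\int_{-1}^{1} u_1(s)\,s^2\,ds = 0, \qquad
\int_{-1}^{1} u_2(s)\,s^2\,ds = \frac{2}{3}\sqrt{\frac{2}{5}}.
$$

This gives $u(x) = \frac{1}{2} - 2x^2$. The reader is urged to substitute this solution in
the original integral equation and verify that it works.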
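The substitution suggested above can also be carried out numerically. The sketch below truncates the kernel series at an assumed cutoff `k_max` and evaluates the integral by Gauss-Legendre quadrature; because the candidate solution is a quadratic polynomial, only the $k = 0$ and $k = 2$ terms contribute, so the truncation does not affect the result. The helper names are illustrative.

```python
import numpy as np
from numpy.polynomial import legendre


def u_k(k, x):
    """Orthonormal Legendre function u_k(x) = sqrt((2k + 1)/2) P_k(x)."""
    coeffs = np.zeros(k + 1)
    coeffs[k] = 1.0                              # selects P_k in the Legendre basis
    return np.sqrt((2 * k + 1) / 2.0) * legendre.legval(x, coeffs)


def apply_kernel(f_nodes, nodes, weights, x, k_max):
    """Approximate int_{-1}^{1} K(x, t) f(t) dt with the series cut at k_max."""
    result = np.zeros_like(x, dtype=float)
    for k in range(k_max + 1):
        coefficient = np.sum(weights * u_k(k, nodes) * f_nodes)   # <u_k|f>
        result += coefficient * u_k(k, x) / 2.0 ** (k / 2.0)
    return result


def candidate(x):
    """Solution claimed in Example 18.2.4."""
    return 0.5 - 2.0 * x**2


nodes, weights = legendre.leggauss(20)           # exact for polynomials up to degree 39
x = np.linspace(-1.0, 1.0, 11)
rhs = 3.0 * apply_kernel(candidate(nodes), nodes, weights, x, k_max=10) + x**2
max_residual = np.max(np.abs(candidate(x) - rhs))
print(max_residual)
```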


18.2.2 Degenerate Kernels 


The preceding example involves the simplest kind of degenerate, or separa- 
ble, kernels. A kernel is called degenerate, or separable, if it can be written 
as a finite sum of products of functions of one variable: 


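$$
K(x,t) = \sum_{j=1}^{n} \phi_j(x)\,\psi_j^*(t), \tag{18.23}
$$

where $\phi_j$ and $\psi_j$ are assumed to be square-integrable. Substituting (18.23)
in the Fredholm integral equation of the second kind, we obtain

$$
u(x) - \lambda \sum_{j=1}^{n} \phi_j(x) \int_a^b \psi_j^*(t)\,u(t)\,dt = v(x).
$$

If we define $\mu_j = \int_a^b \psi_j^*(t)\,u(t)\,dt$, the preceding equation becomes

$$
u(x) - \lambda \sum_{j=1}^{n} \mu_j\,\phi_j(x) = v(x). \tag{18.24}
$$

Multiply this equation by $\psi_i^*(x)$ and integrate over $x$ to get

$$
\mu_i - \lambda \sum_{j=1}^{n} A_{ij}\,\mu_j = v_i \quad \text{for } i = 1, 2, \ldots, n, \tag{18.25}
$$

where $A_{ij} = \int_a^b \psi_i^*(t)\,\phi_j(t)\,dt$ and $v_i = \int_a^b \psi_i^*(t)\,v(t)\,dt$. With $\mu_i$, $v_i$, and
$A_{ij}$ as components of column vectors $\boldsymbol{\mu}$, $\mathbf{v}$, and a matrix $\mathbf{A}$, we can write the
above linear system of equations as

$$
\boldsymbol{\mu} - \lambda\mathbf{A}\boldsymbol{\mu} = \mathbf{v}, \quad\text{or}\quad (\mathbf{1} - \lambda\mathbf{A})\boldsymbol{\mu} = \mathbf{v}. \tag{18.26}
$$

We can now determine the $\mu_i$ by solving the system of linear equations
given by (18.25). Once the $\mu_i$ are determined, Eq. (18.24) gives $u(x)$. Thus,
for a degenerate kernel the Fredholm problem reduces to a system of linear
equations.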




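Example 18.2.5 As a concrete example of an integral equation with degenerate
kernel, we solve $u(x) - \lambda \int_0^1 (1 + xt)\,u(t)\,dt = x$ for two different
values of $\lambda$. The kernel, $K(x,t) = 1 + xt$, is separable with $\phi_1(x) = 1$,
$\psi_1(t) = 1$, $\phi_2(x) = x$, and $\psi_2(t) = t$. This gives the matrix

$$
\mathbf{A} = \begin{pmatrix} 1 & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{3} \end{pmatrix}.
$$

For convenience, we define the matrix $\mathbf{B} \equiv \mathbf{1} - \lambda\mathbf{A}$.

(a) First assume that $\lambda = 1$. In that case $\mathbf{B}$ has a nonzero determinant.
    Thus, $\mathbf{B}^{-1}$ exists, and can be calculated to be

$$
\mathbf{B}^{-1} = \begin{pmatrix} -\tfrac{8}{3} & -2 \\ -2 & 0 \end{pmatrix}.
$$

    With

$$
v_1 = \int_0^1 \psi_1^*(t)\,v(t)\,dt = \int_0^1 t\,dt = \frac{1}{2}
\quad\text{and}\quad
v_2 = \int_0^1 \psi_2^*(t)\,v(t)\,dt = \int_0^1 t^2\,dt = \frac{1}{3},
$$

    we obtain

$$
\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \mathbf{B}^{-1}\mathbf{v}
 = \begin{pmatrix} -\tfrac{8}{3} & -2 \\ -2 & 0 \end{pmatrix} \begin{pmatrix} \tfrac{1}{2} \\ \tfrac{1}{3} \end{pmatrix}
 = \begin{pmatrix} -2 \\ -1 \end{pmatrix}.
$$

    Equation (18.24) then gives $u(x) = \mu_1\phi_1(x) + \mu_2\phi_2(x) + x = -2$.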


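(b) Now, for the purpose of illustrating the other alternative of Theorem
    18.2.2, let us take $\lambda = 8 + 2\sqrt{13}$. Then

$$
\mathbf{B} = \mathbf{1} - \lambda\mathbf{A} = -\begin{pmatrix} 7 + 2\sqrt{13} & 4 + \sqrt{13} \\ 4 + \sqrt{13} & (5 + 2\sqrt{13})/3 \end{pmatrix},
$$

    and $\det\mathbf{B} = 0$. This shows that $8 + 2\sqrt{13}$ is a characteristic value of
    the equation. We thus have a solution only if $v(x) = x$ is orthogonal
    to the null space of $\mathbf{1} - \lambda^*\mathbf{A}^\dagger = \mathbf{B}^\dagger$. To determine a basis for this null
    space, we have to find vectors $|z\rangle$ such that $\mathbf{B}^\dagger|z\rangle = 0$. Since $\lambda$ is real,
    and $\mathbf{B}$ is real and symmetric, $\mathbf{B}^\dagger = \mathbf{B}$, and we must solve

$$
\begin{pmatrix} 7 + 2\sqrt{13} & 4 + \sqrt{13} \\ 4 + \sqrt{13} & (5 + 2\sqrt{13})/3 \end{pmatrix} \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix} = 0.
$$

    The solution to this equation is a multiple of $|z\rangle = \begin{pmatrix} 3 \\ -2 - \sqrt{13} \end{pmatrix}$. If the
    integral equation is to have a solution, the column vector $\mathbf{v}$ (whose
    corresponding ket we denote by $|v\rangle$) must be orthogonal to $|z\rangle$. But

$$
\langle z|v\rangle = \begin{pmatrix} 3 & -2 - \sqrt{13} \end{pmatrix} \begin{pmatrix} \tfrac{1}{2} \\ \tfrac{1}{3} \end{pmatrix} = \frac{5 - 2\sqrt{13}}{6} \neq 0.
$$

    Therefore, the integral equation has no solution.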
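Both parts of Example 18.2.5 reduce to two-by-two linear algebra, so they are easy to reproduce numerically. The following is a minimal sketch under the example's own data ($\phi = (1, x)$, $\psi = (1, t)$, $v(x) = x$ on $[0, 1]$); the variable names are illustrative.

```python
import numpy as np

# Matrix A_ij = int_0^1 psi_i(t) phi_j(t) dt for phi = (1, x), psi = (1, t),
# and v_i = int_0^1 psi_i(t) t dt for the inhomogeneous term v(x) = x.
A = np.array([[1.0, 1.0 / 2.0],
              [1.0 / 2.0, 1.0 / 3.0]])
v = np.array([1.0 / 2.0, 1.0 / 3.0])

# Case (a): lambda = 1, so B = 1 - lambda A is invertible.
lam_a = 1.0
mu = np.linalg.solve(np.eye(2) - lam_a * A, v)


def u_a(x):
    """u(x) = v(x) + lambda * (mu_1 phi_1(x) + mu_2 phi_2(x)), from Eq. (18.24)."""
    return x + lam_a * (mu[0] + mu[1] * x)


# Case (b): lambda = 8 + 2 sqrt(13) makes B singular.
lam_b = 8.0 + 2.0 * np.sqrt(13.0)
B_b = np.eye(2) - lam_b * A
z = np.array([3.0, -2.0 - np.sqrt(13.0)])       # null vector of B quoted in the text
overlap = z @ v                                  # <z|v>; nonzero means no solution

print(mu, u_a(np.linspace(0.0, 1.0, 3)), np.linalg.det(B_b), B_b @ z, overlap)
```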


The reader may feel uneasy that the functions $\phi_j(x)$ and $\psi_j(t)$ appearing
in a degenerate kernel are arbitrary to within a multiplicative function. After
all, we can multiply $\phi_j(x)$ by a nonzero function, and divide $\psi_j(t)$ by the
same function, and get the same kernel. Such a change clearly alters the
matrices $\mathbf{A}$ and $\mathbf{B}$ and therefore seems likely to change the solution, $u(x)$.
That this is not the case is demonstrated in Problem 18.2. In fact, it can
be shown quite generally that the transformations described above do not
change the solution.

As the alert reader may have noticed, we have been avoiding the prob- 
lem of solving the eigenvalue (characteristic) problem for integral operators. 
Such a problem is nontrivial, and the analogue of the finite-dimensional 
case, where one works with determinants and characteristic polynomials,
does not exist. An exception is a degenerate hermitian⁶ kernel, i.e.,
a kernel of the form $K(x,t) = \sum_{i=1}^{n} h_i(x)\,h_i^*(t)$. Substituting this in the
characteristic-value equation

$$
u(x) = \lambda \int_a^b K(x,t)\,u(t)\,dt, \qquad \lambda \neq 0,
$$

we obtain $u(x) = \lambda \sum_{i=1}^{n} h_i(x) \int_a^b h_i^*(t)\,u(t)\,dt$. Defining $\mu_i \equiv \int_a^b h_i^*(t)\,u(t)\,dt$
and substituting it back in the equation gives

$$
u(x) = \lambda \sum_{i=1}^{n} \mu_i\,h_i(x). \tag{18.27}
$$

Multiplying this equation by $\lambda^{-1} h_k^*(x)$ and integrating over $x$ yields

$$
\lambda^{-1}\mu_k = \sum_{i=1}^{n} \left[\int_a^b h_k^*(x)\,h_i(x)\,dx\right] \mu_i \equiv \sum_{i=1}^{n} m_{ki}\,\mu_i.
$$


This is an eigenvalue equation for the hermitian $n \times n$ matrix $\mathbf{M}$ with
elements $m_{ij}$, which, by the spectral theorem for hermitian operators, can be
solved. In fact, the matrix need not be hermitian; as long as it is normal,
the eigenvalue problem can be solved. Once the eigenvectors and the eigen- 
values are found, we can substitute them in Eq. (18.27) and obtain u(x). We 
expect to find a finite number of eigenfunctions and eigenvalues. Our anal- 
ysis of compact operators included such a case. That analysis also showed 
that the entire (infinite-dimensional) Hilbert space could be written as the di- 
rect sum of eigenspaces that are finite-dimensional for nonzero eigenvalues. 
Therefore, we expect the eigenspace corresponding to the zero eigenvalue 
(or infinite characteristic value) to be infinite-dimensional. The following 
example illustrates these points. 


⁶Actually, the problem of a degenerate kernel that leads to a normal matrix, as described
below, can also be solved.

Example 18.2.6 Let us find the nonzero characteristic values and corresponding
eigenfunctions of the kernel $K(x,t) = 1 + \sin(x + t)$ for $-\pi \leq x, t \leq \pi$.

We are seeking functions $u$ and scalars $\lambda$ satisfying $u(x) = \lambda K[u](x)$, or

$$
u(x) = \lambda \int_{-\pi}^{\pi} [1 + \sin(x + t)]\,u(t)\,dt.
$$

Expanding $\sin(x + t)$, we obtain

$$
u(x) = \lambda \int_{-\pi}^{\pi} [1 + \sin x \cos t + \cos x \sin t]\,u(t)\,dt, \tag{18.28}
$$

or

$$
\lambda^{-1} u(x) = \mu_1 + \mu_2 \sin x + \mu_3 \cos x, \tag{18.29}
$$

where $\mu_1 = \int_{-\pi}^{\pi} u(t)\,dt$, $\mu_2 = \int_{-\pi}^{\pi} u(t)\cos t\,dt$, and $\mu_3 = \int_{-\pi}^{\pi} u(t)\sin t\,dt$.
Integrate both sides of Eq. (18.29) with respect to $x$ from $-\pi$ to $\pi$ to obtain
$\lambda^{-1}\mu_1 = 2\pi\mu_1$. Similarly, multiplying by $\sin x$ and $\cos x$ and integrating
yields

$$
\lambda^{-1}\mu_2 = \pi\mu_3 \quad\text{and}\quad \lambda^{-1}\mu_3 = \pi\mu_2. \tag{18.30}
$$

If $\mu_1 \neq 0$, we get $\lambda^{-1} = 2\pi$, which, when substituted in (18.30), yields
$\mu_2 = \mu_3 = 0$. We thus have, as a first solution, $\lambda_1^{-1} = 2\pi$ and
$|u_1\rangle = \alpha \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, where $\alpha$ is an arbitrary constant. Equation (18.29) now gives
$\lambda_1^{-1} u_1(x) = \mu_1$, or $u_1(x) = c_1$, where $c_1$ is an arbitrary constant to be determined.

On the other hand, $\mu_1 = 0$ if $\lambda^{-1} \neq 2\pi$. Then Eq. (18.30) yields $\lambda^{-1} = \pm\pi$
and $\mu_2 = \pm\mu_3$. For $\lambda^{-1} \equiv \lambda_+^{-1} = \pi$, Eq. (18.29) gives

$$
u(x) \equiv u_+(x) = c_+(\sin x + \cos x),
$$

and for $\lambda^{-1} \equiv \lambda_-^{-1} = -\pi$, it yields $u(x) \equiv u_-(x) = c_-(\sin x - \cos x)$, where
$c_\pm$ are arbitrary constants to be determined by normalization of eigenfunctions.
The normalized eigenfunctions are

$$
u_1 = \frac{1}{\sqrt{2\pi}}, \qquad u_\pm(x) = \frac{1}{\sqrt{2\pi}}\,(\sin x \pm \cos x).
$$

Direct substitution in the original integral equation easily verifies that $u_1$,
$u_+$, and $u_-$ are eigenfunctions of the integral equation with the eigenvalues
calculated above.
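The three nonzero eigenvalues $2\pi$, $\pi$, and $-\pi$ of this kernel can be cross-checked by discretizing the integral operator. The sketch below uses a Nyström-type discretization with Gauss-Legendre quadrature; this is an illustrative numerical technique, not the method of the text, and the number of nodes is an arbitrary assumption.

```python
import numpy as np
from numpy.polynomial import legendre

# Gauss-Legendre nodes and weights mapped from [-1, 1] to [-pi, pi].
xi, omega = legendre.leggauss(40)
t = np.pi * xi
w = np.pi * omega

# Kernel K(x, t) = 1 + sin(x + t) sampled on the quadrature grid.
K = 1.0 + np.sin(t[:, None] + t[None, :])

# Symmetrized Nystrom matrix sqrt(w_i) K_ij sqrt(w_j); its eigenvalues
# approximate the eigenvalues (inverse characteristic values) of the operator.
sqrt_w = np.sqrt(w)
M = sqrt_w[:, None] * K * sqrt_w[None, :]
eigenvalues = np.linalg.eigvalsh(M)

order = np.argsort(-np.abs(eigenvalues))        # sort by decreasing magnitude
largest = eigenvalues[order][:3]                # expected: 2 pi, pi, -pi in some order
rest = eigenvalues[order][3:]                   # expected: numerically zero
print(np.sort(largest), np.max(np.abs(rest)))
```

All remaining eigenvalues of the discretized operator vanish to rounding error, which reflects the fact that the kernel is degenerate with only three independent terms.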

Let us now consider the zero eigenvalue (or infinite characteristic value).
Divide both sides of Eq. (18.28) by $\lambda$ and take the limit $\lambda \to \infty$. Then the
integral equation becomes

$$
\int_{-\pi}^{\pi} [1 + \sin x \cos t + \cos x \sin t]\,u(t)\,dt = 0.
$$

The solutions $u(t)$ to this equation would span the eigenspace corresponding
to the zero eigenvalue, or infinite characteristic value. We pointed out above
that this eigenspace is expected to be infinite-dimensional. This expectation
is borne out once we note that all functions of the form $\sin nt$ or $\cos nt$
with $n \geq 2$ make the above integral zero; and there are infinitely many such
functions.

18.3 Problems
18.1 Use mathematical induction to derive Eq. (18.5). 


18.2 Repeat part (a) of Example 18.2.5 using

$$
\phi_1(x) = \frac{1}{2}, \qquad \psi_1(t) = 2, \qquad \phi_2(x) = x, \qquad \psi_2(t) = t,
$$

so that we still have $K(x,t) = \phi_1(x)\psi_1(t) + \phi_2(x)\psi_2(t)$.


18.3 Use the spectral theorem for compact hermitian operators to show that 
if the kernel of a Hilbert-Schmidt operator has a finite number of nonzero 
eigenvalues, then the kernel is separable. Hint: See the discussion at the 
beginning of Sect. 18.2.1. 


18.4 Use the method of successive approximations to solve the Volterra
equation $u(x) = \lambda \int_0^x u(t)\,dt$. Then derive a DE equivalent to the Volterra
equation (make sure to include the initial condition), and solve it.


18.5 Regard the Fourier transform,

$$
F[f](x) \equiv \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{ixy} f(y)\,dy
$$

as an integral operator.

(a) Show that $F^2[f](x) = f(-x)$.
(b) Deduce, therefore, that the only eigenvalues of this operator are $\lambda = \pm 1, \pm i$.
(c) Let $f(x)$ be any even function of $x$. Show that an appropriate choice
    of $\alpha$ can make $u = f + \alpha F[f]$ an eigenfunction of $F$. (This shows that
    the eigenvalues of $F$ have infinite multiplicity.)


18.6 For what values of $\lambda$ does the following integral equation have a solution?


u(x) = af sin(x + t)u(t)dt +x. 
0 


What is that solution? Redo the problem using a Neumann series expansion. 
Under what condition is the series convergent? 


18.7 It is possible to multiply the functions $\phi_j(x)$ by $\gamma_j(x)$ and $\psi_j(t)$ by
$1/\gamma_j(t)$ and still get the same degenerate kernel, $K(x,t) = \sum_{j=1}^{n} \phi_j(x)\psi_j(t)$.
Show that such arbitrariness, although affecting the matrices $\mathbf{A}$ and $\mathbf{B}$, does
not change the solution of the Fredholm problem

$$
u(x) - \lambda \int_a^b K(x,t)\,u(t)\,dt = f(x).
$$




18.8 Show, by direct substitution, that the solution found in Example 18.2.4 
does satisfy its integral equation. 


18.9 Solve u(x) =4 f+ Nut dt +x. 


18.10 Solve u(x) =d i, xtu(t) dt + x using the Neumann series method. 
For what values of A is the series convergent? Now find the eigenvalues and 
eigenfunctions of the kernel and solve the problem using these eigenvalues 
and eigenfunctions. 


18.11 Solve u(x) =A i K (x, t)u(t)dt + x%, where a is any real number 
except a negative integer, and K (x,t) = e~°+”. For what values of 4 does 
the integral equation have a solution? 


18.12 Solve the integral equations 


1 


(a) u(x) =e* +1 f xtu(t) dt, (b) u(x) = if sincs — t)u(t) dt, 
0 0) 


1 x 
(c) unas f xtu(t) dt, (d) ucyax+ f u(t) dt. 
0 0 


18.13 Solve the integral equation u(x) =x +A i, (x + t)tu(t) dt, keeping 
terms up to A7. 


18.14 Solve the integral equation u(x) = e~!*! + ve hee ely (t) dt, as- 
suming that f remains finite as x — -too. 


18.15 Solve the integral equation u(x) = e7?! + ASO” u(t) cos xt dt, as- 
suming that f remains finite as x — -too. 


19 Sturm-Liouville Systems


The linear operators discussed in the last two chapters were exclusively in- 
tegral operators. Most applications of physical interest, however, involve 
differential operators (DO). Unfortunately, differential operators are un- 
bounded. We noted that complications arise when one abandons the com- 
pactness property of the operator, e.g., sums turn into integrals and one loses 
one’s grip over the eigenvalues of noncompact operators. The transition to 
unbounded operators complicates matters even more. Fortunately, the for- 
malism of one type of DOs that occur most frequently in physics can be 
studied in the context of compact operators. Such a study is our aim for this 
chapter. 


19.1 Compact-Resolvent Unbounded Operators 


As was pointed out in Example 17.2.7, the derivative operator cannot be
defined for all functions in $\mathcal{L}^2(a,b)$. This motivates the following:


Definition 19.1.1 Let $\mathcal{D}$ be a linear manifold¹ in the Hilbert space $\mathcal{H}$. A linear
map $\mathbf{T}: \mathcal{D} \to \mathcal{H}$ will be called a linear operator in² $\mathcal{H}$. $\mathcal{D}$ is called the
domain of $\mathbf{T}$ and often denoted by $\mathcal{D}(\mathbf{T})$.


Example 19.1.2 The domain of the derivative operator $\mathbf{D}$, as an operator
on $\mathcal{L}^2(a,b)$, cannot be the entire space. On the other hand, $\mathbf{D}$ is defined on
the linear manifold $\mathcal{M}$ in $\mathcal{L}^2(a,b)$ spanned by $\{e^{i2n\pi x/L}\}$ with $L = b - a$.
As we saw in Chap. 9, $\mathcal{M}$ is dense (see Definition 17.4.5 and the discussion
following it) in $\mathcal{L}^2(a,b)$. This is the essence of Fourier series: that every
function in $\mathcal{L}^2(a,b)$ can be expanded in (i.e., approximated by) a Fourier
series. It turns out that many unbounded operators on a Hilbert space share
the same property, namely that their domains are dense in the Hilbert space.


¹A linear manifold of an infinite-dimensional normed vector space $\mathcal{V}$ is a proper subset
that is a vector space in its own right, but is not necessarily closed.

²As opposed to on $\mathcal{H}$.


S. Hassani, Mathematical Physics, DOI 10.1007/978-3-319-01195-0_19

Another important property of Fourier expansion is the fact that if the
function is differentiable, then one can differentiate both sides, i.e., one can
differentiate a Fourier expansion term by term if such an operation makes
sense for the original function. Define the sequence $\{f_m\}$ by

$$
f_m(x) = \sum_{n=-m}^{m} a_n e^{i2\pi nx/L}, \qquad a_n = \frac{1}{L} \int_a^b f(x)\,e^{-i2\pi nx/L}\,dx.
$$

Then we can state the property above as follows: Suppose $\{f_m\}$ is in $\mathcal{M}$.
If $\lim f_m = f$ and $\lim f_m' = g$, then $f' = g$ and $f \in \mathcal{M}$. Many unbounded
operators share this property.


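Definition 19.1.3 Let $\mathcal{D}$ be a linear manifold in the Hilbert space $\mathcal{H}$. Let
$\mathbf{T}: \mathcal{D} \to \mathcal{H}$ be a linear operator in $\mathcal{H}$. Suppose that $\{|u_n\rangle\}$ is any sequence
in $\mathcal{D}$ for which both $\{|u_n\rangle\}$ and $\{\mathbf{T}|u_n\rangle\}$ converge in $\mathcal{H}$, i.e.,

$$
\lim |u_n\rangle = |u\rangle \quad\text{and}\quad \lim \mathbf{T}|u_n\rangle = |v\rangle.
$$

We say that $\mathbf{T}$ is closed if, for every such sequence, $|u\rangle \in \mathcal{D}$ and $\mathbf{T}|u\rangle = |v\rangle$.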


Notice that we cannot demand that $|v\rangle$ be in $\mathcal{D}$ for a general operator.
This, as we saw in the preceding example, will not be appropriate for un- 
bounded operators. 

The restriction of the domain of an unbounded operator is necessitated 
by the fact that the action of the operator on a vector in the Hilbert space 
in general takes that vector out of the space. The following theorem (see 
[DeVi 90, pp. 251-252] for a proof) shows why this is necessary: 


Theorem 19.1.4 A closed linear operator in $\mathcal{H}$ that is defined at every point
of $\mathcal{H}$ (so that $\mathcal{D} = \mathcal{H}$) is bounded.


Thus, if we are interested in unbounded operators (for instance, differential
operators), we have to restrict their domains. In particular, we have to
accept the possibility of an operator whose adjoint has a different domain.³


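Definition 19.1.5 Let $\mathbf{T}$ be a linear operator in $\mathcal{H}$. We shall say that $\mathbf{T}$ is
hermitian if $\mathbf{T}^\dagger$ is an extension of $\mathbf{T}$, i.e., $\mathcal{D}(\mathbf{T}) \subset \mathcal{D}(\mathbf{T}^\dagger)$ and $\mathbf{T}^\dagger|u\rangle = \mathbf{T}|u\rangle$
for all $|u\rangle \in \mathcal{D}(\mathbf{T})$. $\mathbf{T}$ is called self-adjoint if $\mathcal{D}(\mathbf{T}) = \mathcal{D}(\mathbf{T}^\dagger)$.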


As we shall see shortly, certain types of Sturm-Liouville operators, although
unbounded, lend themselves to a study within the context of compact
operators. 


Definition 19.1.6 A hermitian linear operator $\mathbf{T}$ in a Hilbert space $\mathcal{H}$ is said
to have a compact resolvent if there is a $\mu \in \rho(\mathbf{T})$ for which the resolvent
$\mathbf{R}_\mu(\mathbf{T})$ is compact.


3This subtle difference between hermitian and self-adjoint is stated here merely to warn 
the reader and will be confined to the present discussion. The two qualifiers will be 
(ab)used interchangeably in the rest of the book. 




An immediate consequence of this definition is that $\mathbf{R}_\lambda(\mathbf{T})$ is compact for
all $\lambda \in \rho(\mathbf{T})$. To see this, note that $\mathbf{R}_\lambda(\mathbf{T})$ is bounded by Definition 17.3.1.
Now use Eq. (17.7) and write

$$
\mathbf{R}_\lambda(\mathbf{T}) = \left[\mathbf{1} + (\lambda - \mu)\mathbf{R}_\lambda(\mathbf{T})\right]\mathbf{R}_\mu(\mathbf{T}).
$$

The RHS is a product of a bounded⁴ and a compact operator, and therefore
must be compact. The compactness of the resolvent characterizes its
spectrum by Theorem 17.6.8. As the following theorem shows, this in turn
characterizes the spectrum of the operators with compact resolvent.


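Theorem 19.1.7 Let $\mathbf{T}$ be an operator with compact resolvent $\mathbf{R}_\lambda(\mathbf{T})$ where
$\lambda \in \rho(\mathbf{T})$. Then $0 \neq \mu \in \rho(\mathbf{R}_\lambda(\mathbf{T}))$ if and only if $(\lambda + 1/\mu) \in \rho(\mathbf{T})$. Similarly,
$\mu \neq 0$ is an eigenvalue of $\mathbf{R}_\lambda(\mathbf{T})$ if and only if $(\lambda + 1/\mu)$ is an eigenvalue
of $\mathbf{T}$. Furthermore, the eigenvectors of $\mathbf{R}_\lambda(\mathbf{T})$ corresponding to $\mu$ coincide
with those of $\mathbf{T}$ corresponding to $(\lambda + 1/\mu)$.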


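Proof The proof consists of a series of two-sided implications involving
definitions. We give the proof of the first part, the second part being very
similar:

$$
\begin{aligned}
\mu \in \rho(\mathbf{R}_\lambda(\mathbf{T})) \quad &\text{iff} \quad \mathbf{R}_\lambda(\mathbf{T}) - \mu\mathbf{1} \text{ is invertible.} \\
\mathbf{R}_\lambda(\mathbf{T}) - \mu\mathbf{1} \text{ is invertible} \quad &\text{iff} \quad (\mathbf{T} - \lambda\mathbf{1})^{-1} - \mu\mathbf{1} \text{ is invertible.} \\
(\mathbf{T} - \lambda\mathbf{1})^{-1} - \mu\mathbf{1} \text{ is invertible} \quad &\text{iff} \quad \mathbf{1} - \mu(\mathbf{T} - \lambda\mathbf{1}) \text{ is invertible.} \\
\mathbf{1} - \mu(\mathbf{T} - \lambda\mathbf{1}) \text{ is invertible} \quad &\text{iff} \quad \frac{1}{\mu}\mathbf{1} - \mathbf{T} + \lambda\mathbf{1} \text{ is invertible.} \\
\left(\frac{1}{\mu} + \lambda\right)\mathbf{1} - \mathbf{T} \text{ is invertible} \quad &\text{iff} \quad \left(\frac{1}{\mu} + \lambda\right) \in \rho(\mathbf{T}).
\end{aligned}
$$

Comparing the LHS of the first line with the RHS of the last line, we obtain
the first part of the theorem.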
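The algebra behind the eigenvalue correspondence $\mu \leftrightarrow \lambda + 1/\mu$ can be illustrated in finite dimensions, where every operator is bounded and compact. The sketch below is only such an illustration: the random symmetric matrix, its size, and the choice of $\lambda$ are assumptions made for the example, and the check says nothing about the infinite-dimensional subtleties the theorem addresses.

```python
import numpy as np

rng = np.random.default_rng(0)

# A real symmetric (hence hermitian) matrix standing in for the operator T.
S = rng.standard_normal((5, 5))
T = (S + S.T) / 2.0
t_eigs = np.linalg.eigvalsh(T)                  # eigenvalues of T, ascending

# Pick lambda outside the spectrum of T, so the resolvent exists.
lam = t_eigs.max() + 1.0
R = np.linalg.inv(T - lam * np.eye(5))          # R_lambda(T) = (T - lambda 1)^(-1)

# Each eigenvalue mu of the resolvent should map back to lambda + 1/mu in spec(T).
mu = np.linalg.eigvalsh(R)
recovered = np.sort(lam + 1.0 / mu)
print(recovered, t_eigs)
```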


A consequence of this theorem and Theorem 17.5.11 is that the eigenspaces
of an (unbounded) operator with compact resolvent are finite-dimensional,
i.e., such an operator has only finitely many eigenvectors corresponding
to each of its eigenvalues. Moreover, arranging the eigenvalues
$\mu_n$ of the resolvent in decreasing order (as done in Theorem 17.6.8), we conclude
that the eigenvalues of $\mathbf{T}$ can be arranged in a sequence in increasing
order of their absolute values and the limit of this sequence is infinity.


Example 19.1.8 Consider the operator $\mathbf{T}$ in $\mathcal{L}^2(0,1)$ defined by⁵ $\mathbf{T}f = -f''$
having the domain

$$
\mathcal{D}(\mathbf{T}) = \left\{ f \in \mathcal{L}^2(0,1) \;\middle|\; f'' \in \mathcal{L}^2(0,1),\ f(0) = f(1) = 0 \right\}.
$$

The reader may check that zero is not an eigenvalue of $\mathbf{T}$. Therefore, we may
choose $\mathbf{R}_0(\mathbf{T}) = \mathbf{T}^{-1}$. We shall study a systematic way of finding inverses of
some specific differential operators in the upcoming chapters on Green's
functions. At this point, suffice it to say that $\mathbf{T}^{-1}$ can be written as a
Hilbert-Schmidt integral operator with kernel

$$
K(x,t) = \begin{cases} x(1 - t) & \text{if } 0 \leq x \leq t \leq 1, \\ (1 - x)t & \text{if } 0 \leq t \leq x \leq 1. \end{cases}
$$

Thus, if $\mathbf{T}f = g$, i.e., if $f'' = -g$, then $\mathbf{T}^{-1}g = f$, or $f = K[g]$, i.e.,

$$
f(x) = K[g](x) = \int_0^1 K(x,t)\,g(t)\,dt = \int_0^x (1 - x)\,t\,g(t)\,dt + \int_x^1 x(1 - t)\,g(t)\,dt.
$$

It is readily verified that $K[g](0) = K[g](1) = 0$ and $f''(x) = K[g]''(x) = -g$.

We can now use Theorem 19.1.7 with $\lambda = 0$ to find all the eigenvalues
of $\mathbf{T}$: $\mu_n$ is an eigenvalue of $\mathbf{T}$ if and only if $1/\mu_n$ is an eigenvalue of
$\mathbf{T}^{-1}$. These eigenvalues should have finite-dimensional eigenspaces, and we
should be able to arrange them in increasing order of magnitude without
bound. To verify this, we solve $f'' = -\mu f$, whose solutions are $\mu_n = n^2\pi^2$
and $f_n(x) = \sin n\pi x$. Note that there is only one eigenfunction corresponding
to each eigenvalue. Therefore, the eigenspaces are finite- (one-) dimensional.

⁴The sum of two bounded operators is bounded.

⁵We shall depart from our convention here and shall not use the Dirac bar-ket notation
although the use of abstract operators encourages their use. The reason is that in this
example, we are dealing with functions, and it is more convenient to undress the functions
from their Dirac clothing.
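The relation between the eigenvalues of $\mathbf{T}^{-1}$ and those of $\mathbf{T}$ can be checked numerically by sampling the kernel on a grid. The following sketch assumes a uniform grid of interior points and a simple rectangle-rule weight $h$; both choices, and the grid size, are illustrative rather than taken from the text.

```python
import numpy as np

# Uniform interior grid on (0, 1); n_points is an arbitrary illustrative choice.
n_points = 400
h = 1.0 / (n_points + 1)
x = h * np.arange(1, n_points + 1)

# Kernel K(x, t) = x (1 - t) for x <= t and (1 - x) t for t <= x,
# written compactly as min(x, t) * (1 - max(x, t)).
X, T_grid = np.meshgrid(x, x, indexing="ij")
K = np.minimum(X, T_grid) * (1.0 - np.maximum(X, T_grid))

# Eigenvalues of the discretized integral operator, largest first.
kernel_eigs = np.sort(np.linalg.eigvalsh(h * K))[::-1]

# Reciprocals of the largest kernel eigenvalues approximate mu_n = (n pi)^2.
operator_eigs = 1.0 / kernel_eigs[:4]
expected = (np.arange(1, 5) * np.pi) ** 2
print(operator_eigs, expected)
```

The agreement improves as the grid is refined, and the eigenvalues of the kernel accumulate at zero while their reciprocals grow without bound, as the discussion above anticipates.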


The example above is a special case of a large class of DOs occurring in 
mathematical physics. Recall from Theorem 14.5.4 that all linear second- 
order differential equations can be made self-adjoint. Moreover, Proposi- 
tion 14.4.11 showed that any SOLDE can be transformed into a form in 
which the first-derivative term is absent. By dividing the DE by the coeffi- 
cient of the second-derivative term if necessary, the study of the most general 
second-order linear differential operators boils down to that of the so-called 
Sturm-Liouville (S-L) operators 


$$
\mathbf{L}_x \equiv \frac{d^2}{dx^2} - q(x), \tag{19.1}
$$

which are assumed to be self-adjoint. Differential operators are necessarily
accompanied by boundary conditions that specify their domains. So, to
be complete, let us assume that the DO in Eq. (19.1) acts on the subset of
$\mathcal{L}^2(a,b)$ consisting of functions $u$ that satisfy the following so-called
separated boundary conditions:

$$
\begin{aligned}
\alpha_1 u(a) + \beta_1 u'(a) &= 0, \\
\alpha_2 u(b) + \beta_2 u'(b) &= 0,
\end{aligned} \tag{19.2}
$$



where $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$ are real constants with the property that the matrix
of coefficients has no zero rows. The collection of the DO and the boundary
conditions above is called a regular Sturm-Liouville system.

We now show that the DO of a regular Sturm-Liouville system has compact
resolvent. First observe that by adding $\alpha u$ (with $\alpha$ an arbitrary number
different from all eigenvalues of the DO) to both sides of the eigenvalue
equation $u'' - qu = \lambda u$, we can assume⁶ that zero is not an eigenvalue of
$\mathbf{L}_x$. Next, suppose that $u_1(x)$ and $u_2(x)$ are the two linearly independent
solutions of the homogeneous DE satisfying the first and the second boundary
conditions of Eq. (19.2), respectively. The operator whose kernel is

$$
K(x,t) = \begin{cases} -u_1(x)\,u_2(t)/W(a) & \text{if } a \leq x \leq t \leq b, \\ -u_2(x)\,u_1(t)/W(a) & \text{if } a \leq t \leq x \leq b, \end{cases}
$$

in which $W$ is the Wronskian of the solutions, is a Hilbert-Schmidt operator
and therefore compact. We now show that $K(x,t)$ is the kernel of the resolvent
$\mathbf{R}_0(\mathbf{L}_x) = \mathbf{L}_x^{-1} \equiv \mathbf{K}$ of our DO. To see this, write $\mathbf{L}_x u = v$, and

$$
u(x) = -\frac{u_2(x)}{W(a)} \int_a^x u_1(t)\,v(t)\,dt - \frac{u_1(x)}{W(a)} \int_x^b u_2(t)\,v(t)\,dt.
$$

Differentiating this once gives

$$
u'(x) = -\frac{u_2'(x)}{W(a)} \int_a^x u_1(t)\,v(t)\,dt - \frac{u_1'(x)}{W(a)} \int_x^b u_2(t)\,v(t)\,dt,
$$

and a second differentiation yields

$$
u''(x) = -\frac{u_2''(x)}{W(a)} \int_a^x u_1(t)\,v(t)\,dt - \frac{u_1''(x)}{W(a)} \int_x^b u_2(t)\,v(t)\,dt + v(x).
$$

The last equation follows from the fact that the Wronskian $u_1' u_2 - u_2' u_1$ is
constant for a DE of the form $u'' - qu = 0$. By substituting $u_1'' = qu_1$ and
$u_2'' = qu_2$ in the last equation, we verify that $u = K[v]$ is indeed a solution
of the Sturm-Liouville system $\mathbf{L}_x u = v$.

Next, we show that the eigensolutions of the S-L system are nondegenerate,
i.e., the eigenspaces are one-dimensional. Suppose $f_1$ and $f_2$ are any
two eigenfunctions corresponding to the same eigenvalue. Then both must
satisfy the same DE and the same boundary conditions; in particular, we
must have

$$
\begin{aligned}
\alpha_1 f_1(a) + \beta_1 f_1'(a) &= 0 \\
\alpha_1 f_2(a) + \beta_1 f_2'(a) &= 0
\end{aligned}
\quad\Longrightarrow\quad
\begin{pmatrix} f_1(a) & f_1'(a) \\ f_2(a) & f_2'(a) \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \beta_1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \tag{19.3}
$$

If $\alpha_1$ and $\beta_1$ are not both zero, the Wronskian (the determinant of the
matrix above) must vanish. Therefore, the two functions must be linearly
dependent. Finally, recall that a Hilbert space on which a compact operator
$\mathbf{K}$ is defined can be written as a direct sum of the latter's eigenspaces.
More specifically, $\mathcal{H} = \sum_{j=0}^{N} \oplus\,\mathcal{M}_j$, where each $\mathcal{M}_j$ is finite-dimensional
for $j = 1, 2, \ldots$, and $N$ can be finite or infinite. If $N$ is finite, then $\mathcal{M}_0$,
which can be considered as the eigenspace of zero eigenvalue,⁷ will be
infinite-dimensional. If $\mathcal{M}_0$ is finite-dimensional (or absent), then $N$ must
be infinite, and the eigenvectors of $\mathbf{K}$ will span the entire space, i.e., they
will form a complete orthogonal system. We now show that this holds for
the regular Sturm-Liouville operator.

⁶Although this will change $q$ (and the original operator), no information will be lost
because the eigenvectors will be the same and all eigenvalues will be changed by $\alpha$.


Historical Notes 

Jacques Charles François Sturm (1803-1855) made the first accurate determination of
the velocity of sound in water in 1826, working with the Swiss engineer Daniel Colladon.
He became a French citizen in 1833 and worked in Paris at the École Polytechnique where
he became a professor in 1838. In 1840 he succeeded Poisson in the chair of mechanics
in the Faculté des Sciences, Paris.

The problems of determining the eigenvalues and eigenfunctions of an ordinary differen- 
tial equation with boundary conditions and of expanding a given function in terms of an 
infinite series of the eigenfunctions, which date from about 1750, became more promi- 
nent as new coordinate systems were introduced and new classes of functions arose as the 
eigenfunctions of ordinary differential equations. Sturm and his friend Joseph Liouville 
decided to tackle the general problem for any second-order linear differential equation. 
Sturm had been working since 1833 on problems of partial differential equations, pri- 
marily on the flow of heat in a bar of variable density, and hence was fully aware of the 
eigenvalue and eigenfunction problem. The mathematical ideas he applied to this prob- 
lem are closely related to his investigations of the reality and distribution of the roots of 
algebraic equations. His ideas on differential equations, he says, came from the study of 
difference equations and a passage to the limit. Liouville, informed by Sturm of the problems
he was working on, took up the same subject. The results of their joint work were
published in several papers which are quite detailed.


Suppose that the above Hilbert-Schmidt operator $\mathbf{K}$ has a zero eigenvalue.
Then, there must exist a nonzero function $v$ such that $K[v](x) = 0$, i.e.,

$$
-\frac{u_2(x)}{W(a)} \int_a^x u_1(t)\,v(t)\,dt - \frac{u_1(x)}{W(a)} \int_x^b u_2(t)\,v(t)\,dt = 0 \tag{19.4}
$$

for all $x$. Differentiate this twice to get

$$
-\frac{u_2''(x)}{W(a)} \int_a^x u_1(t)\,v(t)\,dt - \frac{u_1''(x)}{W(a)} \int_x^b u_2(t)\,v(t)\,dt + v(x) = 0.
$$

Now substitute $u_1'' = qu_1$ and $u_2'' = qu_2$ in this equation and use Eq. (19.4) to
conclude that $v = 0$. This is impossible because no eigenvector can be zero.
Hence, zero is not an eigenvalue of $\mathbf{K}$, i.e., $\mathcal{M}_0 = \{0\}$. Since eigenvectors
of $\mathbf{K} = \mathbf{L}_x^{-1}$ coincide with eigenvectors of $\mathbf{L}_x$, and eigenvalues of $\mathbf{L}_x$ are the
reciprocals of the eigenvalues of $\mathbf{K}$, we have the following result.


Theorem 19.1.9 A regular Sturm-Liouville system has a countable
number of eigenvalues that can be arranged in an increasing sequence
that has infinity as its limit. The eigenvectors of the Sturm-Liouville
operator are nondegenerate and constitute a complete orthogonal
set. Furthermore, the eigenfunction $u_n(x)$ corresponding to
the eigenvalue $\lambda_n$ has exactly $n$ zeros in its interval of definition.


⁷The reader recalls that when $\mathbf{K}$ acts on $\mathcal{M}_0$, it yields zero.


The last statement is not a result of operator theory, but can be derived 
using the theory of differential equations. We shall not present the details 
of its derivation. We need to emphasize that the boundary conditions are an 
integral part of S-L systems. Changing the boundary conditions so that, for 
example, they are no longer separated may destroy the regularity of the S-L 
system. 


19.2 Sturm-Liouville Systems and SOLDEs 


We are now ready to combine our discussion of the preceding section with 
the knowledge gained from our study of differential equations. We saw in 
Chap. 13 that the separation of PDEs normally results in expressions of the
form

$$
\mathbf{L}[u] + \lambda u = 0, \quad\text{or}\quad p_2(x)\frac{d^2u}{dx^2} + p_1(x)\frac{du}{dx} + p_0(x)\,u + \lambda u = 0, \tag{19.5}
$$

where $u$ is a function of a single variable and $\lambda$ is, a priori, an arbitrary
constant. This is an eigenvalue equation for the operator $\mathbf{L}$, which is not, in
general, self-adjoint. If we use Theorem 14.5.4 and multiply (19.5) by

$$
w(x) = \frac{1}{p_2(x)} \exp\left[\int^x \frac{p_1(t)}{p_2(t)}\,dt\right],
$$

it becomes self-adjoint for real $\lambda$, and can be written as

$$
\frac{d}{dx}\left[p(x)\frac{du}{dx}\right] + [\lambda w(x) - q(x)]\,u = 0 \tag{19.6}
$$

with $p(x) = w(x)\,p_2(x)$ and $q(x) = -p_0(x)\,w(x)$. Equation (19.6) is the
standard form of the S-L equation. However, it is not in the form studied in
the previous section. To turn it into that form one changes both the independent
and dependent variables via the so-called Liouville substitution:

$$
u(x) = v(t)\,[p(x)\,w(x)]^{-1/4}, \qquad t = \int_a^x \sqrt{\frac{w(s)}{p(s)}}\,ds. \tag{19.7}
$$

It is then a matter of chain-rule differentiation to show that Eq. (19.6) becomes

$$
\frac{d^2v}{dt^2} + [\lambda - Q(t)]\,v = 0, \tag{19.8}
$$

where

$$
Q(t) = \frac{q(x(t))}{w(x(t))} + \left[p(x(t))\,w(x(t))\right]^{-1/4} \frac{d^2}{dt^2}\left[(pw)^{1/4}\right].
$$

Therefore, Theorem 19.1.9 still holds.


Historical Notes 

Joseph Liouville (1809-1882) was a highly respected professor at the Collège de France,
in Paris, and the founder and editor of the Journal des Mathématiques Pures et Ap- 
pliquées, a famous periodical that played an important role in French mathematical life 
through the latter part of the nineteenth century. His own remarkable achievements as a 
creative mathematician have only recently received the appreciation they deserve. 

He was the first to solve a boundary value problem by solving an equivalent integral 
equation. His ingenious theory of fractional differentiation answered the long-standing 
question of what reasonable meaning can be assigned to the symbol $d^n y/dx^n$ when $n$
is not a positive integer. He discovered the fundamental result in complex analysis that 
a bounded entire function is necessarily a constant and used it as the basis for his own 
theory of elliptic functions. There is also a well-known Liouville theorem in Hamilto- 
nian mechanics, which states that volume integrals are time-invariant in phase space. In 
collaboration with Sturm, he also investigated the eigenvalue problem of second-order 
differential equations. 

The theory of transcendental numbers is another branch of mathematics that originated 
in Liouville’s work. The irrationality of $\pi$ and $e$ (the fact that they are not solutions of
any linear equations) had been proved in the eighteenth century by Lambert and Euler. 
In 1844 Liouville showed that e is not a root of any quadratic equation with integral 
coefficients as well. This led him to conjecture that e is transcendental, which means that 
it does not satisfy any polynomial equation with integral coefficients. 


Example 19.2.1 The Liouville substitution [Eq. (19.7)] transforms the
Bessel DE $(xu')' + (k^2 x - \nu^2/x)\,u = 0$ into

$$
\frac{d^2v}{dt^2} + \left(k^2 - \frac{\nu^2 - 1/4}{t^2}\right) v = 0,
$$

from which we can obtain an interesting result when $\nu = \frac{1}{2}$: In that case we
have $\ddot{v} + k^2 v = 0$, whose solutions are of the form $\cos kt$ and $\sin kt$. Noting
that $u(x) = J_{1/2}(x)$, Eq. (19.7) gives

$$
J_{1/2}(kt) = A\,\frac{\sin kt}{\sqrt{t}} \quad\text{or}\quad J_{1/2}(kt) = B\,\frac{\cos kt}{\sqrt{t}},
$$

and since $J_{1/2}(x)$ is bounded at $x = 0$, we must have $J_{1/2}(kt) = A\sin kt/\sqrt{t}$,
which is the result obtained in Chap. 15.
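The chain-rule step behind Example 19.2.1 can be spelled out as follows. This is a sketch that assumes the lower limit of the integral in Eq. (19.7) is taken to be zero, so that $t$ coincides with $x$; with $p = w = x$ and $q = \nu^2/x$ one finds

$$
\begin{aligned}
t &= \int_0^x \sqrt{\frac{w(s)}{p(s)}}\,ds = \int_0^x ds = x, \qquad (pw)^{1/4} = (x^2)^{1/4} = t^{1/2}, \\
\frac{d^2}{dt^2}\,t^{1/2} &= -\frac{1}{4}\,t^{-3/2}
\quad\Longrightarrow\quad (pw)^{-1/4}\,\frac{d^2}{dt^2}\left[(pw)^{1/4}\right] = -\frac{1}{4t^2}, \\
Q(t) &= \frac{q}{w} - \frac{1}{4t^2} = \frac{\nu^2}{t^2} - \frac{1}{4t^2} = \frac{\nu^2 - 1/4}{t^2}.
\end{aligned}
$$

With $\lambda = k^2$, Eq. (19.8) then reads $\ddot{v} + \left[k^2 - (\nu^2 - 1/4)/t^2\right] v = 0$, and $u = v\,(pw)^{-1/4} = v/\sqrt{t}$, which is the origin of the factor $1/\sqrt{t}$ in the expressions for $J_{1/2}$.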


The appearance of $w$ is the result of our desire to render the differential
operator self-adjoint. It also appears in another context. Recall the Lagrange
identity for a self-adjoint differential operator $\mathbf{L}$:

$$
u\mathbf{L}[v] - v\mathbf{L}[u] = \frac{d}{dx}\left\{p(x)\left[u(x)\,v'(x) - v(x)\,u'(x)\right]\right\}. \tag{19.9}
$$

If we specialize this identity to the S-L equation of (19.6) with $u = u_1$
corresponding to the eigenvalue $\lambda_1$ and $v = u_2$ corresponding to the eigenvalue
$\lambda_2$, we obtain for the LHS

$$
u_1\mathbf{L}[u_2] - u_2\mathbf{L}[u_1] = u_1(-\lambda_2 w u_2) + u_2(\lambda_1 w u_1) = (\lambda_1 - \lambda_2)\,w\,u_1 u_2.
$$

Integrating both sides of (19.9) then yields

$$
(\lambda_1 - \lambda_2) \int_a^b w\,u_1 u_2\,dx = \left\{p(x)\left[u_1(x)\,u_2'(x) - u_2(x)\,u_1'(x)\right]\right\}_a^b. \tag{19.10}
$$


A desired property of the solutions of a self-adjoint DE is their orthogonal- 
ity when they belong to different eigenvalues. This property will be satisfied 
if we assume an inner product integral with weight function w(x), and if 
the RHS of Eq. (19.10) vanishes. There are various boundary conditions 
that fulfill the latter requirement. For example, $u_1$ and $u_2$ could satisfy the 
boundary conditions of Eq. (19.2). Another set of appropriate boundary con- 
ditions (BC) is the periodic BC given by 


u(a)=u(b) and u'(a)=u'(b). (19.11) 


However, as the following example shows, the latter BCs do not lead to a 
regular S-L system. 


Example 19.2.2 The following examples show how BC can change the S-L 
systems. 

(a) The S-L system consisting of the S-L equation $d^2u/dt^2 + \omega^2 u = 0$ 
in the interval $[0, T]$ with the separated BCs $u(0) = 0$ and $u(T) = 0$ 
has the eigenfunctions $u_n(t) = \sin(n\pi t/T)$ with $n = 1, 2, \ldots$ and the 
eigenvalues $\lambda_n = \omega_n^2 = (n\pi/T)^2$ with $n = 1, 2, \ldots$. 

(b) Let the S-L equation be the same as in part (a) but change the interval 
to $[-T, +T]$ and the BCs to a periodic one such as $u(-T) = u(T)$ 
and $u'(-T) = u'(T)$. The eigenvalues are the same as before, but the 
eigenfunctions are 1, $\sin(n\pi t/T)$, and $\cos(n\pi t/T)$, where n is a positive 
integer. Note that there is a degeneracy here in the sense that there 
are two linearly independent eigenfunctions having the same eigenvalue 
$(n\pi/T)^2$. By Theorem 19.1.9, the S-L system is not regular. 

(c) The Bessel equation for a given fixed $\nu^2$ is 

\[
u'' + \frac{1}{x}u' + \left(k^2 - \frac{\nu^2}{x^2}\right)u = 0, \quad\text{where } a < x < b,
\]

and it can be turned into an S-L system if we multiply it by 

\[
w(x) = \frac{1}{p_2(x)}\exp\left[\int^x \frac{p_1(t)}{p_2(t)}\,dt\right] = \exp\left[\int^x \frac{dt}{t}\right] = x .
\]

Then we can write 

\[
\frac{d}{dx}\left(x\frac{du}{dx}\right) + \left(k^2 x - \frac{\nu^2}{x}\right)u = 0,
\]

which is in the form of Eq. (19.6) with $p = w = x$, $\lambda = k^2$, and 
$q(x) = \nu^2/x$. If $a > 0$, we can obtain a regular S-L system by applying 
appropriate separated BCs. 

A regular S-L system is too restrictive for applications where either a or 
b or both may be infinite or where either a or b may be a singular point of 
the S-L equation. A singular S-L system is one for which one or more of 
the following conditions hold: 

1. The interval [a, b] stretches to infinity in either or both directions. 
2. Either p or w vanishes at one or both end points a and b. 
3. The function q(x) is not continuous in [a, b]. 
4. Any one of the functions p(x), q(x), and w(x) is singular at a or b. 


Even though the conclusions concerning eigenvalues of a regular S-L 
system cannot be generalized to the singular S-L system, the orthogonality 
of eigenfunctions corresponding to different eigenvalues can, as long as the 
eigenfunctions are square-integrable with weight function w(x): 


Box 19.2.3 The eigenfunctions of a singular S-L system are orthogo- 
nal if the RHS of (19.10) vanishes. 


Example 19.2.4 Bessel functions $J_\nu(x)$ are entire functions. Thus, they are 
square-integrable in the interval [0, b] for any finite positive b. For fixed $\nu$ 
the DE 

\[
r^2\frac{d^2u}{dr^2} + r\frac{du}{dr} + (k^2 r^2 - \nu^2)u = 0 \tag{19.12}
\]

transforms into the Bessel equation $x^2 u'' + xu' + (x^2 - \nu^2)u = 0$ if we 
make the substitution $kr = x$. Thus, the solution of the singular S-L equation 
(19.12) that is analytic at $r = 0$ and corresponds to the eigenvalue $k^2$ is 
$u_k(r) = J_\nu(kr)$. For two different eigenvalues, $k_1^2$ and $k_2^2$, the eigenfunctions 
are orthogonal if the boundary term of (19.10) corresponding to Eq. (19.12) 
vanishes, that is, if 

\[
\bigl\{r\bigl[J_\nu(k_1 r)J_\nu'(k_2 r) - J_\nu(k_2 r)J_\nu'(k_1 r)\bigr]\bigr\}_0^b
\]

vanishes, which will occur if and only if $J_\nu(k_1 b)J_\nu'(k_2 b) - J_\nu(k_2 b)J_\nu'(k_1 b) = 0$. 
A common choice is to take $J_\nu(k_1 b) = 0 = J_\nu(k_2 b)$, that is, to take 
both $k_1 b$ and $k_2 b$ as (different) roots of the Bessel function of order $\nu$. 
We thus have $\int_0^b r J_\nu(k_i r)J_\nu(k_j r)\,dr = 0$ if $k_i$ and $k_j$ are different roots of 
$J_\nu(kb) = 0$. 
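As a rough numerical illustration of this orthogonality, the sketch below takes $\nu = 0$ and $b = 1$ and evaluates the weighted integral with Simpson's rule. The two constants are the standard tabulated first two zeros of $J_0$; the helper names and the choice of quadrature are assumptions made for the example.

```python
import math

def j0(x, terms=60):
    """Power series of J_0(x); adequate for the moderate arguments used here."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= -((x / 2) ** 2) / ((k + 1) ** 2)
    return total

def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# First two positive zeros of J_0 (tabulated values), so that J_0(k b) = 0 with b = 1.
K1, K2 = 2.404825557695773, 5.520078110286311
B = 1.0

def weighted_inner(ka, kb):
    """Integral of r J_0(ka r) J_0(kb r) over [0, B], i.e. weight w(r) = r."""
    return simpson(lambda r: r * j0(ka * r) * j0(kb * r), 0.0, B)

print("different roots:", weighted_inner(K1, K2))
print("same root      :", weighted_inner(K1, K1))
```

The first number should be zero up to quadrature error, while the second is a positive normalization integral.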

The Legendre equation 

\[
\frac{d}{dx}\left[(1 - x^2)\frac{du}{dx}\right] + \lambda u = 0, \quad\text{where } -1 < x < 1,
\]

is already self-adjoint. Thus, $w(x) = 1$, and $p(x) = 1 - x^2$. The eigenfunctions 
of this singular S-L system [singular because $p(1) = p(-1) = 0$] are 
regular at the end points $x = \pm 1$ and are the Legendre polynomials $P_n(x)$ 
corresponding to $\lambda = n(n + 1)$. The boundary term of (19.10) clearly vanishes 
at $a = -1$ and $b = +1$. Since $P_n(x)$ are square-integrable on $[-1, +1]$, 
we obtain the familiar orthogonality relation: $\int_{-1}^{1} P_n(x)P_m(x)\,dx = 0$ if 
$m \neq n$. 
The Hermite DE is 

\[
u'' - 2xu' + \lambda u = 0. \tag{19.13}
\]

It is transformed into an S-L system if we multiply it by $w(x) = e^{-x^2}$. The 
resulting S-L equation is 

\[
\frac{d}{dx}\left[e^{-x^2}\frac{du}{dx}\right] + \lambda e^{-x^2}u = 0. \tag{19.14}
\]

The boundary term corresponding to the two eigenfunctions $u_1(x)$ and 
$u_2(x)$ having the respective eigenvalues $\lambda_1$ and $\lambda_2 \neq \lambda_1$ is 

\[
\bigl\{e^{-x^2}\bigl[u_1(x)u_2'(x) - u_2(x)u_1'(x)\bigr]\bigr\}_a^b .
\]

This vanishes for arbitrary $u_1$ and $u_2$ (because they are Hermite polynomials) 
if $a = -\infty$ and $b = +\infty$. 

The function u is an eigenfunction of (19.14) corresponding to the eigenvalue 
$\lambda$ if and only if it is a solution of (19.13). Solutions of this DE 
corresponding to $\lambda = 2n$ are the Hermite polynomials $H_n(x)$ discussed in 
Chap. 8. We can therefore write $\int_{-\infty}^{\infty} e^{-x^2}H_m(x)H_n(x)\,dx = 0$ if $m \neq n$. 
This orthogonality relation was also derived in Chap. 8. 
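The Hermite orthogonality relation lends itself to a quick numerical check. The following sketch builds $H_n$ from the standard three-term recurrence $H_{k+1} = 2xH_k - 2kH_{k-1}$ and approximates the weighted integral by the trapezoidal rule on a truncated interval; the truncation half-width and step count are arbitrary choices for illustration.

```python
import math

def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via H_{k+1} = 2x H_k - 2k H_{k-1}."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h

def weighted_inner(m, n, half_width=10.0, steps=4000):
    """Trapezoidal estimate of the integral of exp(-x^2) H_m H_n over the real line."""
    dx = 2 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * dx
        end_weight = 0.5 if i in (0, steps) else 1.0
        total += end_weight * math.exp(-x * x) * hermite(m, x) * hermite(n, x)
    return total * dx

print("m=2, n=4:", weighted_inner(2, 4))   # different eigenvalues
print("m=3, n=3:", weighted_inner(3, 3))   # normalization integral
```

For $m = n$ the result can be compared with the known normalization $2^n n!\sqrt{\pi}$.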


19.3. Asymptotic Behavior 


The S-L problem is central to the solution of many DEs in mathematical 
physics. In some cases the S-L equation has a direct bearing on the physics. 
For example, the eigenvalue $\lambda$ may correspond to the orbital angular 
momentum of an electron in an atom (see the treatment of spherical harmonics 
in Chap. 13) or to the energy levels of a particle in a potential (see 
Example 15.5.1). In many cases, then, it is worthwhile to gain some knowledge 
of the behavior of an S-L system in the limit of large $\lambda$, that is, high angular 
momentum or high energy. Similarly, it is useful to understand the behavior of 
the solutions for large values of their arguments. We therefore devote this 
section to a discussion of the behavior of solutions of an S-L system in the 
limit of large eigenvalues and large independent variable. 


19.3.1 Large Eigenvalues 


We assume that the S-L operator has the form given in Eq. (19.1). This can 
always be done for an arbitrary second-order linear DE by multiplying it by 
a proper function (to make it self-adjoint) followed by a Liouville substitution. 
So, consider an S-L system of the following form: 

\[
u'' + [\lambda - q(x)]u \equiv u'' + Q(x)u = 0, \quad\text{where } Q \equiv \lambda - q, \tag{19.15}
\]
with separated BCs of (19.2). Let us assume that $Q(x) > 0$ for all $x \in [a, b]$, 
that is, $\lambda > q(x)$. This is reasonable, since we are interested in very large $\lambda$. 

The study of the system of (19.15) and (19.2) is simplified if we make 
the Prüfer substitution: 

\[
u = RQ^{-1/4}\sin\phi, \qquad u' = RQ^{1/4}\cos\phi, \tag{19.16}
\]

where $R(x, \lambda)$ and $\phi(x, \lambda)$ are $\lambda$-dependent functions of x. This substitution 
transforms the S-L equation of (19.15) into a pair of equations (see 
Problem 19.3): 

\[
\begin{aligned}
\frac{d\phi}{dx} &= \sqrt{\lambda - q(x)} - \frac{q'}{4[\lambda - q(x)]}\sin 2\phi, \\
\frac{dR}{dx} &= \frac{Rq'}{4[\lambda - q(x)]}\cos 2\phi .
\end{aligned} \tag{19.17}
\]


The function $R(x, \lambda)$ is assumed to be positive because any negativity 
of u can be transferred to the phase $\phi(x, \lambda)$. Also, R cannot be zero at any 
point of [a, b], because both u and u' would vanish at that point, and, by 
Lemma 14.3.3, $u(x) = 0$. Equation (19.17) is very useful in discussing the 
asymptotic behavior of solutions of S-L systems both when $\lambda \to \infty$ and 
when $x \to \infty$. Before we discuss such asymptotics, we need to make a 
digression. 

It is often useful to have a notation for the behavior of a function $f(x, \lambda)$ 
for large $\lambda$ and all values of x. If the function remains bounded for all values 
of x as $\lambda \to \infty$, we write $f(x, \lambda) = O(1)$. Intuitively, this means that 
as $\lambda$ gets larger and larger, the magnitude of the function $f(x, \lambda)$ remains 
of order 1. In other words, for no value of x is $\lim_{\lambda\to\infty} f(x, \lambda)$ infinite. 
If $\lambda^n f(x, \lambda) = O(1)$, then we can write $f(x, \lambda) = O(1)/\lambda^n$. This means 
that as $\lambda$ tends to infinity, $f(x, \lambda)$ goes to zero as fast as $1/\lambda^n$ does. Sometimes 
this is written as $f(x, \lambda) = O(\lambda^{-n})$. Some properties of O(1) are as 
follows: 

1. If a is a finite real number, then $O(1) + a = O(1)$. 
2. $O(1) + O(1) = O(1)$, and $O(1)O(1) = O(1)$. 
3. For finite a and b, $\int_a^b O(1)\,dx = O(1)$. 
4. If r and s are real numbers with $r < s$, then 

\[
O(1)\lambda^r + O(1)\lambda^s = O(1)\lambda^s .
\]

5. If g(x) is any bounded function of x, then a Taylor series expansion 
yields 

\[
\begin{aligned}
[\lambda + g(x)]^r &= \lambda^r\left[1 + \frac{g(x)}{\lambda}\right]^r \\
&= \lambda^r\left\{1 + r\frac{g(x)}{\lambda} + \frac{r(r-1)}{2}\left[\frac{g(x)}{\lambda}\right]^2 + \frac{O(1)}{\lambda^3}\right\} \\
&= \lambda^r + r g(x)\lambda^{r-1} + O(1)\lambda^{r-2} = \lambda^r + O(1)\lambda^{r-1} \\
&= O(1)\lambda^r .
\end{aligned}
\]


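Returning to Eq. (19.17) and expanding its RHSs using property 5, we 
obtain 

\[
\frac{d\phi}{dx} = \sqrt{\lambda} + \frac{O(1)}{\sqrt{\lambda}} + \frac{O(1)}{\lambda} = \sqrt{\lambda} + \frac{O(1)}{\sqrt{\lambda}}, \qquad \frac{dR}{dx} = \frac{O(1)}{\lambda} .
\]

Taylor series expansion of $\phi(x, \lambda)$ and $R(x, \lambda)$ about $x = a$ then yields 

\[
\begin{aligned}
\phi(x, \lambda) &= \phi(a, \lambda) + (x - a)\sqrt{\lambda} + \frac{O(1)}{\sqrt{\lambda}}, \\
R(x, \lambda) &= R(a, \lambda) + \frac{O(1)}{\lambda}
\end{aligned} \tag{19.18}
\]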

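for $\lambda \to \infty$. These results are useful in determining the behavior of $\lambda_n$ for 
large n. As an example, we use (19.2) and (19.16) to write 

\[
-\frac{\alpha_1}{\beta_1} = \frac{u'(a)}{u(a)} = \frac{R(a, \lambda)Q^{1/4}(a, \lambda)\cos[\phi(a, \lambda)]}{R(a, \lambda)Q^{-1/4}(a, \lambda)\sin[\phi(a, \lambda)]} = Q^{1/2}(a, \lambda)\cot[\phi(a, \lambda)],
\]

where we have assumed that $\beta_1 \neq 0$. If $\beta_1 = 0$, we can take the ratio $\beta_1/\alpha_1$, 
which is finite because at least one of the two constants must be different 
from zero. Let $A \equiv -\alpha_1/\beta_1$ and write $\cot[\phi(a, \lambda)] = A/\sqrt{\lambda - q(a)}$. Similarly, 
$\cot[\phi(b, \lambda)] = B/\sqrt{\lambda - q(b)}$, where $B \equiv -\alpha_2/\beta_2$. Let us concentrate 
on the nth eigenvalue and write 

\[
\phi(a, \lambda_n) = \cot^{-1}\left(\frac{A}{\sqrt{\lambda_n - q(a)}}\right), \qquad \phi(b, \lambda_n) = \cot^{-1}\left(\frac{B}{\sqrt{\lambda_n - q(b)}}\right).
\]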


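For large $\lambda_n$, the argument of $\cot^{-1}$ is small. Therefore, we can expand the 
RHS in a Taylor series about zero: 

\[
\cot^{-1}\epsilon = \cot^{-1}(0) - \epsilon + \cdots = \frac{\pi}{2} - \epsilon + \cdots = \frac{\pi}{2} + \frac{O(1)}{\sqrt{\lambda_n}}
\]

for $\epsilon = O(1)/\sqrt{\lambda_n}$. It follows that 

\[
\phi(a, \lambda_n) = \frac{\pi}{2} + \frac{O(1)}{\sqrt{\lambda_n}}, \qquad \phi(b, \lambda_n) = \frac{\pi}{2} + n\pi + \frac{O(1)}{\sqrt{\lambda_n}}. \tag{19.19}
\]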

The term $n\pi$ appears in (19.19) because, by Theorem 19.1.9, the nth eigenfunction 
has n zeros between a and b. Since $u = RQ^{-1/4}\sin\phi$, this means 
that $\sin\phi$ must go through n zeros as x goes from a to b. Thus, at $x = b$ the 
phase $\phi$ must be $n\pi$ larger than at $x = a$. 

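Substituting $x = b$ in the first equation of (19.18), with $\lambda \to \lambda_n$, and 
using (19.19), we obtain 

\[
\frac{\pi}{2} + n\pi + \frac{O(1)}{\sqrt{\lambda_n}} = \frac{\pi}{2} + \frac{O(1)}{\sqrt{\lambda_n}} + (b - a)\sqrt{\lambda_n} + \frac{O(1)}{\sqrt{\lambda_n}}
\]

or 

\[
(b - a)\sqrt{\lambda_n} = n\pi + \frac{O(1)}{\sqrt{\lambda_n}}. \tag{19.20}
\]

One consequence of this result is that $\lim_{n\to\infty} n\lambda_n^{-1/2} = (b - a)/\pi$. Thus, 
$\sqrt{\lambda_n} = C_n n$, where $\lim_{n\to\infty} C_n = \pi/(b - a)$, and Eq. (19.20) can be rewritten 
as 

\[
\sqrt{\lambda_n} = \frac{n\pi}{b - a} + \frac{O(1)}{C_n n} = \frac{n\pi}{b - a} + \frac{O(1)}{n}. \tag{19.21}
\]

This equation describes the asymptotic behavior of eigenvalues. The following 
theorem, stated without proof, describes the asymptotic behavior of 
eigenfunctions. 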


Theorem 19.3.1 Let $\{u_n(x)\}_{n=0}^{\infty}$ be the normalized eigenfunctions of the 
regular S-L system given by Eqs. (19.15) and (19.2) with $\beta_1\beta_2 \neq 0$. Then, 
for $n \to \infty$, 

\[
u_n(x) = \sqrt{\frac{2}{b - a}}\cos\frac{n\pi(x - a)}{b - a} + \frac{O(1)}{n}.
\]
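The eigenvalue estimate (19.21) can be explored numerically with a simple shooting method. The sketch below is only an illustration under stated assumptions: the interval is taken as [0, 1], the potential $q(x) = x$ is a hypothetical choice, and the BCs are $u'(0) = 0 = u'(1)$ (so that $\beta_1\beta_2 \neq 0$). It integrates $u'' = (q - \lambda)u$ by fourth-order Runge-Kutta and bisects on $\lambda$ near $(n\pi)^2$ until $u'(1) = 0$.

```python
import math

def q(x):
    """Hypothetical bounded potential, chosen only for illustration."""
    return x

def end_slope(lam, steps=1000):
    """RK4 integration of u'' = (q - lam) u on [0, 1] with u(0)=1, u'(0)=0; returns u'(1)."""
    h = 1.0 / steps
    x, u, v = 0.0, 1.0, 0.0
    for _ in range(steps):
        k1u, k1v = v, (q(x) - lam) * u
        k2u = v + 0.5 * h * k1v
        k2v = (q(x + 0.5 * h) - lam) * (u + 0.5 * h * k1u)
        k3u = v + 0.5 * h * k2v
        k3v = (q(x + 0.5 * h) - lam) * (u + 0.5 * h * k2u)
        k4u = v + h * k3v
        k4v = (q(x + h) - lam) * (u + h * k3u)
        u += h * (k1u + 2 * k2u + 2 * k3u + k4u) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        x += h
    return v

def eigenvalue(n, half_window=3.0, iterations=40):
    """Bisection for the eigenvalue near (n pi)^2; assumes one sign change in the window."""
    lo = (n * math.pi) ** 2 - half_window
    hi = (n * math.pi) ** 2 + half_window
    f_lo = end_slope(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = end_slope(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# n * (sqrt(lambda_n) - n pi) should stay bounded as n grows, as (19.21) predicts.
for n in range(1, 5):
    lam = eigenvalue(n)
    print(n, lam, n * (math.sqrt(lam) - n * math.pi))
```

With $b - a = 1$, the last column plays the role of the O(1) factor in (19.21); first-order perturbation theory suggests it should hover near $1/(4\pi)$ for this particular q.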


Example 19.3.2 Let us derive an asymptotic formula for the Legendre 
polynomials $P_n(x)$. We first make the Liouville substitution to transform 
the Legendre DE $[(1 - x^2)P_n']' + n(n + 1)P_n = 0$ into 

\[
\frac{d^2 v}{dt^2} + [\lambda_n - Q(t)]v = 0, \quad\text{where } \lambda_n = n(n + 1). \tag{19.22}
\]

Here $p(x) = 1 - x^2$ and $w(x) = 1$, so $t = \int^x ds/\sqrt{1 - s^2} = \cos^{-1}x$, or 
$x(t) = \cos t$, and 

\[
P_n(x(t)) = \frac{v(t)}{[1 - x^2(t)]^{1/4}} = \frac{v(t)}{\sqrt{\sin t}}. \tag{19.23}
\]

In Eq. (19.22) 

\[
Q(t) = (1 - x^2)^{-1/4}\frac{d^2}{dt^2}\left[(1 - x^2)^{1/4}\right] = \frac{1}{\sqrt{\sin t}}\frac{d^2}{dt^2}\left[\sqrt{\sin t}\right] = -\frac{1}{4}\left(1 + \frac{1}{\sin^2 t}\right).
\]

For large n we can neglect Q(t), make the approximation $\lambda_n \approx (n + \tfrac12)^2$, 
and write $\ddot{v} + (n + \tfrac12)^2 v = 0$, whose general solution is 

\[
v(t) = A\cos\left[\left(n + \tfrac12\right)t + \alpha\right],
\]

where A and $\alpha$ are arbitrary constants. Substituting this solution in (19.23) 
yields $P_n(\cos t) = A\cos[(n + \tfrac12)t + \alpha]/\sqrt{\sin t}$. To determine $\alpha$ we note that 
$P_n(0) = 0$ if n is odd. Thus, if we let $t = \pi/2$, the cosine term vanishes 
for odd n if $\alpha = -\pi/4$. Thus, the general asymptotic formula for Legendre 
polynomials is 

\[
P_n(\cos t) = \frac{A}{\sqrt{\sin t}}\cos\left[\left(n + \tfrac12\right)t - \frac{\pi}{4}\right] \quad\text{for } n \to \infty .
\]
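A short numerical comparison shows how well this asymptotic form tracks $P_n$ away from the end points. The constant A is left undetermined by the argument above; the sketch assumes the classical value $A = \sqrt{2/(\pi n)}$ (Laplace's formula), which is an input from outside this example. The exact $P_n$ is generated with Bonnet's recurrence.

```python
import math

def legendre(n, x):
    """P_n(x) from the recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def legendre_asymptotic(n, t):
    """A cos[(n + 1/2) t - pi/4] / sqrt(sin t) with the assumed A = sqrt(2/(pi n))."""
    amplitude = math.sqrt(2.0 / (math.pi * n * math.sin(t)))
    return amplitude * math.cos((n + 0.5) * t - math.pi / 4)

# Compare exact and asymptotic values at an interior angle for growing n.
t = 1.0
for n in (5, 20, 50, 200):
    exact = legendre(n, math.cos(t))
    approx = legendre_asymptotic(n, t)
    print(n, exact, approx, abs(exact - approx))
```

The discrepancy should shrink as n grows, and it is expected to worsen as t approaches 0 or $\pi$, where $\sin t$ vanishes and Q(t) can no longer be neglected.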




19.3.2 Large Argument 


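Liouville and Prüfer substitutions are useful in investigating the behavior of 
the solutions of S-L systems for large x as well. The general procedure is to 
transform the DE into the form of Eq. (19.8) by the Liouville substitution; 
then make the Prüfer substitution of (19.16) to obtain two DEs in the form 
of (19.17). Solving Eq. (19.17) when $x \to \infty$ determines the behavior of 
$\phi$ and R and, subsequently, of u, the solution. Problem 19.4 illustrates this 
procedure for the Bessel functions. We simply quote the results: 

\[
\begin{aligned}
J_\nu(x) &= \sqrt{\frac{2}{\pi x}}\cos\left[x - \left(\nu + \frac12\right)\frac{\pi}{2} + \frac{\nu^2 - 1/4}{2x}\right] + \frac{O(1)}{x^{5/2}}, \\
Y_\nu(x) &= \sqrt{\frac{2}{\pi x}}\sin\left[x - \left(\nu + \frac12\right)\frac{\pi}{2} + \frac{\nu^2 - 1/4}{2x}\right] + \frac{O(1)}{x^{5/2}} .
\end{aligned}
\]

These two relations easily yield the asymptotic expressions for the Hankel 
functions: 

\[
\begin{aligned}
H_\nu^{(1)}(x) &= J_\nu(x) + iY_\nu(x) \\
&= \sqrt{\frac{2}{\pi x}}\exp\left\{i\left[x - \left(\nu + \frac12\right)\frac{\pi}{2} + \frac{\nu^2 - 1/4}{2x}\right]\right\} + \frac{O(1)}{x^{5/2}}, \\
H_\nu^{(2)}(x) &= J_\nu(x) - iY_\nu(x) \\
&= \sqrt{\frac{2}{\pi x}}\exp\left\{-i\left[x - \left(\nu + \frac12\right)\frac{\pi}{2} + \frac{\nu^2 - 1/4}{2x}\right]\right\} + \frac{O(1)}{x^{5/2}} .
\end{aligned}
\]

If the last term in the exponent (which vanishes as $x \to \infty$) is ignored, the 
asymptotic expression for $H_\nu^{(1)}(x)$ matches what was obtained in Chap. 16 
using the method of steepest descent. 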
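The quoted large-argument formula for $J_\nu$ can be compared with a direct evaluation. The sketch below uses the power series for integer order (reliable only for moderate x because of cancellation) and the leading asymptotic term including the $1/x$ phase correction; function names are illustrative.

```python
import math

def bessel_j_int(m, x, terms=80):
    """Power series of J_m(x) for non-negative integer m."""
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * (x / 2) ** (2 * k + m) / (
            math.factorial(k) * math.factorial(k + m)
        )
    return total

def bessel_j_large_x(nu, x):
    """sqrt(2/(pi x)) cos[x - (nu + 1/2) pi/2 + (nu^2 - 1/4)/(2x)]."""
    phase = x - (nu + 0.5) * math.pi / 2 + (nu * nu - 0.25) / (2 * x)
    return math.sqrt(2.0 / (math.pi * x)) * math.cos(phase)

# The difference should fall off roughly like x^(-5/2).
for x in (5.0, 10.0, 20.0):
    for m in (0, 1):
        exact = bessel_j_int(m, x)
        approx = bessel_j_large_x(m, x)
        print(m, x, exact, approx, abs(exact - approx))
```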


19.4 Expansions in Terms of Eigenfunctions 


Chapter 13 showed how the solution of many PDEs can be written as the 
product of the solutions of the separated ODEs. These DEs are usually of 
Sturm-Liouville type. We saw this in the construction of spherical harmon- 
ics. In the rest of this chapter, consisting mainly of illustrative examples, we 
shall consider the use of other coordinate systems and construct solutions to 
DEs as infinite series expansions in terms of S-L eigenfunctions. 

Central to the expansion of solutions in terms of S-L eigenfunctions is 
the question of their completeness. This completeness was established for a 
regular S-L system in Theorem 19.1.9. 

We shall shortly state an analogous theorem (without proof) that estab- 
lishes the completeness of the eigenfunctions of more general S-L systems. 
This theorem requires the following generalization of the separated and the 
periodic BCs: 

\[
\begin{aligned}
R_1 u &\equiv a_{11}u(a) + a_{12}u'(a) + a_{13}u(b) + a_{14}u'(b) = 0, \\
R_2 u &\equiv a_{21}u(a) + a_{22}u'(a) + a_{23}u(b) + a_{24}u'(b) = 0,
\end{aligned} \tag{19.24}
\]

where $a_{ij}$ are numbers such that the rank of the following matrix is 2: 

\[
\mathsf{a} = \begin{pmatrix} a_{11} & a_{12} & a_{13} & a_{14} \\ a_{21} & a_{22} & a_{23} & a_{24} \end{pmatrix}.
\]

The separated BCs correspond to the case for which $a_{11} = \alpha_1$, $a_{12} = \beta_1$, 
$a_{23} = \alpha_2$, and $a_{24} = \beta_2$, with all other $a_{ij}$ zero. Similarly, the periodic BC 
is a special case for which $a_{11} = -a_{13} = a_{22} = -a_{24} = 1$, with all other 
$a_{ij}$ zero. It is easy to verify that the rank of the matrix $\mathsf{a}$ is 2 for these two 
special cases. Let 

\[
\mathcal{U} = \{u \in C^2[a, b] \mid R_j u = 0, \text{ for } j = 1, 2\} \tag{19.25}
\]

be a subspace of $\mathcal{L}^2_w(a, b)$, and, to assure the vanishing of the RHS of the 
Lagrange identity, assume that the following equality holds: 

\[
p(b)\det\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = p(a)\det\begin{pmatrix} a_{13} & a_{14} \\ a_{23} & a_{24} \end{pmatrix}. \tag{19.26}
\]


We are now ready to consider the theorem (for a proof, see [Hell 67, 
Chap. 7]). 

Theorem 19.4.1 The eigenfunctions $\{u_n(x)\}_{n=1}^{\infty}$ of an S-L system 
consisting of the S-L equation $(pu')' + (\lambda w - q)u = 0$ and the BCs 
of (19.24) form a complete basis of the subspace $\mathcal{U}$ of $\mathcal{L}^2_w(a, b)$ described 
in (19.25). The eigenvalues are real and countably infinite and 
each one has a multiplicity of at most 2. They can be ordered according 
to size $\lambda_1 < \lambda_2 < \cdots$, and their only limit point is $+\infty$. 


First note that Eq. (19.26) contains both separated and periodic BCs as 
special cases (Problem 19.5). In the case of periodic BCs, we assume that 
p(a) = p(b). Thus, all the eigenfunctions discussed so far are covered by 
Theorem 19.4.1. Second, the orthogonality of eigenfunctions corresponding 
to different eigenvalues and the fact that there are infinitely many distinct 
eigenvalues assure the existence of infinitely many eigenfunctions. Third, 
the eigenfunctions form a basis of $\mathcal{U}$ and not the whole $\mathcal{L}^2_w(a, b)$. Only 
those functions $u \in \mathcal{L}^2_w(a, b)$ that satisfy the BC in (19.24) are expandable 
in terms of u,(x). Finally, the last statement of Theorem 19.4.1 is a repeti- 
tion of part of Theorem 19.1.9 but is included because the conditions under 
which Theorem 19.4.1 holds are more general than those applying to Theo- 
rem 19.1.9. 

Fig. 19.1 A rectangular conducting box of which one face is held at the potential f(x, y) 
and the other faces are grounded 

Part II discussed orthogonal functions in detail and showed how other 
functions can be expanded in terms of them. However, the procedure used 
in Part II was ad hoc from a logical standpoint. After all, the orthogonal 
polynomials were invented by nineteenth-century mathematical physicists 
who, in their struggle to solve the PDEs of physics using the separation of 
variables, came across various ODEs of the second order, all of which were 
recognized later as S-L systems. From a logical standpoint, therefore, this 
chapter should precede Part II. But the order of the chapters was based on 
clarity and ease of presentation and the fact that the machinery of differential 
equations was a prerequisite for such a discussion. 

Theorem 19.4.1 is the important link between the algebraic and the an- 
alytic machinery of differential equation theory. This theorem puts at our 
disposal concrete mathematical functions that are calculable to any desired 
accuracy (on a computer, say) and can serve as basis functions for all the 
expansions described in Part I. The remainder of this chapter is devoted to 
solving some PDEs of mathematical physics using the separation of vari- 
ables and Theorem 19.4.1. 


19.5 Separation in Cartesian Coordinates 


Problems most suitable for Cartesian coordinates have boundaries with rect- 
angular symmetry such as boxes or planes. 


19.5.1 Rectangular Conducting Box 


Consider a rectangular conducting box with sides a, b, and c (see Fig. 19.1). 
All faces are held at zero potential except the top face, whose potential is 
given by a function f(x, y). Let us find the potential at all points inside the 
box. 

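The relevant PDE for this situation is Laplace’s equation, $\nabla^2\Phi = 0$. Writing 
$\Phi(x, y, z)$ as a product of three functions, $\Phi(x, y, z) = X(x)Y(y)Z(z)$, 
yields three ODEs (see Problem 19.6): 

\[
\frac{d^2X}{dx^2} + \lambda X = 0, \qquad \frac{d^2Y}{dy^2} + \mu Y = 0, \qquad \frac{d^2Z}{dz^2} + \nu Z = 0, \tag{19.27}
\]

where $\lambda + \mu + \nu = 0$. The vanishing of $\Phi$ at $x = 0$ and $x = a$ means that 

\[
\begin{aligned}
\Phi(0, y, z) &= X(0)Y(y)Z(z) = 0 \quad \forall y, z \;\Rightarrow\; X(0) = 0, \\
\Phi(a, y, z) &= X(a)Y(y)Z(z) = 0 \quad \forall y, z \;\Rightarrow\; X(a) = 0 .
\end{aligned}
\]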
We thus obtain an S-L system, $X'' + \lambda X = 0$, $X(0) = 0 = X(a)$, whose BC 
is neither separated nor periodic, but satisfies (19.24) with $a_{11} = a_{23} = 1$ 
and all other $a_{ij}$ zero. This S-L system has the eigenvalues and eigenfunctions 

\[
\lambda_n = \left(\frac{n\pi}{a}\right)^2 \quad\text{and}\quad X_n(x) = \sin\left(\frac{n\pi}{a}x\right) \quad\text{for } n = 1, 2, \ldots .
\]

Similarly, the second equation in (19.27) leads to 

\[
\mu_m = \left(\frac{m\pi}{b}\right)^2 \quad\text{and}\quad Y_m(y) = \sin\left(\frac{m\pi}{b}y\right) \quad\text{for } m = 1, 2, \ldots .
\]

On the other hand, the third equation in (19.27) does not lead to an S-L 
system because the BC for the top of the box does not fit (19.24). This is 
as expected because the “eigenvalue” $\nu$ is already determined by $\lambda$ and $\mu$. 
Nevertheless, we can find a solution for that equation. The substitution 

\[
\gamma_{mn}^2 = \left(\frac{n\pi}{a}\right)^2 + \left(\frac{m\pi}{b}\right)^2
\]

changes the Z equation to $Z'' - \gamma_{mn}^2 Z = 0$, whose solution, consistent with 
$Z(0) = 0$, is $Z(z) = C_{mn}\sinh(\gamma_{mn}z)$. 
We note that X(x) and Y(y) are functions satisfying $R_1 X = 0 = R_2 X$. 
Thus, by Theorem 19.4.1, they can be written as a linear combination of 
$X_n(x)$ and $Y_m(y)$: 

\[
X(x) = \sum_{n=1}^{\infty} A_n\sin(n\pi x/a) \quad\text{and}\quad Y(y) = \sum_{m=1}^{\infty} B_m\sin(m\pi y/b).
\]

Consequently, the most general solution can be expressed as 

\[
\Phi(x, y, z) = X(x)Y(y)Z(z) = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} A_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)\sinh(\gamma_{mn}z),
\]

where $A_{mn} = A_n B_m C_{mn}$. 

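To specify $\Phi$ completely, we must determine the arbitrary constants $A_{mn}$. 
This is done by imposing the remaining BC, $\Phi(x, y, c) = f(x, y)$, yielding 
the identity 

\[
f(x, y) = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} A_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)\sinh(\gamma_{mn}c) = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} B_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right),
\]

where $B_{mn} \equiv A_{mn}\sinh(\gamma_{mn}c)$. This is a two-dimensional Fourier series (see 
Chap. 9) whose coefficients are given by 

\[
B_{mn} = \frac{4}{ab}\int_0^a dx\int_0^b dy\, f(x, y)\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right).
\]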
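To make the series concrete, the sketch below evaluates it for the special case of a constant top-face potential $f(x, y) = V_0$, for which the coefficient integral gives $B_{mn} = 16V_0/(\pi^2 nm)$ for odd n and m and zero otherwise. The constant potential, the unit cube, and the truncation order are assumptions made for illustration. The ratio $\sinh(\gamma z)/\sinh(\gamma c)$ is rewritten with decaying exponentials so that large $\gamma$ does not overflow.

```python
import math

def box_potential(x, y, z, a=1.0, b=1.0, c=1.0, v0=1.0, n_max=59):
    """Truncated double series for the potential with the top face z = c held at v0."""
    total = 0.0
    for n in range(1, n_max + 1, 2):          # only odd n contribute
        for m in range(1, n_max + 1, 2):      # only odd m contribute
            gamma = math.pi * math.sqrt((n / a) ** 2 + (m / b) ** 2)
            b_nm = 16.0 * v0 / (math.pi ** 2 * n * m)
            # sinh(gamma z) / sinh(gamma c), written without growing exponentials
            ratio = (math.exp(gamma * (z - c))
                     * (1.0 - math.exp(-2.0 * gamma * z))
                     / (1.0 - math.exp(-2.0 * gamma * c)))
            total += (b_nm * ratio
                      * math.sin(n * math.pi * x / a)
                      * math.sin(m * math.pi * y / b))
    return total

print("center of cube :", box_potential(0.5, 0.5, 0.5))
print("bottom face    :", box_potential(0.5, 0.5, 0.0))
```

For a cube there is an independent check: superposing the six problems in which one face at a time is held at $V_0$ gives the constant potential $V_0$, so by symmetry the center value of the single-face problem must be $V_0/6$.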


Historical Notes 

Pierre Simon de Laplace (1749-1827) was a French mathematician and theoretical as- 
tronomer who was so famous in his own time that he was known as the Newton of France. 
His main interests throughout his life were celestial mechanics, the theory of probability, 
and personal advancement. 

At the age of 24 he was already deeply engaged in the detailed application of Newton’s 
law of gravitation to the solar system as a whole, in which the planets and their satel- 
lites are not governed by the sun alone, but interact with one another in a bewildering 
variety of ways. Even Newton had been of the opinion that divine intervention would oc- 
casionally be needed to prevent this complex mechanism from degenerating into chaos. 
Laplace decided to seek reassurance elsewhere, and succeeded in proving that the ideal 
solar system of mathematics is a stable dynamical system that will endure unchanged for 
all time. This achievement was only one of the long series of triumphs recorded in his 
monumental treatise Mécanique Céleste (published in five volumes from 1799 to 1825), 
which summed up the work on gravitation of several generations of illustrious mathemati- 
cians. Unfortunately for his later reputation, he omitted all reference to the discoveries of 
his predecessors and contemporaries, and left it to be inferred that the ideas were entirely 
his own. Many anecdotes are associated with this work. One of the best known describes 
the occasion on which Napoleon tried to get a rise out of Laplace by protesting that 
he had written a huge book on the system of the world without once mentioning God 
as the author of the universe. Laplace is supposed to have replied, “Sire, I had no need 
of that hypothesis.” The principal legacy of the Mécanique Céleste to later generations 
lay in Laplace’s wholesale development of potential theory, with its far-reaching impli- 
cations for a dozen different branches of physical science ranging from gravitation and 
fluid mechanics to electromagnetism and atomic physics. Even though he lifted the idea 
of the potential from Lagrange without acknowledgment, he exploited it so extensively 
that ever since his time the fundamental equation of potential theory has been known as 
Laplace’s equation. After the French Revolution, Laplace’s political talents and greed for 
position came to full flower. His compatriots speak ironically of his “suppleness” and 
“versatility” as a politician. What this really means is that each time there was a change 
of regime (and there were many), Laplace smoothly adapted himself by changing his 
principles—back and forth between fervent republicanism and fawning royalism—and 
each time he emerged with a better job and grander titles. He has been aptly compared 
with the apocryphal Vicar of Bray in English literature, who was twice a Catholic and 
twice a Protestant. The Vicar is said to have replied as follows to the charge of being 
a turncoat: “Not so, neither, for if I changed my religion, I am sure I kept true to my 
principle, which is to live and die the Vicar of Bray.” 

To balance his faults, Laplace was always generous in giving assistance and encourage- 
ment to younger scientists. From time to time he helped forward in their careers such men 
as the chemist Gay-Lussac, the traveler and naturalist Humboldt, the physicist Poisson, 
and—appropriately—the young Cauchy, who was destined to become one of the chief 
architects of nineteenth century mathematics. 


19.5.2 Heat Conduction in a Rectangular Plate 


Consider a rectangular heat-conducting plate with sides of length a and b all 
held at T = 0. Assume that at time t = 0 the temperature has a distribution 
function f (x, y). Let us find the variation of temperature for all points (x, y) 
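at all times $t > 0$. 

The diffusion equation for this problem is 

\[
\frac{\partial T}{\partial t} = k^2\nabla^2 T = k^2\left(\frac{\partial^2 T}{\partial x^2} + \frac{\partial^2 T}{\partial y^2}\right).
\]

A separation of variables, $T(x, y, t) = X(x)Y(y)g(t)$, leads to three DEs: 

\[
\frac{d^2X}{dx^2} + \lambda X = 0, \qquad \frac{d^2Y}{dy^2} + \mu Y = 0, \qquad \frac{dg}{dt} + k^2(\lambda + \mu)g = 0 .
\]

The BCs $T(0, y, t) = T(a, y, t) = T(x, 0, t) = T(x, b, t) = 0$, together with 
the three ODEs, give rise to two S-L systems. The solutions to both of these 
are easily found: 

\[
\begin{aligned}
\lambda_n &= \left(\frac{n\pi}{a}\right)^2 \quad\text{and}\quad X_n(x) = \sin\left(\frac{n\pi}{a}x\right) \quad\text{for } n = 1, 2, \ldots, \\
\mu_m &= \left(\frac{m\pi}{b}\right)^2 \quad\text{and}\quad Y_m(y) = \sin\left(\frac{m\pi}{b}y\right) \quad\text{for } m = 1, 2, \ldots .
\end{aligned}
\]

These give rise to the general solutions 

\[
X(x) = \sum_{n=1}^{\infty} A_n\sin\left(\frac{n\pi}{a}x\right), \qquad Y(y) = \sum_{m=1}^{\infty} B_m\sin\left(\frac{m\pi}{b}y\right).
\]

With $\gamma_{mn} \equiv k^2\pi^2(n^2/a^2 + m^2/b^2)$, the solution to the g equation can be 
expressed as $g(t) = C_{mn}e^{-\gamma_{mn}t}$. Putting everything together, we obtain 

\[
T(x, y, t) = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} A_{mn}e^{-\gamma_{mn}t}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right),
\]

where $A_{mn} = A_n B_m C_{mn}$ is an arbitrary constant. To determine it, we impose 
the initial condition $T(x, y, 0) = f(x, y)$. This yields 

\[
f(x, y) = \sum_{n=1}^{\infty}\sum_{m=1}^{\infty} A_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right),
\]

which determines the coefficients $A_{mn}$: 

\[
A_{mn} = \frac{4}{ab}\int_0^a dx\int_0^b dy\, f(x, y)\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right).
\]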
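The recipe above (project the initial distribution onto the sine modes, then let each mode decay at its own rate) can be sketched in a few lines. The midpoint-rule quadrature, the plate dimensions, the diffusion constant, and the sample initial profile are all hypothetical choices for illustration.

```python
import math

def coefficient(f, n, m, a, b, grid=100):
    """Midpoint-rule estimate of A_mn = (4/ab) * double integral of f sin sin."""
    hx, hy = a / grid, b / grid
    total = 0.0
    for i in range(grid):
        x = (i + 0.5) * hx
        sx = math.sin(n * math.pi * x / a)
        for j in range(grid):
            y = (j + 0.5) * hy
            total += f(x, y) * sx * math.sin(m * math.pi * y / b)
    return 4.0 * total * hx * hy / (a * b)

def temperature(x, y, t, coeffs, a, b, k):
    """Sum of decaying modes; coeffs maps (n, m) to the amplitude of that mode."""
    total = 0.0
    for (n, m), amp in coeffs.items():
        gamma = k ** 2 * math.pi ** 2 * (n ** 2 / a ** 2 + m ** 2 / b ** 2)
        total += (amp * math.exp(-gamma * t)
                  * math.sin(n * math.pi * x / a)
                  * math.sin(m * math.pi * y / b))
    return total

# Hypothetical plate and initial profile consisting of a single (n=1, m=2) mode.
A_SIDE, B_SIDE, K = 2.0, 1.0, 0.5

def initial_profile(x, y):
    return math.sin(math.pi * x / A_SIDE) * math.sin(2 * math.pi * y / B_SIDE)

coeffs = {(n, m): coefficient(initial_profile, n, m, A_SIDE, B_SIDE)
          for n in (1, 2) for m in (1, 2)}
print(coeffs)
print(temperature(1.0, 0.25, 0.1, coeffs, A_SIDE, B_SIDE, K))
```

Because the sample profile is itself an eigenmode, only the (1, 2) coefficient should be appreciably different from zero, and the temperature simply decays as $e^{-\gamma t}$ with the corresponding $\gamma$.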


19.5.3 Quantum Particle in a Box 


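The behavior of an atomic particle of mass $\mu$ confined in a rectangular box 
with sides a, b, and c (an infinite three-dimensional potential well) is governed 
by the Schrödinger equation for a free particle, 

\[
i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2\mu}\left(\frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2} + \frac{\partial^2\psi}{\partial z^2}\right),
\]

and the BC that $\psi(x, y, z, t)$ vanishes at all sides of the box for all time. 

A separation of variables $\psi(x, y, z, t) = X(x)Y(y)Z(z)T(t)$ yields the 
ODEs 

\[
\begin{aligned}
&\frac{d^2X}{dx^2} + \lambda X = 0, \qquad \frac{d^2Y}{dy^2} + \sigma Y = 0, \qquad \frac{d^2Z}{dz^2} + \nu Z = 0, \\
&\frac{dT}{dt} + i\omega T = 0, \quad\text{where } \omega \equiv \frac{\hbar}{2\mu}(\lambda + \sigma + \nu).
\end{aligned}
\]

The spatial equations, together with the BCs 

\[
\begin{aligned}
\psi(0, y, z, t) &= \psi(a, y, z, t) = 0 \;\Rightarrow\; X(0) = 0 = X(a), \\
\psi(x, 0, z, t) &= \psi(x, b, z, t) = 0 \;\Rightarrow\; Y(0) = 0 = Y(b), \\
\psi(x, y, 0, t) &= \psi(x, y, c, t) = 0 \;\Rightarrow\; Z(0) = 0 = Z(c),
\end{aligned}
\]

lead to three S-L systems, whose solutions are easily found: 

\[
\begin{aligned}
X_n(x) &= \sin\left(\frac{n\pi}{a}x\right), & \lambda_n &= \left(\frac{n\pi}{a}\right)^2, & &\text{for } n = 1, 2, \ldots, \\
Y_m(y) &= \sin\left(\frac{m\pi}{b}y\right), & \sigma_m &= \left(\frac{m\pi}{b}\right)^2, & &\text{for } m = 1, 2, \ldots, \\
Z_l(z) &= \sin\left(\frac{l\pi}{c}z\right), & \nu_l &= \left(\frac{l\pi}{c}\right)^2, & &\text{for } l = 1, 2, \ldots .
\end{aligned}
\]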


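The time equation, on the other hand, has a solution of the form 

\[
T(t) = C_{lmn}e^{-i\omega_{lmn}t}, \quad\text{where}\quad \omega_{lmn} = \frac{\hbar}{2\mu}\left[\left(\frac{n\pi}{a}\right)^2 + \left(\frac{m\pi}{b}\right)^2 + \left(\frac{l\pi}{c}\right)^2\right].
\]

The solution of the Schrödinger equation that is consistent with the BCs is 
therefore 

\[
\psi(x, y, z, t) = \sum_{l,m,n=1}^{\infty} A_{lmn}e^{-i\omega_{lmn}t}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)\sin\left(\frac{l\pi}{c}z\right).
\]

The constants $A_{lmn}$ are determined by the initial shape, $\psi(x, y, z, 0)$, of the 
wave function. The energy of the particle is 

\[
E = \hbar\omega_{lmn} = \frac{\hbar^2\pi^2}{2\mu}\left(\frac{n^2}{a^2} + \frac{m^2}{b^2} + \frac{l^2}{c^2}\right).
\]

Each set of three positive integers (n, m, l) represents a state of the particle. 
For a cube, $a = b = c \equiv L$, and the energy of the particle is 

\[
E = \frac{\hbar^2\pi^2}{2\mu L^2}\left(n^2 + m^2 + l^2\right) = \frac{\hbar^2\pi^2}{2\mu V^{2/3}}\left(n^2 + m^2 + l^2\right), \tag{19.28}
\]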
where $V = L^3$ is the volume of the box. The ground state is (1, 1, 1), has energy 
$E = 3\hbar^2\pi^2/2\mu V^{2/3}$, and is nondegenerate (only one state corresponds 
to this energy). However, the higher-level states are degenerate. For instance, 
the three distinct states (1, 1, 2), (1, 2, 1), and (2, 1, 1) all correspond to the 
same energy, $E = 6\hbar^2\pi^2/2\mu V^{2/3}$. The degeneracy increases rapidly with 
larger values of n, m, and l. 

Equation (19.28) can be written as 

\[
n^2 + m^2 + l^2 = R^2, \quad\text{where}\quad R^2 = \frac{2\mu E V^{2/3}}{\hbar^2\pi^2}.
\]

This looks like the equation of a sphere in the nml-space. If R is large, the 
number of states contained within the sphere of radius R (the number of 
states with energy less than or equal to E) is simply the volume of the first 
octant⁸ of the sphere. If N is the number of such states, we have 

\[
N = \frac{1}{8}\left(\frac{4\pi}{3}\right)R^3 = \frac{\pi}{6}\left(\frac{2\mu E V^{2/3}}{\hbar^2\pi^2}\right)^{3/2} = \frac{\pi}{6}\left(\frac{2\mu E}{\hbar^2\pi^2}\right)^{3/2}V .
\]

Thus the density of states (the number of states per unit volume) is 

\[
n = \frac{N}{V} = \frac{\pi}{6}\left(\frac{2\mu}{\hbar^2\pi^2}\right)^{3/2}E^{3/2}. \tag{19.29}
\]


This is an important formula in solid-state physics, because the energy E is 
(with minor modifications required by spin) the Fermi energy. If the Fermi 
energy is denoted by $E_f$, Eq. (19.29) gives $E_f = \alpha n^{2/3}$, where $\alpha$ is some 
constant. 
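The counting argument behind (19.29) is easy to test by brute force. The sketch below enumerates positive-integer triples (n, m, l), counts those inside a sphere of radius R, and compares the count with the octant volume $\pi R^3/6$; it also tallies the degeneracy of a given value of $n^2 + m^2 + l^2$. The function names are illustrative.

```python
import math

def count_states(radius):
    """Number of positive-integer triples with n^2 + m^2 + l^2 <= radius^2."""
    r2 = radius * radius
    limit = int(radius)
    count = 0
    for n in range(1, limit + 1):
        for m in range(1, limit + 1):
            for l in range(1, limit + 1):
                if n * n + m * m + l * l <= r2:
                    count += 1
    return count

def degeneracy(level):
    """Number of states (n, m, l) with n^2 + m^2 + l^2 equal to the given integer."""
    limit = int(math.isqrt(level))
    return sum(
        1
        for n in range(1, limit + 1)
        for m in range(1, limit + 1)
        for l in range(1, limit + 1)
        if n * n + m * m + l * l == level
    )

print("degeneracy of ground level (3):", degeneracy(3))
print("degeneracy of next level (6)  :", degeneracy(6))
for radius in (10, 20, 40):
    exact = count_states(radius)
    estimate = math.pi * radius ** 3 / 6
    print(radius, exact, estimate, exact / estimate)
```

The ratio in the last column approaches 1 from below as R grows; the shortfall at finite R is a surface effect, since triples with a zero entry are excluded.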


19.5.4 Wave Guides 


In the preceding examples the time variation is given by a first deriva- 
tive. Thus, as far as time is concerned, we have a FODE. It follows that 
the initial specification of the physical quantity of interest (temperature T 
or Schrödinger wave function $\psi$) is sufficient to determine the solution 
uniquely. 

A second kind of time-dependent PDE occurring in physics is the wave 
equation, which contains time derivatives of the second order. Thus, there 
are two arbitrary parameters in the general solution. To determine these, we 
expect two initial conditions. For example, if the wave is standing, as in 
a rope clamped at both ends, the boundary conditions are not sufficient to 
determine the wave function uniquely. One also needs to specify the initial 
(transverse) velocity of each point of the rope. 

⁸This is because n, m, and l are all positive. 

For traveling waves, specification of the wave shape and velocity shape 
is not as important as the mode of propagation. For instance, in the theory of 
wave guides, after the time variation is separated, a particular time variation, 
such as $e^{+i\omega t}$, and a particular direction for the propagation of the wave, say 
the z-axis, are chosen. Thus, if u denotes a component of the electric or the 
magnetic field, we can write 

\[
u(x, y, z, t) = \psi(x, y)e^{i(\omega t \pm kz)},
\]

where k is the wave number. The wave equation then reduces to 

\[
\frac{\partial^2\psi}{\partial x^2} + \frac{\partial^2\psi}{\partial y^2} + \left(\frac{\omega^2}{c^2} - k^2\right)\psi = 0 .
\]

Introducing $\gamma^2 = \omega^2/c^2 - k^2$ and the transverse gradient $\nabla_t = (\partial/\partial x, \partial/\partial y)$ 
and writing the above equation in terms of the full vectors, we obtain 

\[
(\nabla_t^2 + \gamma^2)\begin{Bmatrix}\mathbf{E} \\ \mathbf{B}\end{Bmatrix} = 0, \quad\text{where}\quad \begin{Bmatrix}\mathbf{E} \\ \mathbf{B}\end{Bmatrix} = \begin{Bmatrix}\mathbf{E}_0(x, y) \\ \mathbf{B}_0(x, y)\end{Bmatrix}e^{i(\omega t \pm kz)}. \tag{19.30}
\]

These are the basic equations used in the study of electromagnetic wave 
guides and resonant cavities. 

Maxwell’s equations in conjunction with Eq. (19.30) give the transverse 
components (components perpendicular to the propagation direction) $\mathbf{E}_t$ 
and $\mathbf{B}_t$ in terms of the longitudinal components $E_z$ and $B_z$ (see [Lorr 88, 
Chap. 33]): 

\[
\begin{aligned}
\gamma^2\mathbf{E}_t &= \nabla_t\left(\frac{\partial E_z}{\partial z}\right) - i\frac{\omega}{c}\hat{\mathbf{e}}_z \times (\nabla_t B_z), \\
\gamma^2\mathbf{B}_t &= \nabla_t\left(\frac{\partial B_z}{\partial z}\right) + i\frac{\omega}{c}\hat{\mathbf{e}}_z \times (\nabla_t E_z).
\end{aligned} \tag{19.31}
\]

Three types of guided waves are usually studied. 


1. Transverse magnetic (TM) waves have $B_z = 0$ everywhere. The BC on 
$\mathbf{E}$ demands that $E_z$ vanish at the conducting walls of the guide. 

2. Transverse electric (TE) waves have $E_z = 0$ everywhere. The BC on $\mathbf{B}$ 
requires that the normal directional derivative 

\[
\frac{\partial B_z}{\partial n} \equiv \hat{\mathbf{e}}_n \cdot (\nabla B_z)
\]

vanish at the walls. 

3. Transverse electromagnetic (TEM) waves have $B_z = 0 = E_z$. For a 
nontrivial solution, Eq. (19.31) demands that $\gamma^2 = 0$. This form resembles 
a free wave with no boundaries. 

We quote the basic equations for the TM mode (see any book on electromagnetic 
theory for further details): 

\[
\begin{aligned}
(\nabla_t^2 + \gamma^2)E_z &= 0, & B_z &= 0, \\
\gamma^2\mathbf{E}_t &= \nabla_t\left(\frac{\partial E_z}{\partial z}\right), & \gamma^2\mathbf{B}_t &= i\frac{\omega}{c}\hat{\mathbf{e}}_z \times (\nabla_t E_z),
\end{aligned} \tag{19.32}
\]

where $\nabla_t^2$ is restricted to the x and y terms of the Laplacian in Cartesian 
coordinates, and to the $\rho$ and $\varphi$ terms in cylindrical coordinates. 
Fig. 19.2 A conducting cylindrical can whose top has a potential given by $V(\rho, \varphi)$, with 
the rest of the surface grounded 


19.6 Separation in Cylindrical Coordinates 


When the geometry of the boundaries is cylindrical, the appropriate coordi- 
nate system is the cylindrical one. This usually leads to Bessel functions “of 
some kind.” 

Before working specific examples of cylindrical geometry, let us con- 
sider a question that has more general implications. We saw in the previous 
section that separation of variables leads to ODEs in which certain constants 
(eigenvalues) appear. Different choices of signs for these constants can lead 
to different functional forms of the general solution. For example, an equation 
such as $d^2x/dt^2 - kx = 0$ can have exponential solutions if $k > 0$ or 
trigonometric solutions if k < 0. One cannot a priori assign a specific sign 
to k. Thus, the general form of the solution is indeterminate. However, once 
the boundary conditions are imposed, the unique solutions will emerge re- 
gardless of the initial functional form of the solutions (see [Hass 08] for a 
thorough discussion of this point). 
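
As a brief worked illustration of this point, consider the equation above on an interval $0 \le t \le L$ with the boundary conditions $x(0) = x(L) = 0$. Both the interval and the BCs are assumptions made here only for the sake of the example. Starting from the exponential form, the BCs by themselves force $k$ to be negative and turn the solution into a sine:

```latex
% Assumed example: \ddot{x} - kx = 0 on 0 \le t \le L with x(0) = x(L) = 0.
\begin{align*}
x(t) &= A e^{\sqrt{k}\,t} + B e^{-\sqrt{k}\,t}, \\
x(0) = 0 &\;\Rightarrow\; B = -A, \qquad x(t) = 2A \sinh(\sqrt{k}\,t), \\
x(L) = 0 &\;\Rightarrow\; \sinh(\sqrt{k}\,L) = 0
          \;\Rightarrow\; \sqrt{k}\,L = i n\pi, \quad n = 1, 2, \dots, \\
k &= -\left(\frac{n\pi}{L}\right)^{2}, \qquad
x(t) = 2iA \sin\!\left(\frac{n\pi t}{L}\right).
\end{align*}
```

Had we started from the trigonometric form instead, the same BCs would have led to the same sine functions and the same values of $k$.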


19.6.1 Conducting Cylindrical Can 


Consider a cylindrical conducting can of radius $a$ and height $h$ (see
Fig. 19.2). The potential varies at the top face as $V(\rho, \varphi)$, while the
lateral surface and the bottom face are grounded. Let us find the electrostatic
potential at all points inside the can.

A separation of variables transforms Laplace’s equation into three ODEs: 


$$\frac{d}{d\rho}\left(\rho\frac{dR}{d\rho}\right) + \left(k^2\rho - \frac{m^2}{\rho}\right)R = 0, \qquad \frac{d^2S}{d\varphi^2} + m^2 S = 0, \qquad \frac{d^2Z}{dz^2} - k^2 Z = 0,$$

where in anticipation of the correct BCs, we have written the constants as $k^2$
and $-m^2$ with $m$ an integer. The first of these is the Bessel equation, whose
general solution can be written as $R(\rho) = A J_m(k\rho) + B Y_m(k\rho)$. The second
DE, when the extra condition of periodicity in $\varphi$ is imposed on the potential,
has the general solution

$$S(\varphi) = C\cos m\varphi + D\sin m\varphi.$$

Finally the third DE has a general solution of the form

$$Z(z) = E e^{kz} + F e^{-kz}.$$


We note that none of the three ODEs leads to an S-L system of Theorem 19.4.1
because the BCs associated with them do not satisfy (19.24).
However, we can still solve the problem by imposing the given BCs.

The fact that the potential must be finite everywhere inside the can (including
at $\rho = 0$) forces $B$ to vanish because the Neumann function $Y_m(k\rho)$
is not defined at $\rho = 0$. On the other hand, we want $\Phi$ to vanish at $\rho = a$.
This gives $J_m(ka) = 0$, which demands that $ka$ be a root of the Bessel function
of order $m$. Denoting by $x_{mn}$ the $n$th zero of the Bessel function of order
$m$, we have $ka = x_{mn}$, or $k = x_{mn}/a$ for $n = 1, 2, \dots$.

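Similarly, the vanishing of $\Phi$ at $z = 0$ implies that

$$E = -F \quad\text{and}\quad Z(z) = E\sinh\left(\frac{x_{mn}}{a}z\right),$$

where a factor of 2 has been absorbed into the constant $E$.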


We can now multiply $R$, $S$, and $Z$ and sum over all possible values of $m$
and $n$, keeping in mind that negative values of $m$ give terms that are linearly
dependent on the corresponding positive values. The result is the so-called
Fourier-Bessel series:

$$\Phi(\rho, \varphi, z) = \sum_{m=0}^{\infty}\sum_{n=1}^{\infty} J_m\left(\frac{x_{mn}}{a}\rho\right)\sinh\left(\frac{x_{mn}}{a}z\right)(A_{mn}\cos m\varphi + B_{mn}\sin m\varphi), \tag{19.33}$$


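where $A_{mn}$ and $B_{mn}$ are constants to be determined by the remaining BC.
To find these constants we use the orthogonality of the trigonometric and
Bessel functions. For $z = h$ Eq. (19.33) reduces to

$$V(\rho, \varphi) = \sum_{m=0}^{\infty}\sum_{n=1}^{\infty} J_m\left(\frac{x_{mn}}{a}\rho\right)\sinh\left(\frac{x_{mn}}{a}h\right)(A_{mn}\cos m\varphi + B_{mn}\sin m\varphi),$$

from which we obtain

$$A_{mn} = \frac{2}{\pi a^2 J_{m+1}^2(x_{mn})\sinh(x_{mn}h/a)}\int_0^{2\pi} d\varphi \int_0^a d\rho\,\rho\,V(\rho,\varphi)\,J_m\left(\frac{x_{mn}}{a}\rho\right)\cos m\varphi,$$

$$B_{mn} = \frac{2}{\pi a^2 J_{m+1}^2(x_{mn})\sinh(x_{mn}h/a)}\int_0^{2\pi} d\varphi \int_0^a d\rho\,\rho\,V(\rho,\varphi)\,J_m\left(\frac{x_{mn}}{a}\rho\right)\sin m\varphi,$$

where we have used the following result derived in Problem 15.39:

$$\int_0^a \rho\,J_m^2\left(\frac{x_{mn}}{a}\rho\right)d\rho = \frac{a^2}{2}J_{m+1}^2(x_{mn}). \tag{19.34}$$

These expressions hold for $m \ge 1$. For $m = 0$ the angular integral of
$\cos^2 m\varphi$ equals $2\pi$ instead of $\pi$, so the prefactor of $A_{0n}$ is half as large.

For the special but important case of azimuthal symmetry, for which $V$
is independent of $\varphi$, we obtain

$$A_{mn} = \frac{2\delta_{m,0}}{a^2 J_1^2(x_{0n})\sinh(x_{0n}h/a)}\int_0^a d\rho\,\rho\,V(\rho)\,J_0\left(\frac{x_{0n}}{a}\rho\right), \qquad B_{mn} = 0,$$

and

$$\Phi(\rho, z) = \frac{2}{a^2}\sum_{n=1}^{\infty} A_n\,\frac{J_0\left(\frac{x_{0n}}{a}\rho\right)\sinh\left(\frac{x_{0n}}{a}z\right)}{J_1^2(x_{0n})\sinh(x_{0n}h/a)},$$

where

$$A_n = \int_0^a d\rho\,\rho\,V(\rho)\,J_0\left(\frac{x_{0n}}{a}\rho\right),$$

and $V(\rho)$ is the $\varphi$-independent potential at the top face.

The reason we obtained discrete values for $k$ was the demand that $\Phi$
vanish at $\rho = a$. If we let $a \to \infty$, then $k$ will be a continuous variable,
and instead of a sum over $k$, we will obtain an integral. This is completely
analogous to the transition from a Fourier series to a Fourier transform, but
we will not pursue it further.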
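
The azimuthally symmetric series can be evaluated numerically. The following is a minimal sketch, assuming a constant top potential $V(\rho) = V_0$, for which $\int_0^a \rho J_0(x_{0n}\rho/a)\,d\rho = a^2 J_1(x_{0n})/x_{0n}$ and each term of the series carries the coefficient $2V_0/[x_{0n}J_1(x_{0n})]$. The function names `bessel_j`, `j0_zero`, `sinh_ratio`, and `potential` are illustrative only. The Bessel functions are computed from Bessel's integral representation rather than taken from a library.

```python
import math
from functools import lru_cache


def bessel_j(n, x, steps=400):
    """J_n(x) from Bessel's integral (1/pi) * int_0^pi cos(n*t - x*sin t) dt (trapezoid rule)."""
    h = math.pi / steps
    total = 0.5 * (1.0 + math.cos(n * math.pi))  # half-weighted endpoints t = 0 and t = pi
    for i in range(1, steps):
        t = i * h
        total += math.cos(n * t - x * math.sin(t))
    return total * h / math.pi


@lru_cache(maxsize=None)
def j0_zero(n):
    """n-th positive zero of J_0, bracketed in [(n - 1/2) pi, n pi] and refined by bisection."""
    lo, hi = (n - 0.5) * math.pi, n * math.pi
    f_lo = bessel_j(0, lo)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j(0, mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def sinh_ratio(x, z, h):
    """sinh(x z) / sinh(x h) for 0 <= z <= h, written so that large arguments do not overflow."""
    return math.exp(x * (z - h)) * (1.0 - math.exp(-2.0 * x * z)) / (1.0 - math.exp(-2.0 * x * h))


def potential(rho, z, a=1.0, h=1.0, v0=1.0, terms=40):
    """Truncated Fourier-Bessel series for a can whose top is held at the constant potential v0."""
    total = 0.0
    for n in range(1, terms + 1):
        x = j0_zero(n)
        coeff = 2.0 * v0 / (x * bessel_j(1, x))  # assumes V(rho) = v0 on the top face
        total += coeff * bessel_j(0, x * rho / a) * sinh_ratio(x / a, z, h)
    return total


if __name__ == "__main__":
    print(potential(0.0, 0.5))  # potential at the center of a can with a = h = 1
```

Away from the top face the factor $\sinh(x_{0n}z/a)/\sinh(x_{0n}h/a)$ decays exponentially with $n$, so a few dozen terms suffice. On the top face itself the series converges slowly.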


19.6.2 Cylindrical Wave Guide 


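For a TM wave propagating along the $z$-axis in a hollow circular conductor,
we have [see Eq. (19.32)]

$$\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial E_z}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2 E_z}{\partial\varphi^2} + \gamma^2 E_z = 0.$$

The separation $E_z = R(\rho)S(\varphi)$ yields $S(\varphi) = A\cos m\varphi + B\sin m\varphi$ and

$$\frac{d^2R}{d\rho^2} + \frac{1}{\rho}\frac{dR}{d\rho} + \left(\gamma^2 - \frac{m^2}{\rho^2}\right)R = 0.$$

The solution to this equation, which is regular at $\rho = 0$ and vanishes at
$\rho = a$, is

$$R(\rho) = C J_m\left(\frac{x_{mn}}{a}\rho\right) \quad\text{and}\quad \gamma = \frac{x_{mn}}{a}.$$

Recalling the definition of $\gamma$, we obtain

$$\frac{\omega^2}{c^2} - k^2 = \frac{x_{mn}^2}{a^2} \quad\Rightarrow\quad k = \sqrt{\frac{\omega^2}{c^2} - \frac{x_{mn}^2}{a^2}}.$$

This gives the cut-off frequency $\omega_{mn} = c\,x_{mn}/a$.
The solution for the azimuthally symmetric case ($m = 0$) is

$$E_z(\rho, \varphi, t) = \sum_{n=1}^{\infty} A_n J_0\left(\frac{x_{0n}}{a}\rho\right)e^{i(\omega t \pm k_n z)} \quad\text{and}\quad B_z = 0,$$

where $k_n = \sqrt{\omega^2/c^2 - x_{0n}^2/a^2}$.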
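
To attach numbers to the cut-off condition, here is a small sketch that evaluates $\omega_{mn} = c\,x_{mn}/a$ and the axial wave number $k$ for the lowest azimuthally symmetric TM mode. The guide radius of 1 cm and the two test frequencies are assumptions chosen only for illustration. The function names are hypothetical, and `X01` is the standard tabulated first zero of $J_0$.

```python
import cmath
import math

C = 299_792_458.0          # speed of light in vacuum, m/s
X01 = 2.404825557695773    # first zero of J_0 (standard tabulated value)


def cutoff_frequency(x_mn, a):
    """Cut-off frequency omega_mn / (2 pi) = c x_mn / (2 pi a), in Hz, for a guide of radius a (m)."""
    return C * x_mn / (2.0 * math.pi * a)


def axial_wavenumber(omega, x_mn, a):
    """k = sqrt(omega^2/c^2 - x_mn^2/a^2); purely imaginary below cut-off (no propagation)."""
    return cmath.sqrt((omega / C) ** 2 - (x_mn / a) ** 2)


if __name__ == "__main__":
    a = 0.01  # assumed radius: 1 cm
    print(cutoff_frequency(X01, a))                        # roughly 11.5 GHz
    print(axial_wavenumber(2 * math.pi * 15e9, X01, a))    # real: the wave propagates
    print(axial_wavenumber(2 * math.pi * 5e9, X01, a))     # imaginary: the wave is attenuated
```

Below the cut-off frequency $k$ is imaginary, so the factor $e^{\pm ikz}$ decays along the guide instead of oscillating.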


19.6.3 Current Distribution in a Circular Wire 


There are many variations on the theme of Bessel functions. We have en- 
countered three kinds of Bessel functions, as well as modified Bessel func- 
tions. Another variation encountered in applications leads to what are known 
as Kelvin functions, introduced here. 

Consider the flow of charges in an infinitely long wire with a circular
cross section of radius $a$. We are interested in calculating the variation of
the current density in the wire as a function of time and location. The relevant
equation can be obtained by starting with Maxwell's equations for
negligible charge density ($\nabla\cdot\mathbf{E} = 0$), Ohm's law ($\mathbf{j} = \sigma\mathbf{E}$), the assumption
of high electrical conductivity ($|\sigma\mathbf{E}| \gg |\partial\mathbf{E}/\partial t|$), and the usual procedure
of obtaining the wave equation from Maxwell's equations. The result is

$$\nabla^2\mathbf{j} - \frac{4\pi\sigma}{c^2}\frac{\partial\mathbf{j}}{\partial t} = 0.$$

Moreover, we make the simplifying assumptions that the wire is along the
$z$-axis and that there is no turbulence, so $\mathbf{j}$ is also along the $z$ direction. We
further assume that $j$ is independent of $\varphi$ and $z$, and that its time-dependence
is given by $e^{-i\omega t}$. Then we get

$$\frac{d^2j}{d\rho^2} + \frac{1}{\rho}\frac{dj}{d\rho} + \tau^2 j = 0, \tag{19.35}$$

where $\tau^2 = i4\pi\sigma\omega/c^2 \equiv 2i/\delta^2$, and $\delta = c/\sqrt{2\pi\sigma\omega}$ is called the skin depth.
The Kelvin equation is usually given as

$$\frac{d^2w}{dx^2} + \frac{1}{x}\frac{dw}{dx} - ik^2 w = 0. \tag{19.36}$$

If we substitute $x = \sqrt{i}\,t/k$, it becomes $\ddot{w} + \dot{w}/t + w = 0$ (dots denote
differentiation with respect to $t$), which is a Bessel equation of order zero.
If the solution is to be regular at $x = 0$, then the only choice is
$w(t) = J_0(t) = J_0(e^{-i\pi/4}kx)$. This is the Kelvin function
for Eq. (19.36). It is usually written as

$$J_0(e^{-i\pi/4}kx) \equiv \operatorname{ber}(kx) + i\operatorname{bei}(kx),$$


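where ber and bei stand for "Bessel real" and "Bessel imaginary", respectively.
If we substitute $z = e^{-i\pi/4}kx$ in the expansion for $J_0(z)$ and separate
the real and the imaginary parts of the expansion, we obtain

$$\operatorname{ber}(x) = 1 - \frac{(x/2)^4}{(2!)^2} + \frac{(x/2)^8}{(4!)^2} - \cdots,$$

$$\operatorname{bei}(x) = \frac{(x/2)^2}{(1!)^2} - \frac{(x/2)^6}{(3!)^2} + \frac{(x/2)^{10}}{(5!)^2} - \cdots.$$

Equation (19.35) is the complex conjugate of (19.36) with $k^2 = 2/\delta^2$.
Thus, its solution is

$$j(\rho) = A J_0(e^{i\pi/4}k\rho) = A\left[\operatorname{ber}\left(\frac{\sqrt{2}}{\delta}\rho\right) - i\operatorname{bei}\left(\frac{\sqrt{2}}{\delta}\rho\right)\right].$$

We can compare the value of the current density at $\rho$ with its value at the
surface $\rho = a$:

$$\left|\frac{j(\rho)}{j(a)}\right| = \left[\frac{\operatorname{ber}^2\left(\frac{\sqrt{2}}{\delta}\rho\right) + \operatorname{bei}^2\left(\frac{\sqrt{2}}{\delta}\rho\right)}{\operatorname{ber}^2\left(\frac{\sqrt{2}}{\delta}a\right) + \operatorname{bei}^2\left(\frac{\sqrt{2}}{\delta}a\right)}\right]^{1/2}.$$

For low frequencies, $\delta$ is large, which implies that $\rho/\delta$ is small; thus,
$\operatorname{ber}(\sqrt{2}\rho/\delta) \approx 1$ and $\operatorname{bei}(\sqrt{2}\rho/\delta) \approx 0$, and $|j(\rho)/j(a)| \approx 1$; i.e., the current
density is almost uniform. For higher frequencies the ratio of the current
densities starts at a value less than 1 at $\rho = 0$ and increases to 1 at $\rho = a$.
The starting value depends on the frequency. For very large frequencies the
starting value is almost zero (see [Mari 80, pp 150-156]).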
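
The skin effect described here is easy to check numerically. The sketch below sums the power series $\operatorname{ber}(x) = \sum_k (-1)^k (x/2)^{4k}/[(2k)!]^2$ and $\operatorname{bei}(x) = \sum_k (-1)^k (x/2)^{4k+2}/[(2k+1)!]^2$ and evaluates the ratio $|j(\rho)/j(a)|$. The function names and the sample values of $a$ and $\delta$ are assumptions made for illustration. The truncated series is adequate for moderate arguments (up to roughly 20).

```python
import math


def ber(x, terms=30):
    """Kelvin function ber(x) from its power series."""
    return sum((-1) ** k * (x / 2.0) ** (4 * k) / math.factorial(2 * k) ** 2
               for k in range(terms))


def bei(x, terms=30):
    """Kelvin function bei(x) from its power series."""
    return sum((-1) ** k * (x / 2.0) ** (4 * k + 2) / math.factorial(2 * k + 1) ** 2
               for k in range(terms))


def current_ratio(rho, a, delta):
    """|j(rho) / j(a)| for a wire of radius a and skin depth delta."""
    s = math.sqrt(2.0) / delta
    num = ber(s * rho) ** 2 + bei(s * rho) ** 2
    den = ber(s * a) ** 2 + bei(s * a) ** 2
    return math.sqrt(num / den)


if __name__ == "__main__":
    a = 1.0
    for delta in (100.0, 1.0, 0.1):  # low, intermediate, and high frequency
        print(delta, [round(current_ratio(r, a, delta), 4) for r in (0.0, 0.5, 1.0)])
```

For $\delta \gg a$ the printed ratios stay close to 1, while for $\delta \ll a$ the current density at the center is a small fraction of its surface value.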


19.7 Separation in Spherical Coordinates


Recall that most PDEs encountered in physical applications can be separated,
in spherical coordinates, into

$$\mathbf{L}^2 Y(\theta, \varphi) = l(l+1)Y(\theta, \varphi), \qquad \frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} + \left[f(r) - \frac{l(l+1)}{r^2}\right]R = 0. \tag{19.37}$$

We discussed the first of these two equations in great detail in Chap. 13.
In particular, we constructed $Y_{lm}(\theta, \varphi)$ in such a way that they formed
an orthonormal sequence. However, that construction was purely algebraic
and did not say anything about the completeness of $Y_{lm}(\theta, \varphi)$. With Theorem 19.4.1
at our disposal, we can separate the first equation of (19.37) into
two ODEs by writing $Y_{lm}(\theta, \varphi) = P_{lm}(\theta)S_m(\varphi)$. We obtain

$$\frac{d^2S_m}{d\varphi^2} + m^2 S_m = 0,$$

$$\frac{d}{dx}\left[(1 - x^2)\frac{dP_{lm}}{dx}\right] + \left[l(l+1) - \frac{m^2}{1 - x^2}\right]P_{lm} = 0,$$

where $x = \cos\theta$. These are both S-L systems satisfying the conditions
of Theorem 19.4.1. Thus, the $S_m$ are orthogonal among themselves and
form a complete set for $\mathcal{L}^2(0, 2\pi)$. Similarly, for any fixed $m$, the $P_{lm}(x)$
form a complete orthogonal set for $\mathcal{L}^2(-1, +1)$ (actually for the subset of
$\mathcal{L}^2(-1, +1)$ that satisfies the same BC as the $P_{lm}$ do at $x = \pm 1$). Thus, the
products $Y_{lm}(x, \varphi) = P_{lm}(x)S_m(\varphi)$ form a complete orthogonal sequence in
the (Cartesian product) set $[-1, +1] \times [0, 2\pi]$, which, in terms of spherical
angles, is the unit sphere, $0 \le \theta \le \pi$, $0 \le \varphi \le 2\pi$.


19.7.1 Radial Part of Laplace’s Equation 


Let us consider some specific examples of expansion in the spherical coordinate
system starting with the simplest case, Laplace's equation for which
$f(r) = 0$. The radial equation is therefore

$$\frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} - \frac{l(l+1)}{r^2}R = 0.$$

Multiplying by $r^2$, substituting $r = e^t$, and using the chain rule and the fact
that $dt/dr = 1/r$ leads to the following SOLDE with constant coefficients:

$$\frac{d^2R}{dt^2} + \frac{dR}{dt} - l(l+1)R = 0.$$

This has a characteristic polynomial $p(\lambda) = \lambda^2 + \lambda - l(l+1)$ with roots
$\lambda_1 = l$ and $\lambda_2 = -(l+1)$. Thus, a general solution is of the form

$$R(t) = A e^{\lambda_1 t} + B e^{\lambda_2 t} = A(e^t)^l + B(e^t)^{-l-1}$$

or, in terms of $r$, $R(r) = A r^l + B r^{-l-1}$. Thus, the most general solution of
Laplace's equation is

$$\Phi(r, \theta, \varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left(A_{lm}r^l + B_{lm}r^{-l-1}\right)Y_{lm}(\theta, \varphi).$$

For regions containing the origin, the finiteness of $\Phi$ implies that
$B_{lm} = 0$. Denoting the potential in such regions by $\Phi_{\text{in}}$, we obtain

$$\Phi_{\text{in}}(r, \theta, \varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} A_{lm}r^l\,Y_{lm}(\theta, \varphi).$$

Similarly, for regions including $r = \infty$, we have

$$\Phi_{\text{out}}(r, \theta, \varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} B_{lm}r^{-l-1}\,Y_{lm}(\theta, \varphi).$$
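To determine $A_{lm}$ and $B_{lm}$, we need to invoke appropriate BCs. In particular,
for inside a sphere of radius $a$ on which the potential is given by
$V(\theta, \varphi)$, we have

$$V(\theta, \varphi) = \Phi_{\text{in}}(a, \theta, \varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} A_{lm}a^l\,Y_{lm}(\theta, \varphi).$$

Multiplying by $Y_{kj}^*(\theta, \varphi)$ and integrating over $d\Omega = \sin\theta\,d\theta\,d\varphi$, we obtain

$$A_{kj} = a^{-k}\iint d\Omega\,V(\theta, \varphi)\,Y_{kj}^*(\theta, \varphi) \quad\Rightarrow\quad A_{lm} = a^{-l}\iint d\Omega\,V(\theta, \varphi)\,Y_{lm}^*(\theta, \varphi).$$

Similarly, for potential outside the sphere,

$$B_{lm} = a^{l+1}\iint d\Omega\,V(\theta, \varphi)\,Y_{lm}^*(\theta, \varphi).$$

In particular, if $V$ is independent of $\varphi$, only the components for which
$m = 0$ are nonzero, and we have

$$A_{l0} = \frac{2\pi}{a^l}\int_0^{\pi}\sin\theta\,V(\theta)\,Y_{l0}^*(\theta)\,d\theta = \frac{2\pi}{a^l}\sqrt{\frac{2l+1}{4\pi}}\int_0^{\pi}\sin\theta\,V(\theta)\,P_l(\cos\theta)\,d\theta,$$

which yields

$$\Phi_{\text{in}}(r, \theta) = \sum_{l=0}^{\infty} A_l\left(\frac{r}{a}\right)^l P_l(\cos\theta),$$

where

$$A_l = \frac{2l+1}{2}\int_0^{\pi}\sin\theta\,V(\theta)\,P_l(\cos\theta)\,d\theta.$$

Similarly,

$$\Phi_{\text{out}}(r, \theta) = \sum_{l=0}^{\infty} A_l\left(\frac{a}{r}\right)^{l+1} P_l(\cos\theta).$$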
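
The coefficient formula $A_l = \frac{2l+1}{2}\int_0^\pi \sin\theta\,V(\theta)P_l(\cos\theta)\,d\theta$ can be tried out on a simple boundary potential. The sketch below is only an illustration: the choice $V(\theta) = \cos^2\theta$, the midpoint-rule integration, and the function names are assumptions. Since $\cos^2\theta = \frac13 P_0 + \frac23 P_2$, the exact coefficients are $A_0 = 1/3$, $A_2 = 2/3$, and all others vanish.

```python
import math


def legendre(l, u):
    """P_l(u) from the recurrence (n + 1) P_{n+1} = (2n + 1) u P_n - n P_{n-1}."""
    p_prev, p = 1.0, u
    if l == 0:
        return p_prev
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * u * p - n * p_prev) / (n + 1)
    return p


def coefficient(l, v, steps=2000):
    """A_l = (2l + 1)/2 * int_0^pi sin(theta) V(theta) P_l(cos theta) d(theta), midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        total += math.sin(theta) * v(theta) * legendre(l, math.cos(theta))
    return 0.5 * (2 * l + 1) * total * h


def phi_in(r, theta, v, a=1.0, lmax=8):
    """Truncated interior potential sum_l A_l (r/a)^l P_l(cos theta)."""
    return sum(coefficient(l, v) * (r / a) ** l * legendre(l, math.cos(theta))
               for l in range(lmax + 1))


if __name__ == "__main__":
    v = lambda theta: math.cos(theta) ** 2   # assumed boundary potential (in units of V_0)
    print([round(coefficient(l, v), 6) for l in range(4)])
    print(phi_in(0.5, 0.4, v))
```

On the surface $r = a$ the truncated sum reproduces $V(\theta)$, and at the center it reduces to $A_0$, the average of $V$ over the sphere.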


19.7.2 Helmholtz Equation in Spherical Coordinates


The next simplest case after Laplace's equation is that for which $f(r)$ is a
constant. The diffusion equation, the wave equation, and the Schrödinger
equation for a free particle give rise to such a case once time is separated
from the rest of the variables.

The Helmholtz equation is

$$\nabla^2\Psi + k^2\Psi = 0, \tag{19.38}$$

and its radial part is

$$\frac{d^2R}{dr^2} + \frac{2}{r}\frac{dR}{dr} + \left[k^2 - \frac{l(l+1)}{r^2}\right]R = 0. \tag{19.39}$$

(This equation was discussed in Problems 15.26 and 15.35.) The solutions
are spherical Bessel functions, generically denoted by the corresponding
lower case letter as $z_l(x)$ and given by

$$z_l(x) \equiv \sqrt{\frac{\pi}{2}}\,\frac{Z_{l+1/2}(x)}{\sqrt{x}}, \tag{19.40}$$

where $Z_\nu(x)$ is a solution of the Bessel equation of order $\nu$.
A general solution of (19.39) can therefore be written as

$$R_l(r) = A j_l(kr) + B y_l(kr). \tag{19.41}$$

If the origin is included in the region of interest, then we must set $B = 0$.
For such a case, the solution to the Helmholtz equation is

$$\Psi_k(r, \theta, \varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l} A_{lm}\,j_l(kr)\,Y_{lm}(\theta, \varphi). \tag{19.42}$$

The subscript $k$ indicates that $\Psi$ is a solution of the Helmholtz equation with
$k^2$ as its constant.


19.7.3 Quantum Particle in a Hard Sphere 


The time-independent Schrödinger equation for a particle in a sphere of
radius $a$ is $-\frac{\hbar^2}{2\mu}\nabla^2\Psi = E\Psi$ with the BC $\Psi(a, \theta, \varphi) = 0$. Here $E$ is the
energy of the particle and $\mu$ is its mass. We rewrite the Schrödinger equation
as $\nabla^2\Psi + k^2\Psi = 0$ with $k^2 = 2\mu E/\hbar^2$. Then Eq. (19.41) and the fact that
$R_l(r)$ must be finite at $r = 0$ yield

$$R_l(r) = A j_l(kr) = A j_l\left(\sqrt{2\mu E}\,r/\hbar\right).$$

The vanishing of $\Psi$ at $a$ implies that $j_l(\sqrt{2\mu E}\,a/\hbar) = 0$, or

$$\frac{\sqrt{2\mu E}\,a}{\hbar} = X_{ln} \quad\text{for } n = 1, 2, \dots,$$

where $X_{ln}$ is the $n$th zero of $j_l(x)$, which is the same as the zero of
$J_{l+1/2}(x)$. Thus, the energy is quantized as

$$E_{ln} = \frac{\hbar^2 X_{ln}^2}{2\mu a^2} \quad\text{for } l = 0, 1, \dots,\; n = 1, 2, \dots.$$

The general solution to the Schrödinger equation is

$$\Psi(r, \theta, \varphi) = \sum_{n=1}^{\infty}\sum_{l=0}^{\infty}\sum_{m=-l}^{l} A_{nlm}\,j_l\left(X_{ln}\frac{r}{a}\right)Y_{lm}(\theta, \varphi).$$
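
The quantized energies follow directly from the zeros of the spherical Bessel functions. The following sketch finds them by scanning for sign changes and bisecting. It reports $E_{ln}$ in units of $\hbar^2/(2\mu a^2)$, so that $E_{ln} = X_{ln}^2$. The function names, the scan step, and the use of the upward recurrence (safe here because the scan starts at $x = l + 1/2$, below the first zero) are choices made for this illustration.

```python
import math


def spherical_jn(l, x):
    """j_l(x) for x > 0 via the upward recurrence j_{n+1} = (2n + 1)/x * j_n - j_{n-1}."""
    j_prev = math.sin(x) / x
    if l == 0:
        return j_prev
    j = math.sin(x) / x ** 2 - math.cos(x) / x
    for n in range(1, l):
        j_prev, j = j, (2 * n + 1) / x * j - j_prev
    return j


def zeros_of_jl(l, count, step=0.1):
    """First `count` positive zeros X_ln of j_l, found by sign-change scanning plus bisection."""
    zeros = []
    x = l + 0.5            # the first zero of j_l lies above l + 1/2 (standard bound on Bessel zeros)
    f = spherical_jn(l, x)
    while len(zeros) < count:
        x_next = x + step
        f_next = spherical_jn(l, x_next)
        if f * f_next < 0.0:
            lo, hi, f_lo = x, x_next, f
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                f_mid = spherical_jn(l, mid)
                if f_lo * f_mid <= 0.0:
                    hi = mid
                else:
                    lo, f_lo = mid, f_mid
            zeros.append(0.5 * (lo + hi))
        x, f = x_next, f_next
    return zeros


def energy_levels(l, count):
    """E_ln in units of hbar^2 / (2 mu a^2), i.e. the squares of the zeros X_ln."""
    return [x * x for x in zeros_of_jl(l, count)]


if __name__ == "__main__":
    for l in range(3):
        print(l, [round(e, 3) for e in energy_levels(l, 3)])
```

For $l = 0$ the zeros are $n\pi$, since $j_0(x) = \sin x / x$, which gives the familiar levels $E_{0n} = \hbar^2\pi^2n^2/(2\mu a^2)$.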


19.7.4 Plane Wave Expansion 


A particularly useful consequence of Eq. (19.42) is the expansion of a plane
wave in terms of spherical Bessel functions. It is easily verified that if $\mathbf{k}$ is
a vector, with $\mathbf{k}\cdot\mathbf{k} = k^2$, then $e^{i\mathbf{k}\cdot\mathbf{r}}$ is a solution of the Helmholtz equation.
Thus, $e^{i\mathbf{k}\cdot\mathbf{r}}$ can be expanded as in Eq. (19.42). Assuming that $\mathbf{k}$ is along the
$z$-axis, we get $\mathbf{k}\cdot\mathbf{r} = kr\cos\theta$, which is independent of $\varphi$. Only the terms of
Eq. (19.42) for which $m = 0$ will survive in such a case, and we may write

$$e^{ikr\cos\theta} = \sum_{l=0}^{\infty} A_l\,j_l(kr)\,P_l(\cos\theta).$$

To find $A_l$, let $u = \cos\theta$, multiply both sides by $P_n(u)$, and integrate from
$-1$ to $1$:

$$\int_{-1}^{1} P_n(u)e^{ikru}\,du = \sum_{l=0}^{\infty} A_l\,j_l(kr)\int_{-1}^{1} P_n(u)P_l(u)\,du = A_n\,j_n(kr)\,\frac{2}{2n+1}.$$

Thus

$$A_n\,j_n(kr) = \frac{2n+1}{2}\int_{-1}^{1} P_n(u)e^{ikru}\,du = \frac{2n+1}{2}\sum_{m=0}^{\infty}\frac{(ikr)^m}{m!}\int_{-1}^{1} P_n(u)u^m\,du. \tag{19.43}$$

This equality holds for all values of $kr$. In particular, both sides should give
the same result in the limit of small $kr$. From the definition of $j_n(kr)$ and
the expansion of $J_n(kr)$, we obtain

$$j_n(kr) \xrightarrow[kr \to 0]{} \frac{\sqrt{\pi}}{2}\left(\frac{kr}{2}\right)^n\frac{1}{\Gamma(n + 3/2)}.$$

On the other hand, the first nonvanishing term of the RHS of Eq. (19.43)
occurs when $m = n$. Equating these terms on both sides, we get

$$A_n\,\frac{\sqrt{\pi}}{2}\left(\frac{kr}{2}\right)^n\frac{2^{2n+1}n!}{(2n+1)!\sqrt{\pi}} = \frac{2n+1}{2}\,\frac{i^n(kr)^n}{n!}\,\frac{2^{n+1}(n!)^2}{(2n+1)!}, \tag{19.44}$$

where we have used

$$\Gamma\left(n + \frac{3}{2}\right) = \frac{(2n+1)!\sqrt{\pi}}{2^{2n+1}n!} \quad\text{and}\quad \int_{-1}^{1} P_n(u)u^n\,du = \frac{2^{n+1}(n!)^2}{(2n+1)!}.$$

Equation (19.44) yields $A_n = i^n(2n+1)$.
With $A_n$ thus calculated, we can now write

$$e^{ikr\cos\theta} = \sum_{l=0}^{\infty}(2l+1)\,i^l\,j_l(kr)\,P_l(\cos\theta). \tag{19.45}$$
1=0 


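For an arbitrary direction of $\mathbf{k}$, $\mathbf{k}\cdot\mathbf{r} = kr\cos\gamma$, where $\gamma$ is the angle between
$\mathbf{k}$ and $\mathbf{r}$. Thus, we may write

$$e^{i\mathbf{k}\cdot\mathbf{r}} = \sum_{l=0}^{\infty}(2l+1)\,i^l\,j_l(kr)\,P_l(\cos\gamma),$$

and using the addition theorem for spherical harmonics [Eq. (13.44)], we
finally obtain

$$e^{i\mathbf{k}\cdot\mathbf{r}} = 4\pi\sum_{l=0}^{\infty}\sum_{m=-l}^{l} i^l\,j_l(kr)\,Y_{lm}^*(\theta', \varphi')\,Y_{lm}(\theta, \varphi), \tag{19.46}$$

where $\theta'$ and $\varphi'$ are the spherical angles of $\mathbf{k}$ and $\theta$ and $\varphi$ are those of $\mathbf{r}$.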
Such a decomposition of plane waves into components with definite orbital 
angular momenta is extremely useful when working with scattering theory 
for waves and particles. 
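
The expansion $e^{ikr\cos\theta} = \sum_l (2l+1)\,i^l j_l(kr) P_l(\cos\theta)$ can be verified numerically. The sketch below is an illustration under stated assumptions: $j_l$ is summed from its power series $j_l(x) = \sum_k (-1)^k x^{l+2k}/[2^k k!\,(2l+2k+1)!!]$, which is adequate for moderate $x$, and the cutoff `lmax` and the function names are arbitrary choices.

```python
import cmath
import math


def spherical_jn(l, x, terms=60):
    """j_l(x) from the power series sum_k (-1)^k x^(l+2k) / (2^k k! (2l+2k+1)!!)."""
    term = x ** l
    for n in range(1, l + 1):
        term /= 2 * n + 1                      # builds x^l / (2l+1)!!
    total = 0.0
    for k in range(terms):
        total += term
        term *= -x * x / (2.0 * (k + 1) * (2 * l + 2 * k + 3))
    return total


def legendre(l, u):
    """P_l(u) from the recurrence (n + 1) P_{n+1} = (2n + 1) u P_n - n P_{n-1}."""
    p_prev, p = 1.0, u
    if l == 0:
        return p_prev
    for n in range(1, l):
        p_prev, p = p, ((2 * n + 1) * u * p - n * p_prev) / (n + 1)
    return p


def plane_wave_series(kr, theta, lmax=40):
    """Partial sum of (2l+1) i^l j_l(kr) P_l(cos theta) for l = 0, ..., lmax."""
    u = math.cos(theta)
    return sum((2 * l + 1) * (1j ** l) * spherical_jn(l, kr) * legendre(l, u)
               for l in range(lmax + 1))


if __name__ == "__main__":
    kr, theta = 3.0, 0.7
    print(plane_wave_series(kr, theta))
    print(cmath.exp(1j * kr * math.cos(theta)))   # the two lines should agree closely
```

Because $j_l(kr)$ falls off rapidly once $l$ exceeds $kr$, only partial waves with $l \lesssim kr$ contribute appreciably, which is what makes the expansion practical in scattering calculations.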


19.8 Problems 


19.1 Show that the Liouville substitution transforms regular S-L systems 
into regular S-L systems and separated and periodic BCs into separated and 
periodic BCs, respectively. 


19.2 Let $u_1(x)$ and $u_2(x)$ be transformed, respectively, into $v_1(t)$ and $v_2(t)$
by the Liouville substitution. Show that the inner product on $[a, b]$ with
weight function $w(x)$ is transformed into the inner product on $[0, c]$ with
unit weight, where $c = \int_a^b \sqrt{w/p}\,dx$.

19.3 Derive Eq. (19.17) from (19.15) using Prüfer substitution.


19.4 Consider the Bessel DE.

(a) Show that the Liouville substitution transforms the Bessel DE into

$$\frac{d^2v}{dt^2} + \left(k^2 - \frac{\nu^2 - 1/4}{t^2}\right)v = 0.$$

(b) Find the equations obtained from the Prüfer substitution, and show
that for large $x$ these equations reduce to

$$\phi' = k\left(1 - \frac{a}{2k^2x^2}\right) + \frac{O(1)}{x^3}, \qquad \frac{R'}{R} = \frac{O(1)}{x^3},$$

where $a = \nu^2 - \frac{1}{4}$.
(c) Integrate these equations from $x$ to $b > x$ and take the limit as $b \to \infty$
to get

$$\phi(x) = \phi_\infty + kx + \frac{a}{2kx} + \frac{O(1)}{x^2}, \qquad R(x) = R_\infty + \frac{O(1)}{x^2},$$

where $\phi_\infty = \lim_{b\to\infty}(\phi(b) - kb)$ and $R_\infty = \lim_{b\to\infty} R(b)$.
(d) Substitute these and the appropriate expression for $Q^{-1/4}$ in Eq. (19.16)
and show that

$$v(x) = \frac{R_\infty}{\sqrt{k}}\cos\left(kx - kx_\infty + \frac{\nu^2 - 1/4}{2kx}\right) + \frac{O(1)}{x^2},$$

where $kx_\infty \equiv \pi/2 - \phi_\infty$.
(e) Choose $R_\infty = \sqrt{2/\pi}$ for all solutions of the Bessel DE, and let

$$kx_\infty = \left(\nu + \frac{1}{2}\right)\frac{\pi}{2} \quad\text{and}\quad kx_\infty = \left(\nu + \frac{3}{2}\right)\frac{\pi}{2}$$

for the Bessel functions $J_\nu(x)$ and the Neumann functions $Y_\nu(x)$, respectively,
and find the asymptotic behavior of these two functions.


19.5 Show that separated and periodic BCs are special cases of the equality 
in Eq. (19.26). 


19.6 Derive Eq. (19.27). 


19.7 A semi-infinite heat-conducting plate of width b is extended along the 
positive x-axis with one corner at (0,0) and the other at (0, b). The side 
of width b is held at temperature T = f(y), and the two long sides are 
held at T = 0. The two flat faces are insulated and the plate is in thermal 
equilibrium. 


(a) Find the temperature variation of the plate for all (x, y). 
(b) Specialize to the case where the side of width $b$ is held at constant
temperature $T_0$ (see Fig. 19.3).


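19.8 Repeat Problem 19.7 with the temperature of the short side held at
each of the following:

(a) $\begin{cases} 0 & \text{if } 0 < y < b/2, \\ T_0 & \text{if } b/2 < y < b. \end{cases}$   (b) $\dfrac{T_0}{b}\,y$, $0 \le y \le b$.

(c) $T_0\cos\left(\dfrac{\pi}{b}y\right)$, $0 < y < b$.   (d) $T_0\sin\left(\dfrac{\pi}{b}y\right)$, $0 < y < b$.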


Fig. 19.4 A heat-conducting rectangular plate


19.9 Find a general solution for the electromagnetic wave propagation in 
a resonant cavity, a rectangular box of sides 0 < x <a, 0< y <b, and 
0 < z <d with perfectly conducting walls. Discuss the modes the cavity 
can accommodate. 


19.10 The lateral faces of a cube are grounded, and its top and bottom faces
are held at potentials $f_1(x, y)$ and $f_2(x, y)$, respectively.


(a) Find a general expression for the potential inside the cube.
(b) Find the potential if the top is held at $V_0$ volts and the bottom at $-V_0$
volts.


19.11 Find the potential inside a semi-infinite cylindrical conductor, closed 
at the nearby end, whose cross section is a square with sides of length a. 
All sides are grounded except the square side, which is held at the constant 
potential Vo. 


19.12 Consider a rectangular heat-conducting plate with sides of lengths 
a and b. Three of the sides are held at T = 0, and the fourth side has a 
temperature variation T = f (x) (see Fig. 19.4). The flat faces are insulated, 
so they cannot lose heat to the surroundings. Assume a steady-state heat 
transfer. 


(a) Show that the separation of variables yields

$$X_n(x) = \sin\left(\frac{n\pi}{a}x\right), \qquad Y_n(y) = \sinh\left(\frac{n\pi}{a}y\right) \quad\text{for } n = 1, 2, \dots$$

(b) Show that the most general solution is

$$T(x, y) = \sum_{n=1}^{\infty} B_n X_n(x)Y_n(y) = \sum_{n=1}^{\infty} B_n\sin\left(\frac{n\pi}{a}x\right)\sinh\left(\frac{n\pi}{a}y\right)$$

with

$$B_n = \frac{2}{a\sinh(n\pi b/a)}\int_0^a \sin\left(\frac{n\pi}{a}x\right)f(x)\,dx.$$

(c) Show that if the fourth side is held at the constant temperature $T_0$, then
we obtain

$$T(x, y) = \frac{4T_0}{\pi}\sum_{k=0}^{\infty}\frac{1}{2k+1}\,\frac{\sin[(2k+1)\pi x/a]\,\sinh[(2k+1)\pi y/a]}{\sinh[(2k+1)\pi b/a]}. \tag{19.47}$$

(d) If the temperature variation of the fourth side is of the form $f(x) = T_0\sin(\pi x/a)$, then

$$T(x, y) = T_0\,\frac{\sin(\pi x/a)\,\sinh(\pi y/a)}{\sinh(\pi b/a)}. \tag{19.48}$$


19.13 Find the temperature distribution of a rectangular plate (see Fig. 19.4) 
with sides of lengths a and b if three sides are held at T = 0 and the fourth 
side has a temperature variation given by 


To To 

(a) —x, O<x <a. (b) —z x(x — a), O<x <a. 
a a 
To a 

(c) — =F O<x <a. (4d) T=0, O<x<a. 


19.14 Consider a thin heat-conducting bar of length $b$ along the $x$-axis with
one end at $x = 0$ held at temperature $T_0$ and the other end at $x = b$ held at
temperature $-T_0$. The lateral surface of the bar is thermally insulated. Find
the temperature distribution at all times if initially it is given by


2To 
(a) T(O,x)= a +7o, whereO<x <b. 
2To 9 
(b) T(O,x)= —zr* +70, whereOQ<x <b. 
To 
(c) TO,x)= cans +79, whereO<x <b. 
(d) T(0,x)=T% cos( Fx), where 0 <x <b. 


Hint: The solution corresponding to the zero eigenvalue is essential and can- 
not be excluded. 


19.15 Determine $T(x, y, t)$ for the rectangular plate of Sect. 19.5.2 if initially
the lower left quarter is held at $T_0$ and the rest of the plate is held at
$T = 0$.




19.16 All sides of the plate of Sect. 19.5.2 are held at T = 0. Find the 
temperature distribution for all time if the initial temperature distribution 
is given by 

To if ja<x< za and 


a) T(x, y,0)= 
(a) iy) 0 otherwise. 


‘ip 
(b) T(x, y,0)=—Pxy, whereO<x<a and O<y<b. 
a 


Ti 
(c) T(x, y,0) = 2x, whereO<x<a and O<y<b. 
a 


19.17 Repeat the example of Sect. 19.5.2 with the temperatures of the sides
equal to $T_1$, $T_2$, $T_3$, and $T_4$. Hint: You must include solutions corresponding
to the zero eigenvalue.


19.18 A string of length $a$ is fixed at the left end, and the right end moves
with displacement $A\sin\omega t$. Find $\psi(x, t)$ and a consistent set of initial
conditions for the displacement and the velocity.


19.19 Find the equation for a vibrating rectangular membrane with sides of 
lengths a and b rigidly fastened on all sides. For a = b, show that a given 
mode frequency may have more than one solution. 


19.20 Repeat the example of Sect. 19.6.1 if the can has semi-infinite length, 
the lateral surface is grounded, and 


(a) the base is held at the potential $V(\rho, \varphi)$.
(b) Specialize to the case where the potential of the base is given—in 
Cartesian coordinates—by 


vi vi 1 
@ Vay @ V=—x. Gi) V=—ay. 
a a a 
Hint: Use the integral identity $\int z^{\nu+1}J_\nu(z)\,dz = z^{\nu+1}J_{\nu+1}(z)$.


19.21 Find the steady-state temperature distribution $T(\rho, \varphi, z)$ in a semi-infinite
solid cylinder of radius $a$ if the temperature distribution of the base
is $f(\rho, \varphi)$ and the lateral surface is held at $T = 0$.


19.22 Find the steady-state temperature distribution of a solid cylinder with
a height and radius of $a$, assuming that the base and the lateral surface are
at $T = 0$ and the top is at $T_0$.


19.23 The circumference of a flat circular plate of radius $a$, lying in the $xy$-plane,
is held at $T = 0$. Find the temperature distribution for all time if the
temperature distribution at $t = 0$ is given (in Cartesian coordinates) by


ii if i 
@ “yo Sx. © xy. @ fh. 
a a a 
19.24 Find the temperature of a circular conducting plate of radius $a$ at all
points of its surface for all time $t > 0$, assuming that its edge is held at $T = 0$
and initially its surface from the center to $a/2$ is in contact with a heat bath
of temperature $T_0$.


19.25 Find the potential of a cylindrical conducting can of radius a and 
height h whose top is held at a constant potential Vo while the rest is 
grounded. 


19.26 Consider a wave guide with a rectangular cross section of sides a and 
b in the x and the y directions, respectively. 


(a) Show that the separated DEs have the following solutions: 


with Ye, =n t+ Um. 
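(b) Using the fact that the wave number must be real, show that there is a
cutoff frequency given by

$$\omega_{mn} = c\sqrt{\left(\frac{n\pi}{a}\right)^2 + \left(\frac{m\pi}{b}\right)^2} \quad\text{for } m, n \ge 1.$$

(c) Show that the most general solution for $E_z$ is therefore

$$E_z = \sum_{m,n=1}^{\infty} A_{mn}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)e^{i(\omega t \pm k_{mn}z)}.$$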


19.27 Consider a circular heat-conducting plate of radius $a$ whose temperature
at time $t = 0$ has a distribution function $f(\rho, \varphi)$. Let us find the variation
of $T$ for all points $(\rho, \varphi)$ on the plate for time $t > 0$ when the edge is
kept at $T = 0$.

(a) Show that the two-dimensional heat equation, after separation, yields
the following ODEs:

$$\frac{dg}{dt} = k^2\lambda g, \qquad \frac{d^2S}{d\varphi^2} + \mu S = 0, \qquad \frac{d^2R}{d\rho^2} + \frac{1}{\rho}\frac{dR}{d\rho} - \left(\frac{\mu}{\rho^2} + \lambda\right)R = 0.$$

(b) Show that these DEs lead to the following solutions:

$$g(t) = A e^{\lambda k^2 t}, \qquad S(\varphi) = B\cos m\varphi + C\sin m\varphi, \qquad R(\rho) = D J_m(b\rho),$$

where $\lambda \equiv -b^2$.

(c) Show that the general solution can be written as

$$T(\rho, \varphi, t) = \sum_{m=0}^{\infty}\sum_{n=1}^{\infty} e^{-k^2(x_{mn}/a)^2 t}\,J_m\left(\frac{x_{mn}}{a}\rho\right)(A_{mn}\cos m\varphi + B_{mn}\sin m\varphi).$$

$A_{mn}$ and $B_{mn}$ can be determined as in Sect. 19.6.1.


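19.28 Consider a quantum particle in a cylindrical can. For an atomic particle
of mass $\mu$ confined in a cylindrical can of length $L$ and radius $a$, the
relevant Schrödinger equation is

$$i\hbar\frac{\partial\psi}{\partial t} = -\frac{\hbar^2}{2\mu}\left[\frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial\psi}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2\psi}{\partial\varphi^2} + \frac{\partial^2\psi}{\partial z^2}\right],$$

subject to the BC that $\psi(\rho, \varphi, z, t)$ vanishes at the sides of the can.

(a) Show that the separation of variables yields

$$\frac{dT}{dt} = -i\omega T, \qquad \frac{d^2Z}{dz^2} + \lambda Z = 0, \qquad \frac{d^2S}{d\varphi^2} + m^2 S = 0,$$

$$\frac{d^2R}{d\rho^2} + \frac{1}{\rho}\frac{dR}{d\rho} + \left(\frac{2\mu\omega}{\hbar} - \lambda - \frac{m^2}{\rho^2}\right)R = 0. \tag{19.49}$$

(b) Show that the energy eigenvalues are

$$E_{kmn} \equiv \hbar\omega_{kmn} = \frac{\hbar^2}{2\mu}\left[\left(\frac{k\pi}{L}\right)^2 + \frac{x_{mn}^2}{a^2}\right],$$

where $x_{mn}$ is the $n$th zero of $J_m(x)$, the Bessel function of order $m$,
and $k$ is related to $\lambda$ by $\lambda = (k\pi/L)^2$.
(c) Show that the general solution can be written as

$$\psi = \sum_{\substack{k,n=1 \\ m=0}}^{\infty} e^{-i\omega_{kmn}t}\,J_m\left(\frac{x_{mn}}{a}\rho\right)\sin\left(\frac{k\pi}{L}z\right)(A_{kmn}\cos m\varphi + B_{kmn}\sin m\varphi).$$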


19.29 Find the modes and the corresponding fields of a cylindrical resonant
cavity of length $L$ and radius $a$. Discuss the lowest TM mode.


19.30 Two identical long conducting half-cylindrical shells (cross sections 
are half-circles) of radius a are glued together in such a way that they are 
insulated from one another. One half-cylinder is held at potential Vo and 
the other is grounded. Find the potential at any point inside the resulting 
cylinder. Hint: Separate Laplace’s equation in two dimensions. 
19.31 A linear charge distribution of uniform density $\lambda$ extends along the
$z$-axis from $z = -b$ to $z = b$. Show that the electrostatic potential at any
point $r > b$ is given by

$$\Phi(r, \theta, \varphi) = 2\lambda\sum_{k=0}^{\infty}\frac{(b/r)^{2k+1}}{2k+1}\,P_{2k}(\cos\theta).$$


Hint: Consider a point on the z-axis at a distance r > b from the origin. 
Solve the simple problem by integration and compare the result with the 
infinite series to obtain the unknown coefficients. 


19.32 The upper half of a heat-conducting sphere of radius $a$ has temperature
$T_0$; the lower half is maintained at temperature $-T_0$. The whole
sphere is inside an infinitely large mass of heat-conducting material. Find
the steady-state temperature distribution inside and outside the sphere.


19.33 Find the steady-state temperature distribution inside a sphere of ra- 
dius a when the surface temperature is given by: 


(a) $T_0\cos^2\theta$, (b) $T_0\cos^4\theta$, (c) $T_0|\cos\theta|$,

(d) $T_0(\cos\theta - \cos^3\theta)$, (e) $T_0\sin^2\theta$, (f) $T_0\sin^4\theta$.


19.34 Find the electrostatic potential both inside and outside a conducting 
sphere of radius a when the sphere is maintained at a potential given by 


(a) $V_0(\cos\theta - 3\sin^2\theta)$,
(b) $V_0(5\cos^3\theta - 3\sin^2\theta)$,
(c) $\begin{cases} V_0\cos\theta & \text{for the upper hemisphere}, \\ 0 & \text{for the lower hemisphere}. \end{cases}$


19.35 Find the steady-state temperature distribution inside a solid hemisphere
of radius $a$ if the curved surface is held at $T_0$ and the flat surface
at $T = 0$. Hint: Imagine completing the sphere and maintaining the lower
hemisphere at a temperature such that the overall surface temperature distribution
is an odd function about $\theta = \pi/2$.


19.36 Find the steady-state temperature distribution in a spherical shell of
inner radius $R_1$ and outer radius $R_2$ when the inner surface has a temperature
$T_1$ and the outer surface a temperature $T_2$.


Part VI 
Green’s Functions 


20 Green’s Functions in One Dimension


Our treatment of differential equations, with the exception of SOLDEs with 
constant coefficients, did not consider inhomogeneous equations. At this 
point, however, we can put into use one of the most elegant pieces of ma- 
chinery in higher mathematics, Green’s functions, to solve inhomogeneous 
differential equations. 

This chapter addresses Green’s functions in one dimension, that is, 
Green’s functions of ordinary differential equations. Consider the ODE 
$\mathbf{L}_x[u] = f(x)$ where $\mathbf{L}_x$ is a linear differential operator. In the abstract Dirac
notation this can be formally written as $\mathbf{L}|u\rangle = |f\rangle$. If $\mathbf{L}$ has an inverse
$\mathbf{L}^{-1} \equiv \mathbf{G}$, the solution can be formally written as $|u\rangle = \mathbf{L}^{-1}|f\rangle = \mathbf{G}|f\rangle$.
Multiplying this by $\langle x|$ and inserting $\mathbf{1} = \int dy\,|y\rangle w(y)\langle y|$ between $\mathbf{G}$ and
$|f\rangle$ gives

$$u(x) = \int dy\,G(x, y)\,w(y)\,f(y), \tag{20.1}$$

where the integration is over the range of definition of the functions involved.
Once we know $G(x, y)$, Eq. (20.1) gives the solution $u(x)$ in an
integral form. But how do we find $G(x, y)$?

Sandwiching both sides of $\mathbf{L}\mathbf{G} = \mathbf{1}$ between $\langle x|$ and $|y\rangle$ and using
$\mathbf{1} = \int dx'\,|x'\rangle w(x')\langle x'|$ between $\mathbf{L}$ and $\mathbf{G}$ yields

$$\int dx'\,L(x, x')\,w(x')\,G(x', y) = \langle x|y\rangle = \frac{\delta(x - y)}{w(x)}$$

if we use Eq. (7.19). In particular, if $\mathbf{L}$ is a local differential operator (see
Sect. 17.1), then $L(x, x') = [\delta(x - x')/w(x)]\mathbf{L}_x$, and we obtain

$$\mathbf{L}_x G(x, y) = \frac{\delta(x - y)}{w(x)} \quad\text{or}\quad \mathbf{L}_x G(x, y) = \delta(x - y), \tag{20.2}$$

where the second equation makes the frequently used assumption that
$w(x) = 1$. $G(x, y)$ is called the Green's function (GF) for the differential
operator (DO) $\mathbf{L}_x$.
As discussed in Chaps. 17 and 19, $\mathbf{L}_x$ might not be defined for all functions
on $\mathbb{R}$. Moreover, a complete specification of $\mathbf{L}_x$ requires some initial
(or boundary) conditions. Therefore, we expect $G(x, y)$ to depend on such
initial conditions as well. We note that when $\mathbf{L}_x$ is applied to (20.1), we get

$$\mathbf{L}_x u(x) = \int dy\,[\mathbf{L}_x G(x, y)]\,w(y)\,f(y) = \int dy\,\frac{\delta(x - y)}{w(x)}\,w(y)\,f(y) = f(x),$$

indicating that $u(x)$ is indeed a solution of the original ODE. Equation (20.2),
involving the generalized function $\delta(x - y)$ (or distribution in
the language of Sect. 7.3), is meaningful only in the same context. Thus, we
treat $G(x, y)$ not as an ordinary function but as a distribution. Finally, (20.1)
is assumed to hold for an arbitrary (well-behaved) function $f$.


20.1 Calculation of Some Green’s Functions 


This section presents some examples of calculating G(x, y) for very sim- 
ple DOs. Later we will see how to obtain Green’s functions for a general 
second-order linear differential operator. Although the complete specifica- 
tion of GFs requires boundary conditions, we shall introduce unspecified 
constants in some of the examples below, and calculate some indefinite GFs. 


Example 20.1.1 Let us find the GF for the simplest DO, $\mathbf{L}_x = d/dx$. We
need to find a distribution such that its derivative is the Dirac delta function:¹
$G'(x, y) = \delta(x - y)$. In Sect. 7.3, we encountered such a distribution, namely the
step function $\theta(x - y)$. Thus,

$$G(x, y) = \theta(x - y) + \alpha(y),$$

where $\alpha(y)$ is the "constant" of integration.


The example above did not include a boundary (or initial) condition. Let 
us see how boundary conditions affect the resulting GF. 


Example 20.1.2 Let us solve $u'(x) = f(x)$ where $x \in [0, \infty)$ and $u(0) = 0$.
A general solution of this DE is given by Eq. (20.1) and the preceding
example:

$$u(x) = \int_0^{\infty}\theta(x - y)f(y)\,dy + \int_0^{\infty}\alpha(y)f(y)\,dy.$$

The factor $\theta(x - y)$ in the first term on the RHS chops off the integral at $x$:

$$u(x) = \int_0^{x} f(y)\,dy + \int_0^{\infty}\alpha(y)f(y)\,dy.$$


¹ Here and elsewhere in this chapter, a prime over a GF indicates differentiation with
respect to its first argument.


The BC gives

$$0 = u(0) = 0 + \int_0^{\infty}\alpha(y)f(y)\,dy.$$

The only way that this can be satisfied for arbitrary $f(y)$ is for $\alpha(y)$ to be
zero. Thus, $G(x, y) = \theta(x - y)$, and

$$u(x) = \int_0^{\infty}\theta(x - y)f(y)\,dy = \int_0^{x} f(y)\,dy.$$

This is killing a fly with a sledgehammer! We could have obtained the
result by a simple integration. However, the roundabout way outlined here
illustrates some important features of GFs that will be discussed later. The
BC introduced here is very special. What happens if it is changed to $u(0) = a$?
Problem 20.1 answers that.
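
The statement that $G(x, y) = \theta(x - y)$ turns Eq. (20.1) into an ordinary integral can be checked numerically. The sketch below is a minimal illustration under stated assumptions: unit weight $w = 1$, the test function $f(y) = \cos y$, truncation of the infinite range at a hypothetical `y_max` (harmless as long as `y_max` is at least $x$, since $G$ vanishes for $y > x$), and a simple midpoint rule.

```python
import math


def green(x, y):
    """G(x, y) = theta(x - y): the GF of d/dx with the BC u(0) = 0."""
    return 1.0 if x > y else 0.0


def solve(f, x, y_max=10.0, steps=100_000):
    """u(x) = int_0^inf G(x, y) f(y) dy, truncated at y_max >= x (midpoint rule, w = 1)."""
    h = y_max / steps
    return sum(green(x, (i + 0.5) * h) * f((i + 0.5) * h) for i in range(steps)) * h


if __name__ == "__main__":
    print(solve(math.cos, 1.0), math.sin(1.0))  # u' = cos x with u(0) = 0 gives u = sin x
```

The step function simply discards the part of the integrand with $y > x$, so the result coincides with $\int_0^x f(y)\,dy$ up to discretization error.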


Example 20.1.3 A more complicated DO is L, = d*/dx?. Let us find its 
indefinite GF. To do so, we integrate G’ (x, y) = 5(x — y) once with respect 
to x to obtain 


d 
oo G(x, y) =O0(x —y) + a(y). 
Xx 


A second integration yields 


Gw,y)= f Hie bay aaO): 


where a and 7 are arbitrary functions and the integral is an indefinite integral 
to be evaluated next. 
Let §2(x, y) be the primitive of 6(x — y); that is, 


— =0(x-y)= (20.3) 


dQ 1 ifx>y, 
dx 


O ifx<y. 
The solution to this equation is 


x+a(y) ifx>y, 


2 — 
ay fae leas 


Note that we have not defined £2 (x, y) at x = y. It will become clear below 
that 2(x, y) is continuous at x = y. It is convenient to write 2(x, y) as 


Q(x, y)=[x+a(y) JO — y) +b) 0(y — x). (20.4) 
To specify $a(y)$ and $b(y)$ further, we differentiate (20.4) and compare it with (20.3):

$$\frac{d\Omega}{dx} = \theta(x - y) + [x + a(y)]\delta(x - y) - b(y)\delta(x - y) = \theta(x - y) + [x - b(y) + a(y)]\delta(x - y), \tag{20.5}$$

where we have used

$$\frac{d}{dx}\theta(x - y) = -\frac{d}{dx}\theta(y - x) = \delta(x - y).$$


For Eq. (20.5) to agree with (20.3), we must have $[x - b(y) + a(y)]\delta(x - y) = 0$, which, upon integration over $x$, yields $a(y) - b(y) = -y$. Substituting this in the expression for $\Omega(x, y)$ gives

$$\Omega(x, y) = (x - y)\theta(x - y) + b(y)[\theta(x - y) + \theta(y - x)].$$

But $\theta(x) + \theta(-x) = 1$; therefore, $\Omega(x, y) = (x - y)\theta(x - y) + b(y)$. It follows, among other things, that $\Omega(x, y)$ is continuous at $x = y$. We can now write

$$G(x, y) = (x - y)\theta(x - y) + x\alpha(y) + \beta(y),$$

where $\beta(y) = \eta(y) + b(y)$.


The GF in the example above has two arbitrary functions, $\alpha(y)$ and $\beta(y)$, which are the result of underspecification of $L_x$: A full specification of $L_x$ requires BCs, as the following example shows.


Example 20.1.4 Let us calculate the GF of $L_x[u] = u''(x) = f(x)$ subject to the BC $u(a) = u(b) = 0$ where $[a, b]$ is the interval on which $L_x$ is defined. Example 20.1.3 gives us the (indefinite) GF for $L_x$. Using that, we can write

$$u(x) = \int_a^b (x - y)\theta(x - y) f(y)\,dy + x\int_a^b \alpha(y) f(y)\,dy + \int_a^b \beta(y) f(y)\,dy$$
$$\phantom{u(x)} = \int_a^x (x - y) f(y)\,dy + x\int_a^b \alpha(y) f(y)\,dy + \int_a^b \beta(y) f(y)\,dy.$$


Applying the BCs yields

$$0 = u(a) = a\int_a^b \alpha(y) f(y)\,dy + \int_a^b \beta(y) f(y)\,dy,$$
$$0 = u(b) = \int_a^b (b - y) f(y)\,dy + b\int_a^b \alpha(y) f(y)\,dy + \int_a^b \beta(y) f(y)\,dy. \tag{20.6}$$

From these two relations it is possible to determine $\alpha(y)$ and $\beta(y)$: Substitute for the last integral on the RHS of the second equation of (20.6) from the first equation and get

$$0 = \int_a^b [b - y + b\alpha(y) - a\alpha(y)] f(y)\,dy.$$




Since this must hold for arbitrary $f(y)$, we conclude that

$$b - y + (b - a)\alpha(y) = 0 \quad\Rightarrow\quad \alpha(y) = -\frac{b - y}{b - a}.$$

Substituting for $\alpha(y)$ in the first equation of (20.6) and noting that the result holds for arbitrary $f$, we obtain $\beta(y) = a(b - y)/(b - a)$. Insertion of $\alpha(y)$ and $\beta(y)$ in the expression for $G(x, y)$ obtained in Example 20.1.3 gives

$$G(x, y) = (x - y)\theta(x - y) + (x - a)\frac{y - b}{b - a} \quad \text{where } a \le x \text{ and } y \le b.$$


It is striking that $G(a, y) = (a - y)\theta(a - y) = 0$ (because $a - y \le 0$), and

$$G(b, y) = (b - y)\theta(b - y) + (b - a)\frac{y - b}{b - a} = 0$$

because $\theta(b - y) = 1$ for all $y \le b$ [recall that $x$ and $y$ lie in the interval $[a, b]$]. These two equations reveal the important fact that as a function of $x$, $G(x, y)$ satisfies the same (homogeneous) BC as the solution of the DE. This is a general property that will be discussed later.
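The result of Example 20.1.4 can be checked numerically. The following is only a minimal sketch, not part of the original text: the function names `green_dirichlet` and `solve_with_green`, the midpoint rule, and the test source $f = 1$ on $[0, 1]$ are all assumptions chosen for illustration. For that source the exact solution of $u'' = 1$, $u(0) = u(1) = 0$ is $x(x - 1)/2$.

```python
def green_dirichlet(x, y, a, b):
    """G(x, y) = (x - y) theta(x - y) + (x - a)(y - b)/(b - a) from Example 20.1.4."""
    step = 1.0 if x > y else 0.0  # theta(x - y); its value at x == y does not matter here
    return (x - y) * step + (x - a) * (y - b) / (b - a)


def solve_with_green(f, x, a, b, n=2000):
    """Approximate u(x) = int_a^b G(x, y) f(y) dy with the midpoint rule (n cells)."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        y = a + (k + 0.5) * h  # midpoint of the k-th cell
        total += green_dirichlet(x, y, a, b) * f(y)
    return total * h


if __name__ == "__main__":
    # Compare the Green's function integral with the exact solution x(x - 1)/2.
    for x in (0.0, 0.25, 0.5, 1.0):
        print(x, solve_with_green(lambda y: 1.0, x, 0.0, 1.0), x * (x - 1) / 2)
```

The boundary values come out as zero without being imposed separately, which reflects the fact that $G(x, y)$ itself satisfies the homogeneous BCs in $x$.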


In all the preceding examples, the BCs were very simple. Specifically, the 
value of the solution and/or its derivative at the boundary points was zero. 
What if the BCs are not so simple? In particular, how can we handle a case 
where u(a) [or u’(a)] and u(b) [or u’(b)] are nonzero? 

Consider a general (second-order) differential operator $L_x$ and the differential equation $L_x[u] = f(x)$ subject to the BCs $u(a) = a_1$ and $u(b) = b_1$. We claim that we can reduce this system to the case where $u(a) = u(b) = 0$. Recall from Chap. 14 that the most general solution to such a DE is of the form $u = u_h + u_i$, where $u_h$, the solution to the homogeneous equation, satisfies $L_x[u_h] = 0$ and contains the arbitrary parameters inherent in solutions of differential equations. For instance, if the linearly independent solutions are $v$ and $w$, then $u_h(x) = C_1 v(x) + C_2 w(x)$ and $u_i$ is any solution of the inhomogeneous DE.

If we demand that $u_h(a) = a_1$ and $u_h(b) = b_1$, then $u_i$ satisfies the system

$$L_x[u_i] = f(x), \qquad u_i(a) = u_i(b) = 0,$$

which is of the type discussed in the preceding examples. Since $L_x$ is a SOLDO, we can put all the machinery of Chap. 14 to work to obtain $v(x)$, $w(x)$, and therefore $u_h(x)$. The problem then reduces to a DE for which the BCs are homogeneous; that is, the value of the solution and/or its derivative is zero at the boundary points.


Example 20.1.5 Let us assume that $L_x = d^2/dx^2$. Calculation of $u_h$ is trivial:

$$L_x[u_h] = 0 \quad\Rightarrow\quad \frac{d^2 u_h}{dx^2} = 0 \quad\Rightarrow\quad u_h(x) = C_1 x + C_2.$$

To evaluate $C_1$ and $C_2$, we impose the BCs $u_h(a) = a_1$ and $u_h(b) = b_1$:

$$C_1 a + C_2 = a_1,$$
$$C_1 b + C_2 = b_1.$$

This gives $C_1 = (b_1 - a_1)/(b - a)$ and $C_2 = (a_1 b - a b_1)/(b - a)$.
The inhomogeneous equation defines a problem identical to that of Example 20.1.4. Thus, we can immediately write $u_i(x) = \int_a^b G(x, y) f(y)\,dy$, where $G(x, y)$ is as given in that example. Thus, the general solution is

$$u(x) = \frac{b_1 - a_1}{b - a}\,x + \frac{a_1 b - a b_1}{b - a} + \int_a^x (x - y) f(y)\,dy + \frac{x - a}{b - a}\int_a^b (y - b) f(y)\,dy.$$

Example 20.1.5 shows that an inhomogeneous DE with inhomogeneous 
BCs can be separated into two DEs, one homogeneous with inhomogeneous 
BCs and the other inhomogeneous with homogeneous BCs, the latter being 
appropriate for the GF. Furthermore, all the preceding examples indicate that 
solutions of DEs can be succinctly written in terms of GFs that automatically 
incorporate the BCs as long as the BCs are homogeneous. Can a GF also 
give the solution to a homogeneous DE with inhomogeneous BCs? 


20.2 Formal Considerations 


The discussion and examples of the preceding section hint at the power of 
Green’s functions. The elegance of such a function becomes apparent from 
the realization that it contains all the information about the solutions of a 
DE for any type of BCs, as we are about to show. Since GFs are inverses 
of DOs, let us briefly reexamine the inverse of an operator, which is closely 
tied to its spectrum. 

The question as to whether or not an operator $\mathbf{A}$ in a finite-dimensional vector space is invertible is succinctly answered by the value of its determinant: $\mathbf{A}$ is invertible if and only if $\det \mathbf{A} \neq 0$. In fact, as we saw at the beginning of Chap. 17, one translates the abstract operator equation $\mathbf{A}|u\rangle = |v\rangle$ into a matrix equation $\mathsf{A}u = v$ and reduces the question to that of the inverse of a matrix. This matrix takes on an especially simple form when $\mathsf{A}$ is diagonal, that is, when $A_{ij} = \lambda_i \delta_{ij}$. For this special situation we have

$$\lambda_i u_i = v_i \quad \text{for } i = 1, 2, \ldots, N \quad \text{(no sum over } i\text{)}. \tag{20.7}$$

This equation has a unique solution (for arbitrary $v_i$) if and only if $\lambda_i \neq 0$ for all $i$. In that case $u_i = v_i/\lambda_i$ for $i = 1, 2, \ldots, N$. In particular, if $v_i = 0$ for all $i$, that is, when Eq. (20.7) is homogeneous, the unique solution is the trivial solution. On the other hand, when some of the $\lambda_i$ are zero, there may be no solution to (20.7), but the homogeneous equation has a nontrivial solution ($u_i$ need not be zero). Proposition 6.2.6 applies to vector spaces of finite as well as infinite dimensions. Therefore, we restate it here:




Theorem 20.2.1 An operator $\mathbf{A}$ on a Hilbert space has an inverse if and only if $\lambda = 0$ is not an eigenvalue of $\mathbf{A}$. Equivalently, $\mathbf{A}$ is invertible if and only if the homogeneous equation $\mathbf{A}|u\rangle = 0$ has no nontrivial solutions.
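For the diagonal case of Eq. (20.7), the criterion can be illustrated with a very small sketch. The function name `solve_diagonal` and the convention of returning `None` for a non-invertible operator are assumptions made here for illustration only.

```python
def solve_diagonal(lams, v):
    """Solve lam_i * u_i = v_i for a diagonal operator with eigenvalues lams.

    Returns the unique solution as a list, or None when some eigenvalue is
    zero, in which case the operator has no inverse.
    """
    if any(lam == 0 for lam in lams):
        return None  # lambda = 0 is an eigenvalue: not invertible
    return [vi / lam for lam, vi in zip(lams, v)]


if __name__ == "__main__":
    print(solve_diagonal([2.0, 4.0], [1.0, 2.0]))  # invertible case
    print(solve_diagonal([2.0, 0.0], [1.0, 2.0]))  # zero eigenvalue, no inverse
```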


Green’s functions are inverses of differential operators. Therefore, it is 
important to have a clear understanding of the DOs. An nth-order linear 
differential operator (NOLDO) satisfies the following theorem (for a proof, 
see [Birk 78, Chap. 6]). 


Theorem 20.2.2 Let

$$L_x = p_n(x)\frac{d^n}{dx^n} + p_{n-1}(x)\frac{d^{n-1}}{dx^{n-1}} + \cdots + p_1(x)\frac{d}{dx} + p_0(x) \tag{20.8}$$

where $p_n(x) \neq 0$ in $[a, b]$. Let $x_0 \in [a, b]$ and let $\{\gamma_k\}_{k=1}^n$ be given numbers and $f(x)$ a given piecewise continuous function on $[a, b]$. Then the initial value problem (IVP)

$$L_x[u] = f \quad \text{for } x \in [a, b],$$
$$u(x_0) = \gamma_1, \quad u'(x_0) = \gamma_2, \quad \ldots, \quad u^{(n-1)}(x_0) = \gamma_n \tag{20.9}$$

has one and only one solution.


This is simply the existence and uniqueness theorem for a NOLDE. Equation (20.9) is referred to as the IVP with data $\{f(x); \gamma_1, \ldots, \gamma_n\}$. This theorem is used to define $L_x$. Part of that definition are the BCs that the solutions to $L_x$ must satisfy.

A particularly important BC is the homogeneous one in which $\gamma_1 = \gamma_2 = \cdots = \gamma_n = 0$. In such a case it can be shown (see Problem 20.3) that the only solution of the homogeneous DE $L_x[u] = 0$ is the trivial one, $u = 0$. Theorem 20.2.1 then tells us that $L_x$ is invertible; that is, there is a unique operator $\mathbf{G}$ such that $\mathbf{L}\mathbf{G} = \mathbf{1}$. The “components” version of this last relation is part of the content of the next theorem.


Theorem 20.2.3 The DO $L_x$ of Eq. (20.8) associated with the IVP with data $\{f(x); 0, 0, \ldots, 0\}$ is invertible; that is, there exists a function $G(x, y)$ such that

$$L_x G(x, y) = \frac{\delta(x - y)}{w(x)}.$$


The importance of homogeneous BCs can now be appreciated. Theorem 20.2.3 is the reason why we had to impose homogeneous BCs to obtain the GF in all the examples of the previous section.

The BCs in (20.9) clearly are not the only ones that can be used. The most general linear BCs encountered in differential operator theory are

$$R_i[u] \equiv \sum_{j=1}^n \left(\alpha_{ij} u^{(j-1)}(a) + \beta_{ij} u^{(j-1)}(b)\right) = \gamma_i, \quad i = 1, \ldots, n. \tag{20.10}$$

The $n$ row vectors $\{(\alpha_{i1}, \ldots, \alpha_{in}, \beta_{i1}, \ldots, \beta_{in})\}_{i=1}^n$ are assumed to be independent (in particular, no row is identical to zero). We refer to $R_i$ as boundary functionals because for each (sufficiently smooth) function $u$, they give a number $\gamma_i$. The DO of (20.8) and the BCs of (20.10) together form a boundary value problem (BVP). The DE $L_x[u] = f$ subject to the BCs of (20.10) is a BVP with data $\{f(x); \gamma_1, \ldots, \gamma_n\}$.

We note that the $R_i$ are linear; that is,

$$R_i[u_1 + u_2] = R_i[u_1] + R_i[u_2] \quad \text{and} \quad R_i[\alpha u] = \alpha R_i[u].$$

Since $L_x$ is also linear, we conclude that the superposition principle applies to the system consisting of $L_x[u] = f$ and the BCs of (20.10), which is sometimes denoted by $(L; R_1, \ldots, R_n)$. If $u$ satisfies the BVP with data $\{f; \gamma_1, \ldots, \gamma_n\}$ and $v$ satisfies the BVP with data $\{g; \mu_1, \ldots, \mu_n\}$, then $\alpha u + \beta v$ satisfies the BVP with data $\{\alpha f + \beta g; \alpha\gamma_1 + \beta\mu_1, \ldots, \alpha\gamma_n + \beta\mu_n\}$. It follows that if $u$ and $v$ both satisfy the BVP with data $\{f; \gamma_1, \ldots, \gamma_n\}$, then $u - v$ satisfies the BVP with data $\{0; 0, 0, \ldots, 0\}$, which is called the completely homogeneous problem.

Unlike the IVP, the BVP with data $\{0; 0, 0, \ldots, 0\}$ may have a nontrivial solution. If the completely homogeneous problem has no nontrivial solution, then the BVP with data $\{f; \gamma_1, \ldots, \gamma_n\}$ has at most one solution (a solution exists for any set of data). On the other hand, if the completely homogeneous problem has nontrivial solutions, then the BVP with data $\{f; \gamma_1, \ldots, \gamma_n\}$ either has no solutions or has more than one solution (see [Stak 79, pp. 203-204]).

Recall that when a differential (unbounded) operator $L_x$ acts in a Hilbert space, such as $\mathcal{L}^2_w(a, b)$, it acts only on its domain. In the context of the present discussion, this means that not all functions in $\mathcal{L}^2_w(a, b)$ satisfy the BCs necessary for defining $L_x$. Thus, the functions for which the operator is defined (those that satisfy the BCs) form a subset of $\mathcal{L}^2_w(a, b)$, which we called the domain of $L_x$ and denoted by $D(L_x)$. From a formal standpoint it is important to distinguish among maps that have different domains. For instance, the Hilbert-Schmidt integral operators, which are defined on a finite interval, are compact, while those defined on the entire real line are not.


Definition 20.2.4 Let $L_x$ be the DO of Eq. (20.8). Suppose there exists a DO $L_x^\dagger$ with the property that

$$w\{v^*(L_x[u]) - u(L_x^\dagger[v])^*\} = \frac{d}{dx} Q[u, v^*] \quad \text{for } u, v \in D(L_x) \cap D(L_x^\dagger),$$

where $Q[u, v^*]$, called the conjunct of the functions $u$ and $v$, depends on $u$, $v$, and their derivatives of order up to $n - 1$. The DO $L_x^\dagger$ is then called the formal adjoint of $L_x$. If $L_x^\dagger = L_x$ (without regard to the BCs imposed on their solutions), then $L_x$ is said to be formally self-adjoint. If $D(L_x^\dagger) \supset D(L_x)$ and $L_x^\dagger = L_x$ on $D(L_x)$, then $L_x$ is said to be hermitian. If $D(L_x^\dagger) = D(L_x)$ and $L_x^\dagger = L_x$, then $L_x$ is said to be self-adjoint.


The relation given in the definition above involving the conjunct is a generalization of the Lagrange identity and can also be written in integral form:

$$\int_a^b dx\, w\{v^*(L_x[u])\} - \int_a^b dx\, w\{u(L_x^\dagger[v])^*\} = Q[u, v^*]\Big|_a^b \tag{20.11}$$

This form is sometimes called the generalized Green’s identity.


Historical Notes 

George Green (1793?-1841) was not appreciated in his lifetime. His date of birth is 
unknown (however, it is known that he was baptized on 14 July 1793), and no portrait 
of him survives. He left school, after only one year’s attendance, to work in his father’s 
bakery. When the father opened a windmill in Nottingham, the boy used an upper room as 
a study in which he taught himself physics and mathematics from library books. In 1828, 
when he was thirty-five years old, he published his most important work, An Essay on the 
Application of Mathematical Analysis to the Theory of Electricity and Magnetism at his 
own expense. In it Green apologized for any shortcomings in the paper due to his minimal 
formal education or the limited resources available to him, the latter being apparent in the 
few previous works he cited. The introduction explained the importance Green placed on 
the “potential” function. The body of the paper generalizes this idea to electricity and 
magnetism. 

In addition to the physics of electricity and magnetism, Green’s first paper also contained 
the monumental mathematical contributions for which he is now famous: The relation- 
ship between surface and volume integrals we now call Green’s theorem, and the Green’s 
function, a ubiquitous solution to partial differential equations in almost every area of 
physics. With little appreciation for the future impact of this work, one of Green’s con- 
temporaries declared the publication “a complete failure”. The “Essay”, which received 
little notice because of poor circulation, was saved by Lord Kelvin, who tracked it down 
in a German journal. 

When his father died in 1829, some of George’s friends urged him to seek a college edu- 
cation. After four years of self-study, during which he closed the gaps in his elementary 
education, Green was admitted to Caius College of Cambridge University at the age of 
40, from which he graduated four years later after a disappointing performance on his final
examinations. Later, however, he was appointed Perse Fellow of Caius College. Two
years after his appointment he died, and his famous 1828 paper was republished, this 
time reaching a much wider audience. This paper has been described as “the beginning 
of mathematical physics in England”. 

He published only ten mathematical works. In 1833 he wrote three further papers. Two on 
electricity were published by the Cambridge Philosophical Society. One on hydrodynam- 
ics was published by the Royal Society of Edinburgh (of which he was a Fellow) in 1836. 
He also had two papers on hydrodynamics (in particular wave motion in canals), two pa- 
pers on reflection and refraction of light, and two papers on reflection and refraction of 
sound published in Cambridge. 

In 1923 the Green windmill was partially restored by a local businessman as a gesture of 
tribute to Green. Einstein came to pay homage. Then a fire in 1947 destroyed the reno- 
vations. Thirty years later the idea of a memorial was once again mooted, and sufficient 
money was raised to purchase the mill and present it to the sympathetic Nottingham City 
Council. In 1980 the George Green Memorial Appeal was launched to secure £20,000 to 
get the sails turning again and the machinery working once more. Today, Green’s restored 
mill stands as a mathematics museum in Nottingham.


20 Green's Functions in One Dimension
20.2.1 Second-Order Linear DOs 


Since second-order linear differential operators (SOLDOs) are sufficiently 
general for most physical applications, we will concentrate on them. Be- 
cause homogeneous BCs are important in constructing Green’s functions, 
let us first consider BCs of the form 


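$$R_1[u] \equiv \alpha_{11} u(a) + \alpha_{12} u'(a) + \beta_{11} u(b) + \beta_{12} u'(b) = 0,$$
$$R_2[u] \equiv \alpha_{21} u(a) + \alpha_{22} u'(a) + \beta_{21} u(b) + \beta_{22} u'(b) = 0, \tag{20.12}$$

where it is assumed, as usual, that $(\alpha_{11}, \alpha_{12}, \beta_{11}, \beta_{12})$ and $(\alpha_{21}, \alpha_{22}, \beta_{21}, \beta_{22})$ are linearly independent.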

If we define the inner product as an integral with weight $w$, Eq. (20.11) can be formally written as

$$\langle v|\mathbf{L}|u\rangle = \langle u|\mathbf{L}^\dagger|v\rangle^* + Q[u, v^*]\Big|_a^b.$$

This would coincide with the usual definition of the adjoint if the surface term vanishes, that is, if

$$Q[u, v^*]\Big|_{x=b} = Q[u, v^*]\Big|_{x=a}. \tag{20.13}$$


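For this to happen, we need to impose BCs on $v$. To find these BCs, let us rewrite Eq. (20.12) in a more compact form. Linear independence of the two row vectors of coefficients implies that the $2 \times 4$ matrix of coefficients has rank two. This means that the $2 \times 4$ matrix has an invertible $2 \times 2$ submatrix. By rearranging the terms in Eq. (20.12) if necessary, we can assume that the second of the two $2 \times 2$ submatrices is invertible. The homogeneous BCs can then be conveniently written as

$$\mathsf{R}[u] \equiv \begin{pmatrix} R_1[u] \\ R_2[u] \end{pmatrix} = (\mathsf{A} \;\; \mathsf{B}) \begin{pmatrix} \mathsf{u}_a \\ \mathsf{u}_b \end{pmatrix} = \mathsf{A}\mathsf{u}_a + \mathsf{B}\mathsf{u}_b = 0, \tag{20.14}$$

where

$$\mathsf{A} \equiv \begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix}, \quad \mathsf{B} \equiv \begin{pmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{pmatrix}, \quad \mathsf{u}_a \equiv \begin{pmatrix} u(a) \\ u'(a) \end{pmatrix}, \quad \mathsf{u}_b \equiv \begin{pmatrix} u(b) \\ u'(b) \end{pmatrix},$$

and $\mathsf{B}$ is invertible.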
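The most general form of the conjunct for a SOLDO is

$$Q[u, v^*](x) \equiv q_{11}(x) u(x) v^*(x) + q_{12}(x) u(x) v^{*\prime}(x) + q_{21}(x) u'(x) v^*(x) + q_{22}(x) u'(x) v^{*\prime}(x),$$

which can be written in matrix form as

$$Q[u, v^*](x) = \mathsf{u}_x^t \mathsf{Q}_x \mathsf{v}_x^* \quad \text{where} \quad \mathsf{Q}_x \equiv \begin{pmatrix} q_{11}(x) & q_{12}(x) \\ q_{21}(x) & q_{22}(x) \end{pmatrix}, \tag{20.15}$$

and $\mathsf{u}_x$ and $\mathsf{v}_x^*$ have similar definitions as $\mathsf{u}_a$ and $\mathsf{u}_b$ above. The vanishing of the surface term becomes

$$\mathsf{u}_b^t \mathsf{Q}_b \mathsf{v}_b^* = \mathsf{u}_a^t \mathsf{Q}_a \mathsf{v}_a^*. \tag{20.16}$$

We need to translate this equation into a condition on $v^*$ alone.² This is accomplished by solving for two of the four quantities $u(a)$, $u'(a)$, $u(b)$, and $u'(b)$ in terms of the other two, substituting the result in Eq. (20.16), and setting the coefficients of the other two equal to zero. Let us assume, as before, that the submatrix $\mathsf{B}$ is invertible, i.e., $u(b)$ and $u'(b)$ are expressible in terms of $u(a)$ and $u'(a)$. Then $\mathsf{u}_b = -\mathsf{B}^{-1}\mathsf{A}\mathsf{u}_a$, or $\mathsf{u}_b^t = -\mathsf{u}_a^t \mathsf{A}^t (\mathsf{B}^t)^{-1}$, and we obtain

$$-\mathsf{u}_a^t \mathsf{A}^t (\mathsf{B}^t)^{-1} \mathsf{Q}_b \mathsf{v}_b^* = \mathsf{u}_a^t \mathsf{Q}_a \mathsf{v}_a^* \quad\Rightarrow\quad \mathsf{u}_a^t \left[\mathsf{A}^t (\mathsf{B}^t)^{-1} \mathsf{Q}_b \mathsf{v}_b^* + \mathsf{Q}_a \mathsf{v}_a^*\right] = 0,$$

and the condition on $v^*$ becomes

$$\mathsf{A}^t (\mathsf{B}^t)^{-1} \mathsf{Q}_b \mathsf{v}_b^* + \mathsf{Q}_a \mathsf{v}_a^* = 0. \tag{20.17}$$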


We see that all factors of $u$ have disappeared, as they should. The expanded version of the BCs on $v^*$ is written as

$$B_1[v^*] \equiv \sigma_{11} v^*(a) + \sigma_{12} v^{*\prime}(a) + \eta_{11} v^*(b) + \eta_{12} v^{*\prime}(b) = 0,$$
$$B_2[v^*] \equiv \sigma_{21} v^*(a) + \sigma_{22} v^{*\prime}(a) + \eta_{21} v^*(b) + \eta_{22} v^{*\prime}(b) = 0. \tag{20.18}$$

These homogeneous BCs are said to be adjoint to those of (20.12). Because of the difference between BCs and their adjoints, the domain of a differential operator need not be the same as that of its adjoint.


Example 20.2.5 Let $L_x = d^2/dx^2$ with the homogeneous BCs

$$R_1[u] = \alpha u(a) - u'(a) = 0 \quad \text{and} \quad R_2[u] = \beta u(b) - u'(b) = 0. \tag{20.19}$$

We want to calculate $Q[u, v^*]$ and the adjoint BCs for $v$. By repeated integration by parts [or by using Eq. (14.22)], we obtain $Q[u, v^*] = u'v^* - uv^{*\prime}$. For the surface term to vanish, we must have

$$u'(a)v^*(a) - u(a)v^{*\prime}(a) = u'(b)v^*(b) - u(b)v^{*\prime}(b).$$

Substituting from (20.19) in this equation, we get

$$u(a)[\alpha v^*(a) - v^{*\prime}(a)] = u(b)[\beta v^*(b) - v^{*\prime}(b)],$$

which holds for arbitrary $u$ if and only if

$$B_1[v^*] = \alpha v^*(a) - v^{*\prime}(a) = 0 \quad \text{and} \quad B_2[v^*] = \beta v^*(b) - v^{*\prime}(b) = 0. \tag{20.20}$$

This is a special case, in which the adjoint boundary conditions are the same as the original BCs (substitute $u$ for $v^*$ to see this).


²The boundary conditions on $v^*$ should not depend on the choice of $u$.


To see that the original BCs and their adjoints need not be the same, we consider

$$R_1[u] = u'(a) - \alpha u(b) = 0 \quad \text{and} \quad R_2[u] = \beta u(a) - u'(b) = 0, \tag{20.21}$$

from which we obtain $u(a)[\beta v^*(b) + v^{*\prime}(a)] = u(b)[\alpha v^*(a) + v^{*\prime}(b)]$. Thus,

$$B_1[v^*] = \alpha v^*(a) + v^{*\prime}(b) = 0 \quad \text{and} \quad B_2[v^*] = \beta v^*(b) + v^{*\prime}(a) = 0, \tag{20.22}$$

which is not the same as (20.21). Boundary conditions such as those in (20.19) and (20.20), in which each equation contains the function and its derivative evaluated at the same point, are called unmixed BCs. On the other hand, (20.21) and (20.22) are mixed BCs.
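The adjoint conditions for the mixed case can be checked numerically, because the surface term depends only on the boundary values of $u$, $v$, and their derivatives. The sketch below is an illustration under stated assumptions, not part of the original text: it takes real functions (so $v^* = v$), and the names `conjunct` and `surface_term` are hypothetical.

```python
def conjunct(u, du, v, dv):
    """Q[u, v] = u' v - u v' for L = d^2/dx^2, evaluated from point values."""
    return du * v - u * dv


def surface_term(alpha, beta, ua, ub, va, vb):
    """Return Q at b minus Q at a when u obeys (20.21) and v obeys (20.22).

    ua, ub, va, vb are freely chosen boundary values; the derivatives at the
    boundary are then fixed by the boundary conditions.
    """
    dua, dub = alpha * ub, beta * ua      # (20.21): u'(a) = alpha u(b), u'(b) = beta u(a)
    dvb, dva = -alpha * va, -beta * vb    # (20.22): v'(b) = -alpha v(a), v'(a) = -beta v(b)
    return conjunct(ub, dub, vb, dvb) - conjunct(ua, dua, va, dva)


if __name__ == "__main__":
    # The surface term should vanish for any choice of the free boundary values.
    print(surface_term(2.0, -3.0, 1.5, -0.5, 0.25, 4.0))
```

If $v$ were instead made to satisfy the original conditions (20.21), the difference would in general not vanish, which is the point of the example.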


20.2.2 Self-adjoint SOLDOs 


In Chap. 14, we showed that a SOLDO satisfies the generalized Green’s identity with $w(x) = 1$. In fact, since $u$ and $v$ are real, Eq. (14.23) is identical to (20.11) if we set $w = 1$ and

$$Q[u, v] = p_2 v u' - (p_2 v)' u + p_1 u v. \tag{20.23}$$


Also, we have seen that any SOLDO can be made (formally) self-adjoint. Thus, let us consider the formally self-adjoint SOLDO

$$L_x = \frac{d}{dx}\left(p\frac{d}{dx}\right) + q,$$

where both $p(x)$ and $q(x)$ are real functions and the inner product is defined with weight $w = 1$. If we are interested in formally self-adjoint operators with respect to a general weight $w > 0$, we can construct them as follows. We first note that if $L_x$ is formally self-adjoint with respect to a weight of unity, then $(1/w)L_x$ is self-adjoint with respect to weight $w$. Next, we note that $L_x$ is formally self-adjoint for all functions $q$, in particular, for $wq$. Now we define

$$L_x^{(q)} \equiv \frac{d}{dx}\left(p\frac{d}{dx}\right) + qw$$

and note that $L_x^{(q)}$ is formally self-adjoint with respect to a weight of unity, and therefore

$$L_x = \frac{1}{w}L_x^{(q)} = \frac{1}{w}\frac{d}{dx}\left(p\frac{d}{dx}\right) + q \tag{20.24}$$

is formally self-adjoint with respect to weight $w(x) > 0$.
For SOLDOs that are formally self-adjoint with respect to weight $w$, the conjunct given in (20.23) becomes

$$Q[u, v] = p(x)w(x)(vu' - uv'). \tag{20.25}$$

Thus, the surface term in the generalized Green’s identity vanishes if and only if

$$p(b)w(b)[v(b)u'(b) - u(b)v'(b)] = p(a)w(a)[v(a)u'(a) - u(a)v'(a)]. \tag{20.26}$$

The DO becomes self-adjoint if $u$ and $v$ satisfy Eq. (20.26) as well as the same BCs. It can easily be shown that the following four types of BCs on $u(x)$ assure the validity of Eq. (20.26) and therefore define a self-adjoint operator $L_x$ given by (20.24):


1. The Dirichlet BCs: $u(a) = u(b) = 0$
2. The Neumann BCs: $u'(a) = u'(b) = 0$
3. General unmixed BCs: $\alpha u(a) - u'(a) = \beta u(b) - u'(b) = 0$
4. Periodic BCs: $u(a) = u(b)$ and $u'(a) = u'(b)$
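As a brief worked check of the general unmixed case (a sketch that assumes real $u$ and $v$ obeying the same conditions, so that $u'(a) = \alpha u(a)$, $v'(a) = \alpha v(a)$, and similarly at $b$ with $\beta$), the bracketed factors on the two sides of Eq. (20.26) vanish separately:

```latex
\begin{align*}
v(a)u'(a) - u(a)v'(a) &= v(a)\,[\alpha u(a)] - u(a)\,[\alpha v(a)] = 0,\\
v(b)u'(b) - u(b)v'(b) &= v(b)\,[\beta u(b)] - u(b)\,[\beta v(b)] = 0.
\end{align*}
```

The Dirichlet and Neumann cases follow in the same way, since each term in the brackets then contains a vanishing factor.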


20.3 Green’s Functions for SOLDOs 


We are now in a position to find the Green’s function for a SOLDO. First, note that a complete specification of $L_x$ requires not only knowledge of its coefficient functions $p_0(x)$, $p_1(x)$, and $p_2(x)$, but also knowledge of the BCs imposed on the solutions. The most general BCs for a SOLDO are of the type given in Eq. (20.10) with $n = 2$. Thus, to specify $L_x$ uniquely, we consider the system $(L; R_1, R_2)$ with data $(f; \gamma_1, \gamma_2)$. This system defines a unique BVP:

$$L_x[u] = p_2(x)\frac{d^2u}{dx^2} + p_1(x)\frac{du}{dx} + p_0(x)u = f(x),$$
$$R_i[u] = \gamma_i, \quad i = 1, 2. \tag{20.27}$$


A necessary condition for $L_x$ to be invertible is that the homogeneous DE $L_x[u] = 0$ have only the trivial solution $u = 0$. For $u = 0$ to be the only solution, it must be a solution. This means that it must meet all the conditions in Eq. (20.27). In particular, since $R_i$ are linear functionals of $u$, we must have $R_i[0] = 0$. This can be stated as follows:


Lemma 20.3.1 A necessary condition for a second-order linear DO to be invertible is for its associated BCs to be homogeneous.³


Thus, to study Green’s functions we must restrict ourselves to problems 
with homogeneous BCs. This at first may seem restrictive, since not all prob- 
lems have homogeneous BCs. Can we solve the others by the Green’s func- 
tion method? The answer is yes, as will be shown later in this chapter. 

The above discussion clearly indicates that the Green’s function of $L_x$, being its “inverse”, is defined only if we consider the system $(L; R_1, R_2)$ with data $(f; 0, 0)$. If the Green’s function exists, it must satisfy the DE of Theorem 20.2.3, in which $L_x$ acts on $G(x, y)$. But part of the definition of $L_x$ are the BCs imposed on the solutions. Thus, if the LHS of the DE is to make any sense, $G(x, y)$ must also satisfy those same BCs. We therefore make the following definition:


³The lemma applies to all linear DOs, not just second-order ones.


Definition 20.3.2 The Green’s function of a DO $L_x$ is a function $G(x, y)$ that satisfies both the DE

$$L_x G(x, y) = \frac{\delta(x - y)}{w(x)}$$

and, as a function of $x$, the homogeneous BCs $R_i[G] = 0$ for $i = 1, 2$ where the $R_i$ are defined as in Eq. (20.12).


It is convenient to study the Green’s function for the adjoint of $L_x$ simultaneously. Denoting this by $g(x, y)$, we have

$$L_x^\dagger g(x, y) = \frac{\delta(x - y)}{w(x)}, \qquad B_i[g] = 0, \quad \text{for } i = 1, 2, \tag{20.28}$$

where $B_i$ are the boundary functionals adjoint to $R_i$ and given in Eq. (20.18). The function $g(x, y)$ is known as the adjoint Green’s function associated with the DE of (20.27).
We can now use (20.27) and (20.28) to find the solutions to

$$L_x[u] = f(x), \qquad R_i[u] = 0 \quad \text{for } i = 1, 2,$$
$$L_x^\dagger[v] = h(x), \qquad B_i[v^*] = 0 \quad \text{for } i = 1, 2. \tag{20.29}$$

With $v(x) = g(x, y)$ in Eq. (20.11), whose RHS is assumed to be zero, we get $\int_a^b w g^*(x, y) L_x[u]\,dx = \int_a^b w u(x)(L_x^\dagger[g])^*\,dx$. Using (20.28) on the RHS and (20.29) on the LHS, we obtain

$$u(y) = \int_a^b g^*(x, y) w(x) f(x)\,dx.$$

Similarly, with $u(x) = G(x, y)$, Eq. (20.11) gives

$$v^*(y) = \int_a^b G(x, y) w(x) h^*(x)\,dx,$$

or, since $w(x)$ is a (positive) real function,

$$v(y) = \int_a^b G^*(x, y) w(x) h(x)\,dx.$$

These equations for $u(y)$ and $v(y)$ are not what we expect [see, for instance, Eq. (20.1)]. However, if we take into account certain properties of Green’s functions that we will discuss next, these equations become plausible.


20.3.1 Properties of Green’s Functions 


Let us rewrite the generalized Green’s identity [Eq. (20.11)], with the RHS equal to zero, as

$$\int_a^b dt\, w(t)\{v^*(t)(L_t[u])\} = \int_a^b dt\, w(t)\{u(t)(L_t^\dagger[v])^*\}. \tag{20.30}$$

This is sometimes called Green’s identity. Substituting $G(t, y)$ for $u(t)$ and $g(t, x)$ for $v(t)$ gives

$$\int_a^b dt\, w(t) g^*(t, x)\frac{\delta(t - y)}{w(t)} = \int_a^b dt\, w(t) G(t, y)\frac{\delta(t - x)}{w(t)},$$

or $g^*(y, x) = G(x, y)$. A consequence of this identity is

Proposition 20.3.3 G(x, y) must satisfy the adjoint boundary conditions 
with respect to its second argument. 


If for the time being we assume that the Green’s function associated with a system $(L; R_1, R_2)$ is unique, then, since for a self-adjoint differential operator, $L_x$ and $L_x^\dagger$ are identical and $u$ and $v$ both satisfy the same BCs, we must have $G(x, y) = g(x, y)$ or, using $g^*(y, x) = G(x, y)$, we get $G(x, y) = G^*(y, x)$. In particular, if the coefficient functions of $L_x$ are all real, $G(x, y)$ will be real, and we have $G(x, y) = G(y, x)$. We thus have


Proposition 20.3.4 The Green’s function is a symmetric function of 
its two arguments: G(x, y) = G(y, x). 
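The symmetry can be spot-checked on the Green's function of Example 20.1.4, whose operator $d^2/dx^2$ with Dirichlet BCs is self-adjoint with real coefficients. The following is a minimal sketch with hypothetical names (`green_dirichlet`, `max_asymmetry`) and an arbitrarily chosen sample grid.

```python
def green_dirichlet(x, y, a=0.0, b=1.0):
    """G(x, y) = (x - y) theta(x - y) + (x - a)(y - b)/(b - a) from Example 20.1.4."""
    step = 1.0 if x > y else 0.0  # theta(x - y)
    return (x - y) * step + (x - a) * (y - b) / (b - a)


def max_asymmetry(a=0.0, b=1.0, n=50):
    """Largest |G(x, y) - G(y, x)| over an (n + 1) x (n + 1) grid on [a, b]."""
    pts = [a + (b - a) * k / n for k in range(n + 1)]
    return max(
        abs(green_dirichlet(x, y, a, b) - green_dirichlet(y, x, a, b))
        for x in pts
        for y in pts
    )


if __name__ == "__main__":
    # Expected to be zero up to floating-point rounding.
    print(max_asymmetry())
```

Algebraically, for $x > y$ the expression reduces to $(y - a)(x - b)/(b - a)$, which is the $x < y$ branch with the arguments interchanged.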


The last property is related to the continuity of $G(x, y)$ and its derivative at $x = y$. For a SOLDO, we have

$$L_x G(x, y) = p_2(x)\frac{\partial^2 G}{\partial x^2} + p_1(x)\frac{\partial G}{\partial x} + p_0(x)G = \frac{\delta(x - y)}{w(x)},$$

where $p_0$, $p_1$, and $p_2$ are assumed to be real and continuous in the interval $[a, b]$, and $w(x)$ and $p_2(x)$ are assumed to be positive for all $x \in [a, b]$. We multiply both sides of the DE by

$$h(x) \equiv \frac{\mu(x)}{p_2(x)}, \quad \text{where} \quad \mu(x) \equiv \exp\left[\int_a^x \frac{p_1(t)}{p_2(t)}\,dt\right],$$

noting that $d\mu/dx = (p_1/p_2)\mu$. This transforms the DE into

$$\frac{\partial}{\partial x}\left[\mu(x)\frac{\partial}{\partial x}G(x, y)\right] + \frac{p_0(x)\mu(x)}{p_2(x)}G(x, y) = \frac{\mu(y)}{p_2(y)w(y)}\delta(x - y).$$

Integrating this equation gives

$$\mu(x)\frac{\partial}{\partial x}G(x, y) + \int_a^x \frac{p_0(t)\mu(t)}{p_2(t)}G(t, y)\,dt = \frac{\mu(y)}{p_2(y)w(y)}\theta(x - y) + \alpha(y) \tag{20.31}$$

because the primitive of $\delta(x - y)$ is $\theta(x - y)$. Here $\alpha(y)$ is the “constant” of integration. First consider the case where $p_0 = 0$, for which the Green’s function will be denoted by $G_0(x, y)$. Then Eq. (20.31) becomes


$$\mu(x)\frac{\partial}{\partial x}G_0(x, y) = \frac{\mu(y)}{p_2(y)w(y)}\theta(x - y) + \alpha_1(y),$$

which (since $\mu$, $p_2$, and $w$ are continuous on $[a, b]$, and $\theta(x - y)$ has a discontinuity only at $x = y$) indicates that $\partial G_0/\partial x$ is continuous everywhere on $[a, b]$ except at $x = y$. Now divide the last equation by $\mu$ and integrate the result to get

$$G_0(x, y) = \frac{\mu(y)}{p_2(y)w(y)}\int_a^x \frac{\theta(t - y)}{\mu(t)}\,dt + \alpha_1(y)\int_a^x \frac{dt}{\mu(t)} + \alpha_2(y).$$

Every term on the RHS is continuous except possibly the integral involving the $\theta$-function. However, that integral can be written as

$$\int_a^x \frac{\theta(t - y)}{\mu(t)}\,dt = \theta(x - y)\int_y^x \frac{dt}{\mu(t)}. \tag{20.32}$$

The $\theta$-function in front of the integral is needed to ensure that $a \le y \le x$ as demanded by the LHS of Eq. (20.32). The RHS of Eq. (20.32) is continuous at $x = y$ with limit being zero as $x \to y$.

Next, we write $G(x, y) = G_0(x, y) + H(x, y)$, and apply $L_x$ to both sides. This gives

$$\frac{\delta(x - y)}{w(x)} = \left(p_2\frac{d^2}{dx^2} + p_1\frac{d}{dx}\right)G_0 + p_0 G_0 + L_x H(x, y) = \frac{\delta(x - y)}{w(x)} + p_0 G_0 + L_x H(x, y),$$

or $p_2 H'' + p_1 H' + p_0 H = -p_0 G_0$. The continuity of $G_0$, $p_0$, $p_1$, and $p_2$ on $[a, b]$ implies the continuity of $H$, because a discontinuity in $H$ would entail a delta function discontinuity in $dH/dx$, which is impossible because there are no delta functions in the equation for $H$. Since both $G_0$ and $H$ are continuous, $G$ must also be continuous on $[a, b]$.

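We can now calculate the jump in $\partial G/\partial x$ at $x = y$. We denote the jump as $\Delta G'(y)$ and define it as follows:

$$\Delta G'(y) \equiv \lim_{\epsilon \to 0}\left[\frac{\partial G}{\partial x}(x, y)\bigg|_{x = y + \epsilon} - \frac{\partial G}{\partial x}(x, y)\bigg|_{x = y - \epsilon}\right].$$

Dividing (20.31) by $\mu(x)$ and taking the above limit for all terms, we obtain

$$\Delta G'(y) + \lim_{\epsilon \to 0}\left[\frac{1}{\mu(y + \epsilon)}\int_a^{y + \epsilon}\frac{p_0(t)\mu(t)}{p_2(t)}G(t, y)\,dt - \frac{1}{\mu(y - \epsilon)}\int_a^{y - \epsilon}\frac{p_0(t)\mu(t)}{p_2(t)}G(t, y)\,dt\right] = \frac{\mu(y)}{p_2(y)w(y)}\lim_{\epsilon \to 0}\left[\frac{\theta(+\epsilon)}{\mu(y + \epsilon)} - \frac{\theta(-\epsilon)}{\mu(y - \epsilon)}\right].$$

The second term on the LHS is zero because all functions are continuous at $y$. The limit on the RHS is simply $1/\mu(y)$. We therefore obtain

$$\Delta G'(y) = \frac{1}{p_2(y)w(y)}. \tag{20.33}$$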

20.3.2 Construction and Uniqueness of Green’s Functions 


We are now in a position to calculate the Green’s function for a general 
SOLDO and show that it is unique. 


Theorem 20.3.5 Consider the system $(L; R_1, R_2)$ with data $(f; 0, 0)$, in which $L_x$ is a SOLDO. If the homogeneous DE $L_x[u] = 0$ has no nontrivial solution, then the GF associated with the given system exists and is unique. The solution of the system is $u(x) = \int_a^b dy\, w(y) G(x, y) f(y)$ and is also unique.


Proof The GF satisfies the DE $L_x G(x, y) = 0$ for all $x \in [a, b]$ except $x = y$. We thus divide $[a, b]$ into two intervals, $I_1 = [a, y)$ and $I_2 = (y, b]$, and note that a general solution to the above homogeneous DE can be written as a linear combination of a basis of solutions, $u_1$ and $u_2$. Thus, we can write the solution of the DE as

$$G_l(x, y) = c_1 u_1(x) + c_2 u_2(x) \quad \text{for } x \in I_1,$$
$$G_r(x, y) = d_1 u_1(x) + d_2 u_2(x) \quad \text{for } x \in I_2,$$

and define the GF as

$$G(x, y) = \begin{cases} G_l(x, y) & \text{if } x \in I_1, \\ G_r(x, y) & \text{if } x \in I_2, \end{cases} \tag{20.34}$$


where $c_1$, $c_2$, $d_1$, and $d_2$ are, in general, functions of $y$. To determine $G(x, y)$ we must determine four unknowns. We also have four relations: the continuity of $G$, the jump in $\partial G/\partial x$ at $x = y$, and the two BCs $R_1[G] = R_2[G] = 0$. The continuity of $G$ gives

$$c_1(y)u_1(y) + c_2(y)u_2(y) = d_1(y)u_1(y) + d_2(y)u_2(y).$$

The jump of $\partial G/\partial x$ at $x = y$ yields

$$c_1(y)u_1'(y) + c_2(y)u_2'(y) - d_1(y)u_1'(y) - d_2(y)u_2'(y) = -\frac{1}{p_2(y)w(y)}.$$

Introducing $b_1 = c_1 - d_1$ and $b_2 = c_2 - d_2$ changes the two preceding equations to

$$b_1 u_1 + b_2 u_2 = 0,$$
$$b_1 u_1' + b_2 u_2' = -\frac{1}{p_2 w}.$$

These equations have a unique solution iff

$$\det\begin{pmatrix} u_1 & u_2 \\ u_1' & u_2' \end{pmatrix} \neq 0.$$

But the determinant is simply the Wronskian of the two independent solutions and therefore cannot be zero. Thus, $b_1(y)$ and $b_2(y)$ are determined in terms of $u_1$, $u_1'$, $u_2$, $u_2'$, $p_2$, and $w$.
We now define

$$h(x, y) \equiv \begin{cases} b_1(y)u_1(x) + b_2(y)u_2(x) & \text{if } x \in I_1, \\ 0 & \text{if } x \in I_2, \end{cases}$$

so that $G(x, y) = h(x, y) + d_1(y)u_1(x) + d_2(y)u_2(x)$. We have reduced the number of unknowns to two, $d_1$ and $d_2$. Imposing the BCs gives two more relations:

$$R_1[G] = R_1[h] + d_1 R_1[u_1] + d_2 R_1[u_2] = 0,$$
$$R_2[G] = R_2[h] + d_1 R_2[u_1] + d_2 R_2[u_2] = 0.$$

Can we solve these equations and determine $d_1$ and $d_2$ uniquely? We can, if

$$\det\begin{pmatrix} R_1[u_1] & R_1[u_2] \\ R_2[u_1] & R_2[u_2] \end{pmatrix} \neq 0.$$

It can be shown that this determinant is nonzero (see Problem 20.5).
Having found the unique $\{b_i, d_i\}_{i=1}^2$, we can calculate $c_i$ uniquely, substitute all of them in Eq. (20.34), and obtain the unique $G(x, y)$. That $u(x)$ is also unique can be shown similarly.
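The construction in the proof translates almost line by line into code. The sketch below is an illustration under stated assumptions, not the author's method: it handles only Dirichlet BCs $G(a, y) = G(b, y) = 0$, the name `build_green` and its argument list are hypothetical, and the two $2 \times 2$ systems are solved with Cramer's rule.

```python
import math


def build_green(u1, du1, u2, du2, a, b, p2=lambda y: 1.0, w=lambda y: 1.0):
    """Return G(x, y) for a SOLDO with Dirichlet BCs G(a, y) = G(b, y) = 0.

    u1, u2 are independent solutions of L_x[u] = 0 and du1, du2 their
    derivatives; p2 and w are the leading coefficient and the weight.
    """
    def G(x, y):
        # b_1, b_2 from continuity and the jump condition (Cramer's rule).
        wronskian = u1(y) * du2(y) - u2(y) * du1(y)
        rhs = -1.0 / (p2(y) * w(y))
        b1 = -u2(y) * rhs / wronskian
        b2 = u1(y) * rhs / wronskian
        # d_1, d_2 from the two BCs; h(a, y) is nonzero while h(b, y) = 0.
        h_a = b1 * u1(a) + b2 * u2(a)
        det = u1(a) * u2(b) - u2(a) * u1(b)
        d1 = -h_a * u2(b) / det
        d2 = h_a * u1(b) / det
        # h(x, y) lives only on the left interval x < y.
        h = b1 * u1(x) + b2 * u2(x) if x < y else 0.0
        return h + d1 * u1(x) + d2 * u2(x)

    return G


if __name__ == "__main__":
    # L = d^2/dx^2 on [0, 1] with u1 = x, u2 = 1.
    g_lin = build_green(lambda x: x, lambda x: 1.0, lambda x: 1.0, lambda x: 0.0, 0.0, 1.0)
    print(g_lin(0.3, 0.7), g_lin(0.7, 0.3))
    # L = d^2/dx^2 + 1 on [0, pi/2] with u1 = sin, u2 = cos.
    g_trig = build_green(math.sin, math.cos, math.cos, lambda x: -math.sin(x), 0.0, math.pi / 2)
    print(g_trig(0.4, 1.0), -math.cos(1.0) * math.sin(0.4))
```

A vanishing `det` would signal that the completely homogeneous problem has a nontrivial solution, in which case the theorem does not apply; the sketch does not guard against that case.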


Example 20.3.6 Let us calculate the GF for $L_x = d^2/dx^2$ with BCs $u(a) = u(b) = 0$. We note that $L_x[u] = 0$ with the given BCs has no nontrivial solution (verify this). Thus, the GF exists. The DE for $G(x, y)$ is $G'' = 0$ for $x \neq y$, whose solutions are

$$G(x, y) = \begin{cases} c_1 x + c_2 & \text{if } a \le x < y, \\ d_1 x + d_2 & \text{if } y < x \le b. \end{cases} \tag{20.35}$$

Continuity at x = y gives cpy +c2 =d, y+ d2 or by + bp =O with bj = 
c; — d;. The discontinuity of dG/dx at x = y gives 


1 
dqj-—cy=— =1 BS J=-!1 
Pp2Ww 
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assuming that w = 1. From the equations above we also get b7 = y. G(x, y) 
must also satisfy the given BCs. Thus, G(a, y) =0 = G(b, y). Since a < y 
and b > y, we obtain cja + cz = 0 and d,b + d2 = 0, or, after substituting 
ci = bj + di, 


ad\|+d,=a~-y, bd, + d2=0. 


The solution to these equations is dj = (y — a)/(b — a) and dz = —b(y — 
a)/(b — a). With b;, bz, d,, and d2 as given above, we find 


b— b— 
c= bi +d) =- and a as y 


Writing Eq. (20.35) as 
G(x, y) = (c1x + c2)O(y — x) + (dix + d2)O(x — y) 
and using the identity 6(y — x) = 1— 6(x — y), we get 
G(x, y) =c1x +2 — (dix + b2)O(x — y). 


Using the values found for the b’s and c’s, we obtain 


mae | 
G(x, y)=(a o(? = *) + (x — y)@—y), 
which is the same as the GF obtained in Example 20.1.4. 
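
As a quick sanity check on the closed form just obtained, one can integrate it against a source numerically. The sketch below is an illustration added here, not part of the text: it assumes $w = 1$, the interval $[0, 1]$, the constant source $f = 1$ (for which the exact solution of $u'' = 1$, $u(0) = u(1) = 0$ is $x(x - 1)/2$), and a plain midpoint rule; the function names are hypothetical.

```python
def green_dirichlet(x, y, a, b):
    """GF of d^2/dx^2 on [a, b] with u(a) = u(b) = 0, assuming w = 1."""
    step = 1.0 if x > y else 0.0  # theta(x - y)
    return (x - a) * (y - b) / (b - a) + (x - y) * step


def solve_with_green(f, x, a, b, n=2000):
    """Approximate u(x) = integral over [a, b] of G(x, y) f(y) dy (midpoint rule)."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        y = a + (i + 0.5) * h
        total += green_dirichlet(x, y, a, b) * f(y)
    return total * h


# Constant source on [0, 1]; the exact solution at x = 0.5 is 0.5*(0.5 - 1)/2.
u_half = solve_with_green(lambda y: 1.0, 0.5, 0.0, 1.0)
```

The same two functions can also be used to confirm that $G$ vanishes at both endpoints for any interior $y$.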


Example 20.3.7 Let us find the GF for $L_x = d^2/dx^2 + 1$ with the BCs
$u(0) = u(\pi/2) = 0$. The general solution of $L_x[u] = 0$ is

$$
u(x) = A\sin x + B\cos x.
$$

If the BCs are imposed, we get $u = 0$. Thus, $G(x, y)$ exists. The general
form of $G(x, y)$ is

$$
G(x, y) = \begin{cases} c_1\sin x + c_2\cos x & \text{if } 0 \le x < y, \\ d_1\sin x + d_2\cos x & \text{if } y < x \le \pi/2. \end{cases} \tag{20.36}
$$

Continuity of $G$ at $x = y$ gives $b_1\sin y + b_2\cos y = 0$ with $b_i = c_i - d_i$.
The discontinuity of the derivative of $G$ at $x = y$ gives $b_1\cos y - b_2\sin y =
-1$, where we have set $w(x) = 1$. Solving these equations yields $b_1 =
-\cos y$ and $b_2 = \sin y$. The BCs give

$$
G(0, y) = 0 \;\Rightarrow\; c_2 = 0 \;\Rightarrow\; d_2 = -b_2 = -\sin y,
$$
$$
G(\pi/2, y) = 0 \;\Rightarrow\; d_1 = 0 \;\Rightarrow\; c_1 = b_1 = -\cos y.
$$

Substituting in Eq. (20.36) gives

$$
G(x, y) = \begin{cases} -\cos y\,\sin x & \text{if } x < y, \\ -\sin y\,\cos x & \text{if } y < x, \end{cases}
$$

or, using the theta function,

$$
\begin{aligned}
G(x, y) &= -\theta(y - x)\cos y\,\sin x - \theta(x - y)\sin y\,\cos x \\
&= -[1 - \theta(x - y)]\cos y\,\sin x - \theta(x - y)\sin y\,\cos x \\
&= -\cos y\,\sin x + \theta(x - y)\sin(x - y).
\end{aligned}
$$

It is instructive to verify directly that $G(x, y)$ satisfies $L_x[G] = \delta(x - y)$:

$$
\begin{aligned}
L_x[G] &= -\cos y\,\underbrace{\left(\frac{d^2}{dx^2} + 1\right)\sin x}_{=0} + \left(\frac{d^2}{dx^2} + 1\right)[\theta(x - y)\sin(x - y)] \\
&= \frac{d^2}{dx^2}[\theta(x - y)\sin(x - y)] + \theta(x - y)\sin(x - y) \\
&= \frac{d}{dx}\Big[\underbrace{\delta(x - y)\sin(x - y)}_{=0} + \theta(x - y)\cos(x - y)\Big] + \theta(x - y)\sin(x - y).
\end{aligned}
$$

The first term vanishes because the sine vanishes at the only point where the
delta function is nonzero. Thus, we have

$$
\begin{aligned}
L_x[G] &= [\delta(x - y)\cos(x - y) - \theta(x - y)\sin(x - y)] + \theta(x - y)\sin(x - y) \\
&= \delta(x - y)
\end{aligned}
$$

because the delta function demands that $x = y$, for which $\cos(x - y) = 1$.
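
The result of this example can also be checked numerically. The following sketch is an addition, not from the text: it assumes the source $f = 1$, for which $u(x) = 1 - \cos x - \sin x$ solves $u'' + u = 1$ with $u(0) = u(\pi/2) = 0$, and it estimates the jump in $\partial G/\partial x$ at $x = y$ with one-sided finite differences. The helper names are hypothetical.

```python
import math

HALF_PI = math.pi / 2


def green(x, y):
    """G(x, y) = -cos(y) sin(x) + theta(x - y) sin(x - y) on [0, pi/2]."""
    step = 1.0 if x > y else 0.0
    return -math.cos(y) * math.sin(x) + step * math.sin(x - y)


def solve(f, x, n=4000):
    """Midpoint-rule approximation of the integral of G(x, y) f(y) over [0, pi/2]."""
    h = HALF_PI / n
    return h * sum(green(x, (i + 0.5) * h) * f((i + 0.5) * h) for i in range(n))


def derivative_jump(y, eps=1e-5):
    """One-sided finite-difference estimate of dG/dx(y+) - dG/dx(y-)."""
    right = (green(y + 2 * eps, y) - green(y + eps, y)) / eps
    left = (green(y - eps, y) - green(y - 2 * eps, y)) / eps
    return right - left


u_quarter = solve(lambda y: 1.0, math.pi / 4)
jump = derivative_jump(0.6)
```

Under these assumptions `u_quarter` should be close to $1 - \sqrt{2}$ and `jump` close to 1, the value $1/(p_2 w)$ used in the example.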


The existence and uniqueness of the Green's function $G(x, y)$, in conjunction
with its properties and its adjoint, imply the existence and uniqueness
of the adjoint Green's function $g(x, y)$. Using this fact, we can show
that the condition for the absence of a nontrivial solution for $L_x[u] = 0$ is
also a necessary condition for the existence of $G(x, y)$. That is, if $G(x, y)$
exists, then $L_x[u] = 0$ implies that $u = 0$. Suppose $G(x, y)$ exists; then
$g(x, y)$ also exists. In Green's identity let $v = g(x, y)$. This gives an identity:

$$
\int_a^b w(x)g^*(x, y)\,(L_x[u])\,dx = \int_a^b w(x)u(x)\,(L_x^\dagger[g])^*\,dx
= \int_a^b w(x)u(x)\frac{\delta(x - y)}{w(x)}\,dx = u(y).
$$

In particular, if $L_x[u] = 0$, then $u(y) = 0$ for all $y$. We have proved the
following result.


Proposition 20.3.8 The DE $L_x[u] = 0$ implies that $u \equiv 0$ if and only
if the GF corresponding to $L_x$ and the homogeneous BCs exists.




It is sometimes stated that the Green's function of a SOLDO with constant
coefficients depends on the difference $x - y$. This statement is motivated
by the observation that if $u(x)$ is a solution of

$$
L_x[u] = a_2\frac{d^2u}{dx^2} + a_1\frac{du}{dx} + a_0 u = f(x),
$$

then $u(x - y)$ is the solution of $a_2u'' + a_1u' + a_0u = f(x - y)$ if $a_0$, $a_1$, and
$a_2$ are constant. Thus, if $G(x)$ is a solution of $L_x[G] = \delta(x)$ [again assuming
that $w(x) = 1$], then it seems that the solution of $L_x[G] = \delta(x - y)$ is simply
$G(x - y)$. This is clearly wrong, as Examples 20.3.6 and 20.3.7 showed.
The reason is, of course, the BCs. The fact that $G(x - y)$ satisfies the right
DE does not guarantee that it also satisfies the right BCs. The following
example, however, shows that the conjecture is true for a homogeneous initial
value problem.


Example 20.3.9 The most general form for the GF is

$$
G(x, y) = \begin{cases} c_1u_1(x) + c_2u_2(x) & \text{if } a \le x < y, \\ d_1u_1(x) + d_2u_2(x) & \text{if } y < x \le b. \end{cases}
$$

The IVP condition $G(a, y) = 0 = G'(a, y)$ implies

$$
c_1u_1(a) + c_2u_2(a) = 0 \quad \text{and} \quad c_1u_1'(a) + c_2u_2'(a) = 0.
$$

Linear independence of $u_1$ and $u_2$ implies

$$
\det\begin{pmatrix} u_1(a) & u_2(a) \\ u_1'(a) & u_2'(a) \end{pmatrix} = W(a; u_1, u_2) \neq 0.
$$

Hence, $c_1 = c_2 = 0$ is the only solution. This gives

$$
G(x, y) = \begin{cases} 0 & \text{if } a \le x < y, \\ d_1u_1(x) + d_2u_2(x) & \text{if } y < x \le b. \end{cases} \tag{20.37}
$$

Continuity of $G$ at $x = y$ yields $d_1u_1(y) + d_2u_2(y) = 0$, while the jump
condition on the derivative gives $d_1u_1'(y) + d_2u_2'(y) = 1$. Solving
these two equations, we get

$$
d_1 = \frac{u_2(y)}{u_1'(y)u_2(y) - u_2'(y)u_1(y)}, \qquad
d_2 = -\frac{u_1(y)}{u_1'(y)u_2(y) - u_2'(y)u_1(y)}.
$$

Substituting this in (20.37) gives

$$
G(x, y) = \frac{u_2(y)u_1(x) - u_1(y)u_2(x)}{u_1'(y)u_2(y) - u_2'(y)u_1(y)}\,\theta(x - y). \tag{20.38}
$$

Equation (20.38) holds for any SOLDO with the given BCs. We now use
the fact that the SOLDO has constant coefficients. In that case, we know the
exact form of $u_1$ and $u_2$. There are two cases to consider:

1. If the characteristic polynomial of $L_x$ has two distinct roots $\lambda_1$ and $\lambda_2$,
then $u_1(x) = e^{\lambda_1 x}$ and $u_2(x) = e^{\lambda_2 x}$. Writing $\lambda_1 = a + b$ and $\lambda_2 =
a - b$ and substituting the exponential functions and their derivatives in
Eq. (20.38) yields

$$
\begin{aligned}
G(x, y) &= \frac{e^{(a-b)y}e^{(a+b)x} - e^{(a+b)y}e^{(a-b)x}}{2b\,e^{2ay}}\,\theta(x - y) \\
&= \frac{1}{2b}\left[e^{(a+b)(x-y)} - e^{(a-b)(x-y)}\right]\theta(x - y),
\end{aligned}
$$

which is a function of $x - y$ alone.
2. If $\lambda_1 = \lambda_2 \equiv \lambda$, then $u_1(x) = e^{\lambda x}$, $u_2(x) = xe^{\lambda x}$, and substitution of
these functions in Eq. (20.38) gives

$$
G(x, y) = (x - y)e^{\lambda(x - y)}\theta(x - y).
$$
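
The claim that Eq. (20.38) collapses to a function of $x - y$ for constant coefficients can be spot-checked numerically. The sketch below is an illustration added here, not the text's own procedure: the operator $d^2/dx^2 - 3\,d/dx + 2$ (roots 2 and 1, so $a = 3/2$, $b = 1/2$ in the notation of case 1) is a hypothetical choice, and the function names are assumptions.

```python
import math


def ivp_green(x, y, u1, du1, u2, du2):
    """Eq. (20.38): zero for x <= y, Wronskian-type quotient for x > y."""
    if x <= y:
        return 0.0
    numerator = u2(y) * u1(x) - u1(y) * u2(x)
    denominator = du1(y) * u2(y) - du2(y) * u1(y)
    return numerator / denominator


# Hypothetical operator d^2/dx^2 - 3 d/dx + 2: characteristic roots 2 and 1.
u1 = lambda t: math.exp(2.0 * t)
du1 = lambda t: 2.0 * math.exp(2.0 * t)
u2 = lambda t: math.exp(t)
du2 = lambda t: math.exp(t)


def closed_form(x, y):
    """(1/2b)[e^{(a+b)(x-y)} - e^{(a-b)(x-y)}] theta(x-y) with a = 3/2, b = 1/2."""
    s = x - y
    return (math.exp(2.0 * s) - math.exp(s)) if s > 0 else 0.0


g_here = ivp_green(1.3, 0.4, u1, du1, u2, du2)
g_shifted = ivp_green(2.3, 1.4, u1, du1, u2, du2)  # same x - y, shifted by 1
```

If the formula depends on $x - y$ only, `g_here` and `g_shifted` should agree with each other and with `closed_form(1.3, 0.4)` up to rounding.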


20.3.3 Inhomogeneous BCs 


So far we have concentrated on problems with homogeneous BCs, $R_i[u] = 0$,
for $i = 1, 2$. What if the BCs are inhomogeneous? It turns out that the
Green's function method, even though it was derived for homogeneous
BCs, solves this kind of problem as well! The secret of this success is
the generalized Green's identity. Suppose we are interested in solving
the DE

$$
L_x[u] = f(x) \quad \text{with} \quad R_i[u] = \gamma_i \quad \text{for } i = 1, 2,
$$

and we have the GF for $L_x$ (with homogeneous BCs, of course). We can
substitute $v = g(x, y) = G^*(y, x)$ in the generalized Green's identity and
use the DE to obtain

$$
\int_a^b w(x)G(y, x)f(x)\,dx - \int_a^b w(x)u(x)\,(L_x^\dagger[g])^*\,dx
= Q[u, g^*(x, y)]\Big|_{x=a}^{x=b},
$$

or, using $L_x^\dagger[g(x, y)] = \delta(x - y)/w(y)$,

$$
u(y) = \int_a^b w(x)G(y, x)f(x)\,dx - Q[u, g^*(x, y)]\Big|_{x=a}^{x=b}.
$$


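To evaluate the surface term, let us write the BCs in matrix form [see
Eq. (20.17)]:

$$
\mathsf{A}\mathbf{u}_a + \mathsf{B}\mathbf{u}_b = \boldsymbol{\gamma} \;\Rightarrow\; \mathbf{u}_b = \mathsf{B}^{-1}\boldsymbol{\gamma} - \mathsf{B}^{-1}\mathsf{A}\mathbf{u}_a,
$$
$$
\mathsf{A}\mathbf{G}_a + \mathsf{B}\mathbf{G}_b = 0 \;\Rightarrow\; \mathsf{A}^t(\mathsf{B}^t)^{-1}\mathsf{Q}_b\mathbf{g}_b^* + \mathsf{Q}_a\mathbf{g}_a^* = 0,
$$

where $\boldsymbol{\gamma}$ is a column vector composed of $\gamma_1$ and $\gamma_2$, and we have assumed
that $G(x, y)$ and $g^*(x, y)$ satisfy, respectively, the homogeneous BCs (with
$\boldsymbol{\gamma} = 0$) and their adjoints. We have also assumed that the $2 \times 4$ matrix of
coefficients has rank 2, and without loss of generality, let $\mathsf{B}$ be the invertible
$2 \times 2$ submatrix. Then, assuming the general form of the surface term as in
Eq. (20.15), we obtain

$$
\begin{aligned}
Q[u, g^*(x, y)]\Big|_{x=a}^{x=b} &= \mathbf{u}_b^t\mathsf{Q}_b\mathbf{g}_b^* - \mathbf{u}_a^t\mathsf{Q}_a\mathbf{g}_a^* \\
&= (\mathsf{B}^{-1}\boldsymbol{\gamma} - \mathsf{B}^{-1}\mathsf{A}\mathbf{u}_a)^t\mathsf{Q}_b\mathbf{g}_b^* - \mathbf{u}_a^t\mathsf{Q}_a\mathbf{g}_a^* \\
&= \boldsymbol{\gamma}^t(\mathsf{B}^t)^{-1}\mathsf{Q}_b\mathbf{g}_b^* - \mathbf{u}_a^t\underbrace{\left[\mathsf{A}^t(\mathsf{B}^t)^{-1}\mathsf{Q}_b\mathbf{g}_b^* + \mathsf{Q}_a\mathbf{g}_a^*\right]}_{=0 \text{ because } g^* \text{ satisfies homogeneous adjoint BC}} \\
&= \boldsymbol{\gamma}^t(\mathsf{B}^t)^{-1}\mathsf{Q}_b\mathbf{g}_b^*,
\end{aligned} \tag{20.39}
$$

where

$$
\mathbf{g}_b^* \equiv \begin{pmatrix} g^*(b, y) \\ \dfrac{\partial g^*}{\partial x}(x, y)\Big|_{x=b} \end{pmatrix}
= \begin{pmatrix} G(y, b) \\ \dfrac{\partial G}{\partial x}(y, x)\Big|_{x=b} \end{pmatrix}.
$$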


It follows that $Q[u, g^*(x, y)]\big|_{x=a}^{x=b}$ is given entirely in terms of $G$, its
derivative, the coefficient functions of the DE (hidden in the matrix $\mathsf{Q}$), the
homogeneous BCs (hidden in $\mathsf{B}$), and the constants $\gamma_1$ and $\gamma_2$. The fact that $g^*$
and $\partial g^*/\partial x$ appear to be evaluated at $x = b$ is due to the simplifying (but
harmless) assumption that $\mathsf{B}$ is invertible, i.e., that $u(b)$ and $u'(b)$ can be
written in terms of $u(a)$ and $u'(a)$. Of course, this may not be possible; then
we have to find another pair of the four quantities in terms of the other two,
in which case the matrices and the vectors will change but the argument, as
well as the conclusion, will remain valid. We can now write

$$
u(y) = \int_a^b w(x)G(y, x)f(x)\,dx - \boldsymbol{\gamma}^t\mathsf{M}\mathbf{g}^*, \tag{20.40}
$$


where a general matrix M has been introduced, and the subscript b has been 
removed to encompass cases where submatrices other than B are invertible. 
Equation (20.40) shows that u can be determined completely once we know 
G(x, y), even though the BCs are inhomogeneous. In practice, there is no 
need to calculate M. We can use the expression for Q[u, g*] obtained from 
the Lagrange identity of Chap. 14 and evaluate it at b and a. This, in gen- 
eral, involves evaluating u and G and their derivatives at a and b. We know 
how to handle the evaluation of G because we can actually construct it (if it 
exists). We next find two of the four quantities corresponding to u in terms 
of the other two and insert the result in the expression for Q[u, g*]. Equa- 
tion (20.39) then guarantees that the coefficients of the other two terms will 
be zero. Thus, we can simply drop all the terms in $Q[u, g^*]$ containing a
factor of the other two terms.

Specifically, we use the conjunct for a formally self-adjoint SOLDO [see
Eq. (20.26)] and $g^*(x, y) = G(y, x)$ to obtain

$$
u(y) = \int_a^b w(x)G(y, x)f(x)\,dx
- \left\{p(x)w(x)\left[G(y, x)\frac{du}{dx} - u(x)\frac{\partial}{\partial x}G(y, x)\right]\right\}_{x=a}^{x=b}.
$$

Interchanging $x$ and $y$ gives

$$
u(x) = \int_a^b w(y)G(x, y)f(y)\,dy
+ \left\{p(y)w(y)\left[u(y)\frac{\partial G}{\partial y}(x, y) - G(x, y)\frac{du}{dy}\right]\right\}_{y=a}^{y=b}. \tag{20.41}
$$


This equation is valid only for a self-adjoint SOLDO. That is, using it re- 
quires casting the SOLDO into a self-adjoint form (a process that is always 
possible, in light of Theorem 14.5.4). 

By setting f(x) = 0, we can also obtain the solution to a homogeneous 
DE L,[u] = 0 that satisfies the inhomogeneous BCs. 


Example 20.3.10 Let us find the solution of the simple DE $d^2u/dx^2 =
f(x)$ subject to the simple inhomogeneous BCs $u(a) = \gamma_1$ and $u(b) = \gamma_2$.
The GF for this problem has been calculated in Examples 20.1.5 and 20.3.6.
Let us begin by calculating the surface term in Eq. (20.41). We have $p(y) =
1$, and we set $w(y) = 1$; then

$$
\begin{aligned}
\text{surface term} &= u(b)\frac{\partial G}{\partial y}\Big|_{y=b} - G(x, b)u'(b) - u(a)\frac{\partial G}{\partial y}\Big|_{y=a} + G(x, a)u'(a) \\
&= \gamma_2\frac{\partial G}{\partial y}\Big|_{y=b} - \gamma_1\frac{\partial G}{\partial y}\Big|_{y=a} + G(x, a)u'(a) - G(x, b)u'(b).
\end{aligned}
$$

That the unwanted (and unspecified) terms are zero can be seen by observing
that $G(x, a) = g^*(a, x) = (g(a, x))^*$, and that $g(x, y)$ satisfies the BCs
adjoint to the homogeneous BCs (obtained when $\gamma_i = 0$). In this particular
and simple case, the BCs happen to be self-adjoint (Dirichlet BCs). Thus,
$u(a) = u(b) = 0$ implies that $g(a, x) = g(b, x) = 0$ for all $x \in [a, b]$. (In a
more general case the coefficient of $u'(a)$ would be more complicated, but
still zero.) Thus, we finally have

$$
\text{surface term} = \gamma_2\frac{\partial G}{\partial y}\Big|_{y=b} - \gamma_1\frac{\partial G}{\partial y}\Big|_{y=a}.
$$

Now, using the expression for $G(x, y)$ obtained in Examples 20.1.5
and 20.3.6, we get

$$
\frac{\partial G}{\partial y} = \frac{x - a}{b - a} - \theta(x - y) - \underbrace{(x - y)\delta(x - y)}_{=0}.
$$

Thus,

$$
\frac{\partial G}{\partial y}\Big|_{y=b} = \frac{x - a}{b - a}, \qquad
\frac{\partial G}{\partial y}\Big|_{y=a} = \frac{x - a}{b - a} - 1 = \frac{x - b}{b - a}.
$$

Substituting in Eq. (20.41), we get

$$
u(x) = \int_a^b G(x, y)f(y)\,dy + \frac{\gamma_2 - \gamma_1}{b - a}\,x + \frac{b\gamma_1 - a\gamma_2}{b - a}.
$$

(Compare this with the result obtained in Example 20.1.5.)
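
The final formula of this example lends itself to a short numerical check. The following is a minimal sketch added for illustration, under assumptions chosen here: the interval $[0, 1]$, the source $f(y) = 6y$, and boundary values $\gamma_1 = 1$, $\gamma_2 = 3$, for which $u(x) = x^3 + x + 1$ solves the problem exactly. The function names are hypothetical and the integral is approximated by the midpoint rule.

```python
def green_dirichlet(x, y, a, b):
    """GF of d^2/dx^2 on [a, b] with homogeneous Dirichlet BCs (w = 1)."""
    step = 1.0 if x > y else 0.0  # theta(x - y)
    return (x - a) * (y - b) / (b - a) + (x - y) * step


def solve_inhomogeneous(f, x, a, b, gamma1, gamma2, n=2000):
    """u(x) = integral of G f + the boundary (surface) contribution."""
    h = (b - a) / n
    integral = 0.0
    for i in range(n):
        y = a + (i + 0.5) * h
        integral += green_dirichlet(x, y, a, b) * f(y)
    integral *= h
    # Surface term: (gamma2 - gamma1) x / (b - a) + (b gamma1 - a gamma2) / (b - a).
    boundary = ((gamma2 - gamma1) * x + b * gamma1 - a * gamma2) / (b - a)
    return integral + boundary


u_mid = solve_inhomogeneous(lambda y: 6.0 * y, 0.5, 0.0, 1.0, 1.0, 3.0)
```

With these choices `u_mid` should be close to $0.5^3 + 0.5 + 1 = 1.625$; the homogeneous GF supplies the particular part and the linear term carries the boundary data.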


Green's functions have a very simple and enlightening physical interpretation.
An inhomogeneous DE such as $L_x[u] = f(x)$ can be interpreted as a
black box ($L_x$) that determines a physical quantity ($u$) when there is a source
($f$) of that physical quantity. For instance, electrostatic potential is a physical
quantity whose source is charge; a magnetic field has an electric current
as its source; displacements and velocities have forces as their sources; and
so forth. Applying this interpretation and assuming that $w(x) = 1$, we have
$G(x, y)$ as the physical quantity, evaluated at $x$ when its source $\delta(x - y)$ is
located at $y$. To be more precise, let us say that the strength of the source is
$S_1$ and it is located at $y_1$; then the source becomes $S_1\delta(x - y_1)$. The physical
quantity, the Green's function, is then $S_1G(x, y_1)$, because of the linearity
of $L_x$: If $G(x, y)$ is a solution of $L_x[u] = \delta(x - y)$, then $S_1G(x, y_1)$ is a
solution of $L_x[u] = S_1\delta(x - y_1)$. If there are many sources located at $\{y_i\}_{i=1}^{n}$ with
corresponding strengths $\{S_i\}_{i=1}^{n}$, then the overall source $f$ as a function of $x$
becomes $f(x) = \sum_{i=1}^{n} S_i\delta(x - y_i)$, and the corresponding physical quantity
$u(x)$ becomes $u(x) = \sum_{i=1}^{n} S_iG(x, y_i)$.

Since the source $S_i$ is located at $y_i$, it is more natural to define a function
$S(x)$ and write $S_i = S(y_i)$. When the number of point sources goes
to infinity and $y_i$ becomes a smooth continuous variable, the sums become
integrals, and we have

$$
f(x) = \int_a^b S(y)\delta(x - y)\,dy, \qquad u(x) = \int_a^b S(y)G(x, y)\,dy.
$$

The first integral shows that $S(x) = f(x)$. Thus, the second integral becomes
$u(x) = \int_a^b f(y)G(x, y)\,dy$, which is precisely what we obtained formally.


20.4 Eigenfunction Expansion


Green's functions are inverses of differential operators. Inverses of operators
in a Hilbert space are best studied in terms of resolvents. This is because if
an operator $\mathbf{A}$ has an inverse, zero is in its resolvent set, and

$$
\mathbf{R}_0(\mathbf{A}) = \mathbf{R}_\lambda(\mathbf{A})\big|_{\lambda=0} = (\mathbf{A} - \lambda\mathbf{1})^{-1}\big|_{\lambda=0} = \mathbf{A}^{-1}.
$$

Thus, it is instructive to discuss Green's functions in the context of the
resolvent of a differential operator. We will consider only the case where the
eigenvalues are discrete, for example, when $L_x$ is a Sturm-Liouville operator.

Formally, we have $(\mathbf{L} - \lambda\mathbf{1})\mathbf{R}_\lambda(\mathbf{L}) = \mathbf{1}$, which leads to the DE

$$
(L_x - \lambda)R_\lambda(x, y) = \frac{\delta(x - y)}{w(x)},
$$

where $R_\lambda(x, y) = \langle x|\mathbf{R}_\lambda(\mathbf{L})|y\rangle$. The DE simply says that $R_\lambda(x, y)$ is the
Green's function for the operator $L_x - \lambda$. So we can rewrite the equation as

$$
(L_x - \lambda)G_\lambda(x, y) = \frac{\delta(x - y)}{w(x)},
$$

where $L_x - \lambda$ is a DO having some homogeneous BCs. The GF $G_\lambda(x, y)$
exists if and only if $(L_x - \lambda)[u] = 0$ has no nontrivial solution, which is true
only if $\lambda$ is not an eigenvalue of $L_x$. We choose the BCs in such a way that
$L_x$ becomes self-adjoint.

Let $\{\lambda_n\}_{n=1}^{\infty}$ be the eigenvalues of the system $L_x[u] = \lambda u$, $\{R_i[u] = 0\}_{i=1}^{2}$,
and let the $u_n^{(k)}(x)$ be the corresponding eigenfunctions. The index $k$
distinguishes among the linearly independent vectors corresponding to the
same eigenvalue $\lambda_n$. Assuming that $\mathbf{L}$ has compact resolvent (e.g., a
Sturm-Liouville operator), these eigenfunctions form a complete set for the
subspace of the Hilbert space that consists of those functions that satisfy the
same BCs as the $u_n^{(k)}(x)$. In particular, $G_\lambda(x, y)$ can be expanded in terms
of $u_n^{(k)}(x)$. The expansion coefficients are, of course, functions of $y$. Thus,
we can write

$$
G_\lambda(x, y) = \sum_k\sum_{n=1}^{\infty} a_n^{(k)}(y)\,u_n^{(k)}(x),
$$

where $a_n^{(k)}(y) = \int_a^b w(x)u_n^{*(k)}(x)G_\lambda(x, y)\,dx$. Using Green's identity,
Eq. (20.30), and the fact that $\lambda_n$ is real, we have

$$
\begin{aligned}
\lambda_n a_n^{(k)}(y) &= \int_a^b w(x)\left[\lambda_n u_n^{(k)}(x)\right]^* G_\lambda(x, y)\,dx \\
&= \int_a^b w(x)G_\lambda(x, y)\left\{L_x\left[u_n^{(k)}(x)\right]\right\}^* dx \\
&= \int_a^b w(x)\left[u_n^{(k)}(x)\right]^* L_x[G_\lambda(x, y)]\,dx \\
&= \int_a^b w(x)u_n^{*(k)}(x)\left[\frac{\delta(x - y)}{w(x)} + \lambda G_\lambda(x, y)\right]dx \\
&= u_n^{*(k)}(y) + \lambda\int_a^b w(x)u_n^{*(k)}(x)G_\lambda(x, y)\,dx \\
&= u_n^{*(k)}(y) + \lambda a_n^{(k)}(y).
\end{aligned}
$$

Thus, $a_n^{(k)}(y) = u_n^{*(k)}(y)/(\lambda_n - \lambda)$, and the expansion for the Green's
function is

$$
G_\lambda(x, y) = \sum_k\sum_{n=1}^{\infty}\frac{u_n^{*(k)}(y)\,u_n^{(k)}(x)}{\lambda_n - \lambda}. \tag{20.42}
$$

This expansion is valid as long as $\lambda_n \neq \lambda$ for any $n = 1, 2, \dots$. But this is
precisely the condition that ensures the existence of an inverse for $\mathbf{L} - \lambda\mathbf{1}$.

An interesting result is obtained from Eq. (20.42) if $\lambda$ is considered a
complex variable. In that case, $G_\lambda(x, y)$ has (infinitely many) simple poles
at $\{\lambda_n\}_{n=1}^{\infty}$. The residue at the pole $\lambda_n$ is $-\sum_k u_n^{(k)}(x)u_n^{*(k)}(y)$. If $C_m$ is a
contour having the poles $\{\lambda_n\}_{n=1}^{m}$ in its interior, then, by the residue theorem,
we have

$$
\frac{1}{2\pi i}\oint_{C_m} G_\lambda(x, y)\,d\lambda = -\sum_k\sum_{n=1}^{m} u_n^{(k)}(x)\,u_n^{*(k)}(y).
$$

In particular, if we let $m \to \infty$, we obtain

$$
\frac{1}{2\pi i}\oint_{C_\infty} G_\lambda(x, y)\,d\lambda = -\sum_k\sum_{n=1}^{\infty} u_n^{(k)}(x)\,u_n^{*(k)}(y)
= -\frac{\delta(x - y)}{w(x)}, \tag{20.43}
$$

where $C_\infty$ is any contour that encircles all the eigenvalues, and in the last
step we used the completeness of the eigenfunctions. Equation (20.43) is the
infinite-dimensional analogue of Eq. (17.10) with $f(\lambda) = 1$ when the latter
equation is sandwiched between $\langle x|$ and $|y\rangle$.


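Example 20.4.1 Consider the DO $L_x = d^2/dx^2$ with BCs $u(0) = u(a) = 0$.
This is an S-L operator with eigenvalues and normalized eigenfunctions

$$
\lambda_n = -\left(\frac{n\pi}{a}\right)^2 \quad \text{and} \quad u_n(x) = \sqrt{\frac{2}{a}}\sin\left(\frac{n\pi}{a}x\right) \quad \text{for } n = 1, 2, \dots,
$$

where the sign of $\lambda_n$ follows from $(d^2/dx^2)\sin(n\pi x/a) = -(n\pi/a)^2\sin(n\pi x/a)$.
Equation (20.42) becomes

$$
G_\lambda(x, y) = -\frac{2}{a}\sum_{n=1}^{\infty}\frac{\sin(n\pi x/a)\sin(n\pi y/a)}{\lambda + (n\pi/a)^2},
$$

which leads to

$$
\begin{aligned}
\frac{1}{2\pi i}\oint_{C_\infty} G_\lambda(x, y)\,d\lambda
&= \frac{1}{2\pi i}\oint_{C_\infty}\left[-\frac{2}{a}\sum_{n=1}^{\infty}\frac{\sin(n\pi x/a)\sin(n\pi y/a)}{\lambda + (n\pi/a)^2}\right]d\lambda \\
&= -\frac{2}{a}\sum_{n=1}^{\infty}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{n\pi}{a}y\right)\frac{1}{2\pi i}\oint_{C_\infty}\frac{d\lambda}{\lambda + (n\pi/a)^2} \\
&= -\frac{2}{a}\sum_{n=1}^{\infty}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{n\pi}{a}y\right)\underbrace{\operatorname{Res}\left[\frac{1}{\lambda + (n\pi/a)^2}\right]_{\lambda = -(n\pi/a)^2}}_{=1}.
\end{aligned}
$$

The RHS is recognized as $-\delta(x - y)$.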


If zero is not an eigenvalue of $L_x$, Eq. (20.42) yields

$$
G(x, y) \equiv G_0(x, y) = \sum_k\sum_{n=1}^{\infty}\frac{u_n^{*(k)}(y)\,u_n^{(k)}(x)}{\lambda_n}, \tag{20.44}
$$

which is an expression for the Green's function of $L_x$ in terms of its
eigenvalues and eigenfunctions.
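
The eigenfunction expansion can be compared numerically with a closed-form GF. The sketch below is an illustration added here and rests on stated assumptions: the operator $d^2/dx^2$ on $[0, a]$ with $u(0) = u(a) = 0$, eigenfunctions $\sqrt{2/a}\,\sin(n\pi x/a)$ with eigenvalues $-(n\pi/a)^2$ (since differentiating the sine twice produces a minus sign), and the closed form of Example 20.3.6 specialized to the interval $[0, a]$. The truncation at 2000 terms is an arbitrary choice.

```python
import math


def green_series(x, y, a, terms=2000):
    """Truncated sum of u_n(x) u_n(y) / lambda_n with lambda_n = -(n pi / a)^2."""
    total = 0.0
    for n in range(1, terms + 1):
        k = n * math.pi / a
        total += (2.0 / a) * math.sin(k * x) * math.sin(k * y) / (-k * k)
    return total


def green_closed(x, y, a):
    """Closed form x (y - a) / a + (x - y) theta(x - y) on [0, a]."""
    step = 1.0 if x > y else 0.0
    return x * (y - a) / a + (x - y) * step


series_value = green_series(0.3, 0.7, 1.0)
closed_value = green_closed(0.3, 0.7, 1.0)
```

The series converges only like $1/n^2$, so agreement to a few decimal places is what one should expect from this truncation rather than machine precision.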


20.5 Problems 


20.1 Using the GF method, solve the DE $L_x u(x) = du/dx = f(x)$ subject
to the BC $u(0) = a$. Hint: Consider the function $v(x) = u(x) - a$.


20.2 Solve the problem of Example 20.1.4 subject to the BCs u(a) = 
u’(a) = 0. Show that the corresponding GF also satisfies these BCs. 


20.3 Show that the IVP with data {0; 0,0, ...,0} has only u = 0 as a solu- 
tion. Hint: Assume otherwise, add u to the solution of the inhomogeneous 
equation, and invoke uniqueness. 


20.4 In this problem, we generalize the concepts of exactness and integrating
factor to a NOLDE. The DO $L_x^{(n)} \equiv \sum_{k=0}^{n} p_k(x)\,d^k/dx^k$ is said to be
exact if there exists a DO $M_x^{(n-1)} \equiv \sum_{k=0}^{n-1} a_k(x)\,d^k/dx^k$ such that

$$
L_x^{(n)}[u] = \frac{d}{dx}\left(M_x^{(n-1)}[u]\right) \quad \forall\, u \in C^n[a, b].
$$

(a) Show that $L_x^{(n)}$ is exact iff $\sum_{m=0}^{n}(-1)^m\,d^m p_m/dx^m = 0$.
(b) Show that there exists an integrating factor for $L_x^{(n)}$ (that is, a function
$\mu(x)$ such that $\mu(x)L_x^{(n)}$ is exact) if and only if $\mu(x)$ satisfies the DE

$$
N_x^{(n)}[\mu] \equiv \sum_{m=0}^{n}(-1)^m\frac{d^m}{dx^m}(\mu p_m) = 0.
$$

The DO $N_x^{(n)}$ is the formal adjoint of $L_x^{(n)}$.


20.5 Let $L_x$ be a SOLDO. Assuming that $L_x[u] = 0$ has no nontrivial solution,
show that the matrix

$$
\mathsf{R} \equiv \begin{pmatrix} R_1[u_1] & R_1[u_2] \\ R_2[u_1] & R_2[u_2] \end{pmatrix},
$$

where $u_1$ and $u_2$ are independent solutions of $L_x[u] = 0$ and $R_i$ are the
boundary functionals, has a nonzero determinant. Hint: Assume otherwise
and show that the system of homogeneous linear equations $\alpha R_1[u_1] +
\beta R_1[u_2] = 0$ and $\alpha R_2[u_1] + \beta R_2[u_2] = 0$ has a nontrivial solution for $(\alpha, \beta)$.
Reach a contradiction by considering $u = \alpha u_1 + \beta u_2$ as a solution of
$L_x[u] = 0$.


20.6 Determine the formal adjoint of each of the operators in (a) through (d) 
below (i) as a differential operator, and (ii) as an operator, that is, including 
the BCs. Which operators are formally self-adjoint? Which operators are 
self-adjoint? 


(a) $L_x = d^2/dx^2 + 1$ in $[0, 1]$ with BCs $u(0) = u(1) = 0$.

(b) $L_x = d^2/dx^2$ in $[0, 1]$ with BCs $u(0) = u'(0) = 0$.

(c) $L_x = d/dx$ in $[0, \infty)$ with BCs $u(0) = 0$.

(d) $L_x = d^3/dx^3 - \sin x\,d/dx + 3$ in $[0, \pi]$ with BCs $u(0) = u'(0) = 0$,
$u''(0) - 4u(\pi) = 0$.


20.7 Show that the Dirichlet, Neumann, general unmixed, and periodic BCs 
make the following formally self-adjoint SOLDO self-adjoint: 


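20.8 Using a procedure similar to that described in the text for SOLDOs,
show that for the FOLDO $L_x = p_1\,d/dx + p_0$

(a) the indefinite GF is

$$
G(x, y) = \frac{\mu(y)}{p_1(y)w(y)}\,\frac{\theta(x - y)}{\mu(x)} + \frac{C(y)}{\mu(x)},
\quad \text{where } \mu(x) = \exp\left[\int^x\frac{p_0(t)}{p_1(t)}\,dt\right],
$$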


(b) and the GF itself is discontinuous at $x = y$ with

$$
\lim_{\epsilon\to 0}\left[G(y + \epsilon, y) - G(y - \epsilon, y)\right] = \frac{1}{p_1(y)w(y)}.
$$

(c) For the homogeneous BC

$$
R[u] \equiv \alpha_1u(a) + \alpha_2u'(a) + \beta_1u(b) + \beta_2u'(b) = 0
$$

construct $G(x, y)$ and show that

$$
G(x, y) = \frac{1}{p_1(y)w(y)v(y)}\,v(x)\theta(x - y) + C(y)v(x),
$$

where $v(x)$ is any solution to the homogeneous DE $L_x[u] = 0$ and

$$
C(y) = -\frac{\beta_1v(b) + \beta_2v'(b)}{R[v]\,p_1(y)w(y)v(y)}, \quad \text{with } R[v] \neq 0.
$$

(d) Show directly that $L_x[G] = \delta(x - y)/w(x)$.


20.9 Let $L_x$ be a NOLDO with constant coefficients. Show that if $u(x)$
satisfies $L_x[u] = f(x)$, then $u(x - y)$ satisfies $L_x[u] = f(x - y)$. (Note that no
BCs are specified.)


20.10 Find the GF for $L_x = d^2/dx^2 + 1$ with BCs $u(0) = u'(0) = 0$. Show
that it can be written as a function of $x - y$ only.


20.11 Find the GF for $L_x = d^2/dx^2 + k^2$ with BCs $u(0) = u(a) = 0$.
20.12 Find the GF for $L_x = d^2/dx^2 - k^2$ with BCs $u(\infty) = u(-\infty) = 0$.


20.13 Find the GF for $L_x = (d/dx)(x\,d/dx)$ given the condition that
$G(x, y)$ is finite at $x = 0$ and vanishes at $x = 1$.


20.14 Evaluate the GF and the solutions for each of the following DEs in 
the interval [0, 1]. 


(a) $u'' - k^2u = f$; $u(0) - u'(0) = a$, $u(1) = b$.

(b) $u'' = f$; $u(0) = u'(0) = 0$.

(c) $u'' + 6u' + 9u = 0$; $u(0) = 0$, $u'(0) = 1$.

(d) $u'' + \omega^2u = f(x)$, for $x > 0$; $u(0) = a$, $u'(0) = b$.

(e) $u^{(4)} = f$; $u(0) = 0$, $u'(0) = 2u'(1)$, $u(1) = a$, $u''(0) = 0$.


20.15 Use eigenfunction expansion of the GF to solve the BVP u” = x, 
u(0) = 0, u(1) — 2u’(1) = 0. 


21 Multidimensional Green's Functions: Formalism


The extensive study of Green’s functions in one dimension in the last chapter 
has no doubt exhibited the power and elegance of their use in solving inho- 
mogeneous differential equations. If the differential equation has a (unique) 
solution, the GF exists and contains all the information necessary to build it 
up. The solution results from operating on the inhomogeneous term with an 
integral operator whose kernel is the appropriate Green’s function. 

The Green’s function’s very existence depends on the type of BCs im- 
posed. We encountered two types of problems in solving ODEs. The first, 
called initial value problems (IVPs), involves fixing (for an nth-order DE) 
the value of the solution and its first n — 1 derivatives at a fixed point. Then 
the ODE, if it is sufficiently well-behaved, will determine the values of the 
solution in the neighborhood of the fixed point in a unique way. Because of 
this uniqueness, Green’s functions always exist for IVPs. 

The second type of problems, called boundary value problems (BVPs),
consists (when the DE is second order) of determining a relation between
the solution and its derivative evaluated at the boundaries of some interval
$[a, b]$. These boundary values are relations that we denoted by $R_i[u] = \gamma_i$,
where $i = 1, 2$. In this case, the existence and uniqueness of the Green's
function are not guaranteed.

There is a fundamental (topological) difference between a boundary in
one dimension and a boundary in two and more dimensions. In one dimension
a boundary consists of only two points; in 2 and higher dimensions a
boundary has infinitely many points. The boundary of a region in $\mathbb{R}^2$ is a
closed curve, in $\mathbb{R}^3$ it is a closed surface, and in $\mathbb{R}^m$ it is called a
hypersurface. This fundamental difference makes the study of Green's functions in
higher dimensions more complicated, but also richer and more interesting.


21.1 Properties of Partial Differential Equations 


This section presents certain facts and properties of PDEs, in particular, how
BCs affect their solutions. We shall discover the important difference between
ODEs and PDEs: The existence of a solution to a PDE satisfying a
given BC depends on the type of the PDE.

We shall be concerned exclusively with a linear PDE. A linear PDE of
order $M$ in $m$ variables is of the form

$$
L_{\mathbf{x}}[u] = f(\mathbf{x}) \quad \text{where} \quad L_{\mathbf{x}} = \sum_{|J|=1}^{M}\sum_{J} a_J(\mathbf{x})\frac{\partial^{|J|}}{\partial x^J}, \tag{21.1}
$$

where the following notation has been used:

$$
\mathbf{x} = (x_1, \dots, x_m), \qquad J = (j_1, \dots, j_m),
$$
$$
\frac{\partial^{|J|}}{\partial x^J} = \frac{\partial^{|J|}}{\partial x_1^{j_1}\,\partial x_2^{j_2}\cdots\partial x_m^{j_m}}, \qquad
|J| = j_1 + j_2 + \cdots + j_m,
$$

the $j_k$ are nonnegative integers; $M$, the order of the highest derivative, is
called the order of the PDE. The outer sum in Eq. (21.1) is over $|J|$; once $|J|$
is fixed, the inner summation goes over individual $j_k$'s with the restriction
that their sum has to equal the given $|J|$.
The principal part of $L_{\mathbf{x}}$ is

$$
L_p = \sum_{|J|=M} a_J(\mathbf{x})\frac{\partial^{M}}{\partial x^J}. \tag{21.2}
$$

The coefficients $a_J$ and the inhomogeneous (or source) term $f$ are assumed
to be continuous functions of their arguments.

We consider Eq. (21.1) as an IVP with appropriate initial data. The most
direct generalization of the IVP of ordinary differential equation theory is
to specify the values of $u$ and all its normal derivatives of order less than or
equal to $M - 1$ on a hypersurface $\Gamma$ of dimension $m - 1$. This type of initial
data is called Cauchy data, and the resulting IVP is known as the Cauchy
problem for $L_{\mathbf{x}}$. The reason that the tangential derivatives do not come into
play here is that once we know the values of $u$ on $\Gamma$, we can evaluate $u$
on two neighboring points on $\Gamma$, take the limit as the points get closer and
closer, and evaluate the tangential derivatives.


21.1.1 Characteristic Hypersurfaces 


In contrast to the IVP in one dimension, the Cauchy problem for arbitrary
Cauchy data may not have a solution, or if it does, the solution may not be
unique.


Box 21.1.1 The existence and uniqueness of the solution of the
Cauchy problem depend crucially on the hypersurface $\Gamma$ and on the
type of PDE.


We assume that $\Gamma$ can be parametrized by a set of $m$ functions of $m - 1$
parameters. These parameters can be thought of as generalized coordinates
of points of $\Gamma$.



Consider a point $P$ on $\Gamma$. Introduce $m - 1$ coordinates $\xi_2, \dots, \xi_m$, called
tangential coordinates, to label points on $\Gamma$. Choose, by translation if
necessary, coordinates in such a way that $P$ is the origin, with coordinates
$(0, 0, \dots, 0)$. Now let $\nu \equiv \xi_1$ stand for the remaining coordinate normal to
$\Gamma$. Usually $\xi_i$ is taken to be the $i$th coordinate of the projection of the point
on $\Gamma$ onto the hyperplane tangent to $\Gamma$ at $P$.

As long as we do not move too far away from $P$, the Cauchy data on $\Gamma$
can be written as

$$
u(0, \xi_2, \dots, \xi_m), \quad \frac{\partial u}{\partial\nu}(0, \xi_2, \dots, \xi_m), \quad \dots, \quad \frac{\partial^{M-1}u}{\partial\nu^{M-1}}(0, \xi_2, \dots, \xi_m).
$$

Using the chain rule, $\partial u/\partial x_i = \sum_{j=1}^{m}(\partial u/\partial\xi_j)(\partial\xi_j/\partial x_i)$, where $\xi_1 = \nu$, we
can also determine the first $M - 1$ derivatives of $u$ with respect to $x_i$. The
fundamental question is whether we can determine $u$ uniquely using the
above Cauchy data and the DE. To motivate the answer, let's look at the
analogous problem in one dimension.

Consider the $M$th-order linear ODE

$$
L_x[u] = a_M(x)\frac{d^Mu}{dx^M} + \cdots + a_1(x)\frac{du}{dx} + a_0(x)u = f(x) \tag{21.3}
$$

with the following initial data at $x_0$: $\{u(x_0), u'(x_0), \dots, u^{(M-1)}(x_0)\}$. If the
coefficients $\{a_k(x)\}_{k=0}^{M}$ and the inhomogeneous term $f(x)$ are continuous
and if $a_M(x_0) \neq 0$, then Theorem 20.2.2 implies that there exists a unique
solution to the IVP in a neighborhood of $x_0$.

For $a_M(x_0) \neq 0$, Eq. (21.3), the initial data, and a knowledge of $f(x_0)$
give $u^{(M)}(x_0)$ uniquely. Having found $u^{(M)}(x_0)$, we can calculate, with
arbitrary accuracy (by choosing $\Delta x$ small enough), the following set of new
initial data at $x_1 = x_0 + \Delta x$:

$$
u(x_1) = u(x_0) + u'(x_0)\Delta x, \quad \dots, \quad u^{(M-1)}(x_1) = u^{(M-1)}(x_0) + u^{(M)}(x_0)\Delta x.
$$

Using these new initial data and Theorem 20.2.2, we are assured of a unique
solution at $x_1$. Since $a_M(x)$ is assumed to be continuous, for sufficiently
small $\Delta x$, $a_M(x_1)$ is nonzero, and it is possible to find newer initial
data at $x_2 = x_1 + \Delta x$. The process can continue until we reach a singularity
of the DE, a point where $a_M(x)$ vanishes. We can thus construct the unique
solution of the IVP in an interval $(x_0, b)$ as long as $a_M(x)$ does not vanish
anywhere in $[x_0, b]$. This procedure is analogous to the one used in the
analytic continuation of a complex function.
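
The marching procedure just described can be sketched in a few lines. This is only an illustration added here, assuming a first-order (Euler-type) update exactly as in the displayed formulas; the function name `march` and the representation of the coefficients as a list of callables are choices made for this sketch, not the text's.

```python
import math


def march(coeffs, f, x0, data, dx, steps):
    """Advance the initial data [u, u', ..., u^(M-1)] from x0 by `steps` steps.

    coeffs = [a_0, ..., a_M] as callables; at each step the DE supplies u^(M)
    and every lower derivative is updated with a first-order Taylor step.
    """
    order = len(coeffs) - 1
    x = x0
    data = list(data)
    for _ in range(steps):
        a_top = coeffs[order](x)
        if a_top == 0.0:
            raise ZeroDivisionError("a_M vanishes: singular point of the DE")
        lower = sum(coeffs[k](x) * data[k] for k in range(order))
        top = (f(x) - lower) / a_top  # u^(M)(x) from the DE
        derivs = data[1:] + [top]  # derivative of each entry of data
        data = [data[k] + derivs[k] * dx for k in range(order)]
        x += dx
    return data


# u'' + u = 0 with u(0) = 0, u'(0) = 1, marched to x = 1 (exact solution: sin x).
result = march(
    [lambda x: 1.0, lambda x: 0.0, lambda x: 1.0],
    lambda x: 0.0,
    0.0,
    [0.0, 1.0],
    1e-4,
    10000,
)
```

The explicit check for a vanishing leading coefficient mirrors the discussion of singular points: when $a_M$ is zero the DE no longer determines $u^{(M)}$ and the march cannot proceed.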

For $a_M(x_0) = 0$, however, we cannot calculate $u^{(M)}(x_0)$ unambiguously.
In such a case the LHS of (21.3) is completely determined from the initial
data. If the LHS happens to be equal to $f(x_0)$, then the equation is satisfied
for any $u^{(M)}(x_0)$, i.e., there exist infinitely many solutions for $u^{(M)}(x_0)$;
if the LHS is not equal to $f(x_0)$, there are no solutions. The difficulty
can be stated in another way, which is useful for generalization to the
$m$-dimensional case: If $a_M(x_0) = 0$ in (21.3), then the initial data determine
the function $L_x[u]$.

Let us now return to the question of constructing $u$ and investigate
conditions under which the Cauchy problem may have a solution. We follow
the same steps as for the IVP for ODEs. To construct the solution numerically
for points near $P$ but away from $\Gamma$ (since the function is completely
determined on $\Gamma$, not only its $M$th derivative but derivatives of all orders
are known on $\Gamma$), we must be able to calculate $\partial^Mu/\partial\nu^M$ at $P$. This is not
possible if the coefficient of $\partial^Mu/\partial\nu^M$ in $L_{\mathbf{x}}[u]$ is zero when $x_1, \dots, x_m$
is written in terms of $\nu, \xi_2, \dots, \xi_m$. When this happens, $L_{\mathbf{x}}[u]$ itself will be
determined by the Cauchy data. This motivates the following definition.


Definition 21.1.2 If $L_{\mathbf{x}}[u]$ can be evaluated at a point $P$ on $\Gamma$ from the
Cauchy data alone, then $\Gamma$ is said to be characteristic for $L_{\mathbf{x}}$ at $P$. If $\Gamma$ is
characteristic for all its points, then it is called a characteristic hypersurface
for $L_{\mathbf{x}}$. The Cauchy problem does not have a solution at a point on the
characteristic hypersurface.


The following theorem characterizes $\Gamma$:


Theorem 21.1.3 Let $\Gamma$ be a smooth $(m - 1)$-dimensional hypersurface. Let
$L_{\mathbf{x}}[u] = f$ be an $M$th-order linear PDE in $m$ variables. Then $\Gamma$ is
characteristic at $P \in \Gamma$ if and only if the coefficient of $\partial^Mu/\partial\nu^M$ vanishes
when $L_{\mathbf{x}}$ is expressed in terms of the normal-tangential coordinate system
$(\nu, \xi_2, \dots, \xi_m)$.


One can rephrase the foregoing theorem as follows:


Box 21.1.4 The hypersurface $\Gamma$ is not characteristic at $P$ if and only
if all $M$th-order partial derivatives of $u$ with respect to $\{x_i\}_{i=1}^{m}$ are
unambiguously determined at $P$ by the DE and the Cauchy data on $\Gamma$.


In the one-dimensional case the difficulty arose when $a_M(x_0) = 0$. In
the language being used here, we could call $x_0$ a "characteristic point." This
makes sense because in this special case ($m = 1$), the hypersurfaces can only
be of dimension 0. Thus, we can say that in the neighborhood of a
characteristic point, the IVP has no well-defined solution.¹ For the general case
($m > 1$), we can similarly say that the Cauchy problem has no well-defined
solution in the neighborhood of $P$ if $P$ happens to lie on a characteristic
hypersurface of the differential operator. Thus, it is important to determine
the characteristic hypersurfaces of PDEs.

¹ Here lies the crucial difference between ODEs and PDEs: All ODEs have a universal
characteristic hypersurface, i.e., a point. PDEs, on the other hand, can have a variety of
hypersurfaces.


Example 21.1.5 Let us consider the first-order PDE in two variables

$$
L_{\mathbf{x}}[u] \equiv a(x, y)\frac{\partial u}{\partial x} + b(x, y)\frac{\partial u}{\partial y} + F(x, y, u) = 0 \tag{21.4}
$$

where $F(x, y, u) \equiv c(x, y)u + d(x, y)$. For this discussion the form of $F$ is
irrelevant.

We wish to find the characteristic hypersurfaces (in this case, curves)
of $L_{\mathbf{x}}$. The Cauchy data consist of a simple determination of $u$ on $\Gamma$. By
Theorem 21.1.3, we need to derive relations that ensure that $\partial u/\partial x$ and $\partial u/\partial y$
cannot be unambiguously determined at $P \equiv (x, y)$. Using an obvious
notation, the PDE of Eq. (21.4) gives

$$
-F(P, u(P)) = a(P)\frac{\partial u}{\partial x}(P) + b(P)\frac{\partial u}{\partial y}(P).
$$

On the other hand, if $Q \equiv (x + dx, y + dy)$ lies on the curve $\Gamma$, then

$$
u(Q) - u(P) = dx\,\frac{\partial u}{\partial x}(P) + dy\,\frac{\partial u}{\partial y}(P).
$$

The Cauchy data determine the LHS of both of the preceding equations.
Treating these equations as a system of two linear equations in two unknowns,
$\partial u/\partial x(P)$ and $\partial u/\partial y(P)$, we conclude that the system has a
unique solution if and only if the matrix of coefficients is invertible. Thus,
by Box 21.1.4, $\Gamma$ is a characteristic curve if and only if

$$
\det\begin{pmatrix} dx & dy \\ a(P) & b(P) \end{pmatrix} = b(P)\,dx - a(P)\,dy = 0,
$$

or $dy/dx = b(x, y)/a(x, y)$, assuming that $a(x, y) \neq 0$. Solving this FODE
yields $y$ as a function of $x$, thus determining the characteristic curve. Note
that a general solution of this FODE involves an arbitrary constant, resulting
in a family of characteristic curves.
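
To make the last step concrete, the FODE $dy/dx = b/a$ can be integrated numerically. The sketch below is an addition, not from the text: it assumes the hypothetical coefficients $a(x, y) = 1$ and $b(x, y) = x$, for which the characteristic curves are the parabolas $y = x^2/2 + C$, and it uses a classical fourth-order Runge-Kutta step as one possible integrator.

```python
def characteristic_curve(a, b, x0, y0, x_end, steps=200):
    """Integrate dy/dx = b(x, y) / a(x, y) from (x0, y0) to x_end with RK4.

    Assumes a(x, y) does not vanish along the curve.
    """
    h = (x_end - x0) / steps
    slope = lambda x, y: b(x, y) / a(x, y)
    x, y = x0, y0
    points = [(x, y)]
    for _ in range(steps):
        k1 = slope(x, y)
        k2 = slope(x + h / 2, y + h * k1 / 2)
        k3 = slope(x + h / 2, y + h * k2 / 2)
        k4 = slope(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
        points.append((x, y))
    return points


# Hypothetical PDE u_x + x u_y + F = 0: characteristics satisfy dy/dx = x.
curve = characteristic_curve(lambda x, y: 1.0, lambda x, y: x, 0.0, 1.0, 2.0)
```

Starting each run from a different $y_0$ traces out a different member of the family, which is the numerical counterpart of the arbitrary constant mentioned above.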


Historical Notes 

Sofia Vasilyevna Kovalevskaya (1850-1891) is considered the greatest woman math- 
ematician prior to the twentieth century. She grew up in a well-educated family of the 
Russian nobility, her father being an artillery general and reputed to be a descendant of 
a Hungarian king, Mathias Korvin. Sonja was educated by a British governess and en- 
joyed life at the large country estate of her father’s family, although the rather progressive 
thinking of the Kovalevsky sisters did not always meet with approval from their father. 
Sonja has written of two factors that attracted her to the study of mathematics. The first 
was her Uncle Pyotr, who had studied the subject on his own and would speak of squaring 
the circle and of the asymptote, as well as of many other things that excited her imagina- 
tion. The second was a curious “wallpaper” that was used to cover one of the children’s 
rooms at Polibino, which turned out to be lecture notes on differential and integral calcu- 
lus that had been purchased by her father in student days. These sheets fascinated her and 
she would spend hours trying to decipher separate phrases and to find the proper ordering 
of the pages. 

In the autumn of 1867 Sonja went to St. Petersburg, where she studied calculus with 
Alexander Strannolyubsky, a teacher of mathematics at the naval school. While there, 
she consulted the prominent Russian mathematician Chebyshev about her mathematical 
studies, but since Russian universities were closed to women, there seemed to be no way 
that she could pursue advanced studies in her native land. 

In order to escape the oppression of women common in Russia at the time, young ladies
of ambition and ability would often arrange a marriage of convenience in order to
allow study at a foreign university. At the age of 18, Sonja arranged such a marriage
with Vladimir Kovalevsky, a paleontologist, and in 1869 the couple moved to
Heidelberg, where Sonja took courses from Kirchhoff, Helmholtz, and others. Two years later
she went to Berlin, where she worked with Weierstrass, who tutored her privately, since
she, as a woman, was not allowed to attend lectures.

The three papers she published in the next three years earned her a doctorate in absentia 
from the University of Göttingen. Unfortunately, even that distinction was not sufficient
to gain her a university position anywhere in Europe, despite strong recommendation 
from the renowned Weierstrass. Her rejections resulted in a six-year period during which 
time she neither undertook research nor replied to Weierstrass’s letters. She was bitter to 
discover that the best job she was offered was teaching arithmetic to elementary classes 
of schoolgirls, and remarked, “I was unfortunately weak in the multiplication table.” 

The existence and uniqueness of solutions to partial differential equations occupied the 
attention of many notable mathematicians of the last century, including Cauchy, who 
transformed the problem into his method of majorant functions. This method was later 
extended and refined by Kovalevskaya to include more general cases. The result was the 
now-famous Cauchy–Kovalevskaya theorem. She also contributed to the advancement of
the study of Abelian integrals and functions and applied her knowledge of these topics to 
problems in physics, including her paper “On the Rotation of a Solid Body About a Fixed 
Point,” for which she won a 5000-franc prize. She also performed some investigations 
into the dynamics of Saturn’s rings, inspiring a sonnet in which she is named “Muse of 
the Heavens.” In 1878, Kovalevskaya gave birth to a daughter, but from 1880 increas- 
ingly returned to her study of mathematics. In 1882 she began work on the refraction of 
light, and wrote three articles on the topic. In the spring of 1883, Vladimir, from whom 
Sonja had been separated for two years, committed suicide. After the initial shock, Ko- 
valevskaya immersed herself in mathematical work in an attempt to rid herself of feelings 
of guilt. Mittag-Leffler managed to overcome opposition to Kovalevskaya in Stockholm, 
and obtained for her a position as privat docent. She began to lecture there in early 1884, 
was appointed to a five-year extraordinary professorship in June of that year, and in June 
1889 became the third woman ever to hold a chair at a European university. 

During Kovalevskaya’s years at Stockholm she carried out important research, taught 
courses on the latest topics in analysis, and became an editor of the new journal Acta 
Mathematica. She was the liaison with the mathematicians of Paris and Berlin, and took 
part in the organization of international conferences. Interestingly, Kovalevskaya also
nurtured a parallel career in literature, penning several novels and a drama, “The Struggle for
Happiness,” that was favorably received at the Korsh Theater in Moscow. She died at the
pinnacle of her scientific career from a combination of influenza and pneumonia less than
two years after her election to both the Swedish and the Russian Academies of Sciences.
The latter membership was initiated by Chebyshev, in spite of the Tsarist government’s
repeated refusal to grant her a university position in her own country.


21.1.2 Second-Order PDEs in m Dimensions 


Because of their importance in mathematical physics, the rest of this chapter 
and the next will be devoted to SOPDEs. This subsection classifies SOPDEs 
and the BCs associated with them. 

The most general linear SOPDE in m variables can be written as

$$\sum_{j,k=1}^{m} A_{jk}(\mathbf{x})\frac{\partial^2 u}{\partial x_j\,\partial x_k} + \sum_{j=1}^{m} B_j(\mathbf{x})\frac{\partial u}{\partial x_j} + C(\mathbf{x})\,u = 0,$$

where $A_{jk}$ can be assumed to be symmetric in $j$ and $k$. We restrict ourselves
to the simpler case in which the matrix $(A_{jk})$ is diagonal.² We therefore


²This is not a restriction because, by a change of variables and Theorem 6.6.6 (especially
the comments after it), $A_{jk}$ can be brought to a diagonal form.




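consider the PDE

$$\sum_{j=1}^{m} a_j(\mathbf{x})\frac{\partial^2 u}{\partial x_j^2} + F\!\left(\mathbf{x}, u, \frac{\partial u}{\partial \mathbf{x}}\right) = 0, \tag{21.5}$$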


where the last term collects all the terms except the second derivatives. We 
classify SOPDEs as follows: 


1. Equation (21.5) is said to be of elliptic type at $\mathbf{x}_0$ if all the coefficients
$a_j(\mathbf{x}_0)$ are nonzero and have the same sign.

2. Equation (21.5) is said to be of ultrahyperbolic type at $\mathbf{x}_0$ if all $a_j(\mathbf{x}_0)$
are nonzero but do not have the same sign. If only one of the
coefficients has a sign different from the rest, the equation is said to be of
hyperbolic type.

3. Equation (21.5) is said to be of parabolic type at $\mathbf{x}_0$ if at least one of
the coefficients $a_j(\mathbf{x}_0)$ is zero.


If a SOPDE is of a given type at every point of its domain, it is said to be
of that given type. In particular, if the coefficients $a_j$ are constants, the type
of the PDE does not change from point to point.
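
The three rules above translate directly into a small decision procedure. The sketch below is only an illustration: the function name `classify_diagonal_sopde` and the optional tolerance `tol` (used to decide when a coefficient counts as zero) are assumptions, not notation from the text. It takes the values $a_j(\mathbf{x}_0)$ at a single point and returns the type of Eq. (21.5) there.

```python
def classify_diagonal_sopde(coeffs, tol=0.0):
    """Classify sum_j a_j u_{x_j x_j} + ... at one point from the values a_j(x0)."""
    # Rule 3: any vanishing coefficient makes the equation parabolic.
    if any(abs(a) <= tol for a in coeffs):
        return "parabolic"
    positives = sum(1 for a in coeffs if a > 0)
    negatives = len(coeffs) - positives
    # Rule 1: all coefficients nonzero with the same sign.
    if positives == 0 or negatives == 0:
        return "elliptic"
    # Rule 2: mixed signs; exactly one odd sign out is the hyperbolic case.
    if min(positives, negatives) == 1:
        return "hyperbolic"
    return "ultrahyperbolic"
```

For example, `classify_diagonal_sopde([1, 1, 1])` corresponds to the three-dimensional Laplacian, and `classify_diagonal_sopde([1, 1, 1, -1])` to a wave operator in three space dimensions and time.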


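Example 21.1.6 In this example, we study the SOPDE in two dimensions.
The most general linear SOPDE is

$$\mathbf{L}[u] = a\frac{\partial^2 u}{\partial x^2} + 2b\frac{\partial^2 u}{\partial x\,\partial y} + c\frac{\partial^2 u}{\partial y^2} + F\!\left(x, y, u, \frac{\partial u}{\partial x}, \frac{\partial u}{\partial y}\right) = 0, \tag{21.6}$$

where a, b, and c are functions of x and y.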

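To determine the characteristic curves of $\mathbf{L}$, we seek conditions under
which all second-order partial derivatives of $u$ can be determined from the
DE and the Cauchy data, which are values of $u$ and all its first derivatives
on $\Gamma$. Consider a point $Q = (x + dx, y + dy)$ close to $P = (x, y)$. We can
write

$$\frac{\partial u}{\partial x}(Q) - \frac{\partial u}{\partial x}(P) = dx\,\frac{\partial^2 u}{\partial x^2}(P) + dy\,\frac{\partial^2 u}{\partial x\,\partial y}(P),$$

$$\frac{\partial u}{\partial y}(Q) - \frac{\partial u}{\partial y}(P) = dx\,\frac{\partial^2 u}{\partial x\,\partial y}(P) + dy\,\frac{\partial^2 u}{\partial y^2}(P),$$

$$-F\!\left(P, u(P), \frac{\partial u}{\partial x}(P), \frac{\partial u}{\partial y}(P)\right) = a(P)\frac{\partial^2 u}{\partial x^2}(P) + 2b(P)\frac{\partial^2 u}{\partial x\,\partial y}(P) + c(P)\frac{\partial^2 u}{\partial y^2}(P).$$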


This system of three linear equations in the three unknowns (the three second
derivatives evaluated at P) has a unique solution if and only if the
determinant of the coefficients is nonzero. Thus, by Box 21.1.4, $\Gamma$ is a
characteristic curve if and only if

$$\det\begin{pmatrix} dx & dy & 0 \\ 0 & dx & dy \\ a(P) & 2b(P) & c(P) \end{pmatrix} = 0,$$

or $a(x,y)(dy)^2 - 2b(x,y)\,dx\,dy + c(x,y)(dx)^2 = 0$. It then follows, assuming
that $a(x,y) \neq 0$, that

$$\frac{dy}{dx} = \frac{b \pm \sqrt{b^2 - ac}}{a}. \tag{21.7}$$


There are three cases to consider: 


1. If $b^2 - ac < 0$, Eq. (21.7) has no solution, which implies that no
characteristic curves exist at P. Problem 21.1 shows that the SOPDE is of
elliptic type. Thus, the Laplace equation in two dimensions is elliptic
because $b^2 - ac = -1$. In fact, it is elliptic in the whole plane, or, stated
differently, it has no characteristic curve in the entire xy-plane. This
may lead us to believe that the Cauchy problem for the Laplace
equation in two dimensions has a unique solution. However, even though
the absence of a characteristic hypersurface at P is a necessary
condition for the existence of a solution to the Cauchy problem, it is not
sufficient. Problem 21.4 presents a Cauchy problem that is ill-posed,
meaning that the solution at any fixed point is not a continuous
function of the initial data. Satisfying this continuity condition is required
of a well-posed problem on both mathematical and physical grounds.

2. If $b^2 - ac > 0$, Eq. (21.7) has two solutions; that is, there are two
characteristic curves passing through P. Problem 21.1 shows that the
SOPDE is of hyperbolic type. The wave equation is such an equation
in the entire $\mathbb{R}^2$.

3. If $b^2 - ac = 0$, Eq. (21.7) has only one solution. In this case there is
only one characteristic curve at P. The SOPDE is parabolic in this case.
The one-dimensional diffusion equation is an example of an SOPDE
that is parabolic in the entire $\mathbb{R}^2$.
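
The three cases can be checked mechanically from the sign of $b^2 - ac$. The following sketch is illustrative only: the function names are assumptions, the coefficients are taken as plain numbers evaluated at one point P, and the exact comparison with zero is adequate only for exactly representable inputs such as the ones used here.

```python
import math

def classify_2d(a, b, c):
    """Type of a u_xx + 2 b u_xy + c u_yy + F = 0 at a point, from b**2 - a*c."""
    disc = b * b - a * c
    if disc < 0:
        return "elliptic"
    if disc > 0:
        return "hyperbolic"
    return "parabolic"

def characteristic_slopes(a, b, c):
    """Real values of dy/dx from Eq. (21.7); assumes a != 0."""
    disc = b * b - a * c
    if disc < 0:
        return []                      # no characteristic curves through P
    root = math.sqrt(disc)
    # A set removes the duplicate when the two roots coincide (parabolic case).
    return sorted({(b - root) / a, (b + root) / a})

laplace = (1.0, 0.0, 1.0)      # u_xx + u_yy
wave = (1.0, 0.0, -1.0)        # u_xx - u_yy, with y standing for ct
diffusion = (1.0, 0.0, 0.0)    # u_xx - u_t has no second derivative in t
```

With these inputs the wave operator has the two slopes -1 and +1, the diffusion operator has the single slope 0, and the Laplace operator has none.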


The question of what type of BCs to use to obtain a unique solution for a 
PDE is a very intricate mathematical problem. As Problem 21.4 shows, even 
though it has no characteristic curves in the entire $\mathbb{R}^2$, the two-dimensional
Laplace equation does not lead to a well-posed Cauchy problem. On the 
other hand, examples in Chap. 19 that dealt with electrostatic potentials and 
temperatures led us to believe that a specification of the solution u on a 
closed curve in 2D, and a closed surface in 3D, gives a unique solution. This 
has a sound physical basis. After all, specifying the temperature (or electro- 
static potential) on a closed surface should be enough to give us information 
about the temperature (or electrostatic potential) in the region close to the 
curve. 


Definition 21.1.7 A boundary condition in which the value of the solution 
is given on a closed hypersurface is called a Dirichlet boundary condition, 
and the associated problem, a Dirichlet BVP. 




There is another type of BC, which on physical grounds is appropriate for 
the Laplace equation. This condition is based on the fact that if the surface 
charge on a conductor is specified, then the electrostatic potential in the 
vicinity of the conductor can be determined uniquely. The surface charge on 
a conductor is proportional to the value of the electric field on the conductor. 
The electric field, on the other hand, is the normal derivative of the potential. 


Definition 21.1.8 A boundary condition in which the value of the normal 
derivative of the solution is specified on a closed hypersurface is called a 
Neumann BC, and the associated problem, a Neumann boundary value 
problem. 


Thus, at least on physical grounds, either a Dirichlet BVP or a Neumann 
BVP is a well-posed problem for the Laplace equation. 

For the heat (or diffusion) equation we are given an initial temperature
distribution f(x) on a bar along, say, the x-axis, with end points held at
constant temperatures. For a bar with end points at x = a and x = b, this is
equivalent to the data $u(0, x) = f(x)$, $u(t, a) = T_1$, and $u(t, b) = T_2$. These
are not Cauchy data, so we need not worry about characteristic curves. The
boundary curve consists of three parts: (1) t = 0 for a ≤ x ≤ b, (2) t > 0 for
x = a, and (3) t > 0 for x = b. In the xt-plane, these form an open rectangle
consisting of ab as one side and vertical lines at a and b as the other two.
The problem is to determine u on the side that closes the rectangle, that is,
on the side a ≤ x ≤ b at t > 0.

The wave equation requires specification of both $u$ and $\partial u/\partial t$ at t = 0.
The displacement of the boundaries of the waving medium—a taut rope 
for example—must also be specified. Again the curve is open, as for the 
diffusion case, but the initial data are Cauchy. Thus, for the wave equation 
we do have a Cauchy problem with Cauchy data specified on an open curve. 
Since the curve, the open rectangle, is not a characteristic curve of the wave 
equation, the Cauchy problem is well-posed. We can generalize these BCs 
to m dimensions. 


Box 21.1.9 The following correspondences exist between SOPDEs 
with m variables and their appropriate BCs: 


1. Elliptic SOPDE ↔ Dirichlet or Neumann BCs on a closed
hypersurface.

2. Hyperbolic SOPDE ↔ Cauchy data on an open hypersurface.

3. Parabolic SOPDE ↔ Dirichlet or Neumann BCs on an open
hypersurface.


21.2 Multidimensional GFs and Delta Functions 


This section will discuss some of the characteristics of Green’s functions
in higher dimensions. These characteristics are related to the formal partial
differential operator associated with the Green’s function and also to the
delta functions.

Using the formal idea of several continuous indices, we can turn the
operator equation $\mathbf{L}\mathbf{G} = \mathbf{1}$ into the PDE

$$\mathbf{L}_{\mathbf{x}} G(\mathbf{x}, \mathbf{y}) = \frac{\delta(\mathbf{x} - \mathbf{y})}{w(\mathbf{x})}, \tag{21.8}$$

where $\mathbf{x}, \mathbf{y} \in \mathbb{R}^m$, $w(\mathbf{x})$ is a weight function that is usually set equal to one,
and, only in Cartesian coordinates,

$$\delta(\mathbf{x} - \mathbf{y}) = \delta(x_1 - y_1)\,\delta(x_2 - y_2)\cdots\delta(x_m - y_m) = \prod_{i=1}^{m}\delta(x_i - y_i). \tag{21.9}$$


In most applications Cartesian coordinates are not the most convenient to 
use. Therefore, it is helpful to express Eqs. (21.8) and (21.9) in other co- 
ordinate systems. In particular, it is helpful to know how the delta function 
transforms under a general coordinate transformation. 

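Let $x_i = f_i(\xi_1, \dots, \xi_m)$, $i = 1, 2, \dots, m$, be a coordinate
transformation. Let $P$ be a point whose coordinates are $\mathbf{a} = (a_1, \dots, a_m)$ and
$\boldsymbol{\alpha} = (\alpha_1, \dots, \alpha_m)$ in the $x$ and $\xi$ coordinate systems, respectively. Let $J$ be the
Jacobian of the transformation, that is, the absolute value of the determinant
of a matrix whose elements are $\partial x_i/\partial \xi_j$. For a function $F(\mathbf{x})$ the definition
of the delta function gives

$$\int d^m x\, F(\mathbf{x}) \prod_{i=1}^{m} \delta(x_i - a_i) = F(\mathbf{a}).$$

Expressing this equation in terms of the $\xi$ coordinate system, recalling
that $d^m x = J\,d^m\xi$ and $a_i = f_i(\boldsymbol{\alpha})$, and introducing the notation
$H(\boldsymbol{\xi}) \equiv F(f_1(\boldsymbol{\xi}), \dots, f_m(\boldsymbol{\xi}))$, we obtain

$$\int d^m\xi\, J\, H(\boldsymbol{\xi}) \prod_{i=1}^{m} \delta\bigl(f_i(\boldsymbol{\xi}) - f_i(\boldsymbol{\alpha})\bigr) = H(\boldsymbol{\alpha}). \tag{21.10}$$

This suggests that

$$J \prod_{i=1}^{m} \delta\bigl(f_i(\boldsymbol{\xi}) - f_i(\boldsymbol{\alpha})\bigr) = \prod_{i=1}^{m} \delta(\xi_i - \alpha_i),$$

or, in more compact notation,

$$J\,\delta(\mathbf{x} - \mathbf{a}) = \delta(\boldsymbol{\xi} - \boldsymbol{\alpha}).$$

It is, of course, understood that $J \neq 0$ at $P$. What happens when $J = 0$ at $P$?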

A point at which the Jacobian vanishes is called a singular point of the
transformation. Thus, all points on the z-axis, including the origin, are
singular points of the Cartesian–spherical transformation. Since J is a determinant,
its vanishing at a point signals lack of invertibility at that point. Thus, in the
transformation from Cartesian to spherical coordinates, all spherical
coordinates $(5, \pi, \varphi)$, with arbitrary $\varphi$, are mapped to the Cartesian coordinates
$(0, 0, -5)$. Similarly, the point (0, 0, 0) in the Cartesian coordinate system
goes to $(0, \theta, \varphi)$ in the spherical system, with $\theta$ and $\varphi$ arbitrary. A
coordinate whose value is not determined at a singular point is called an ignorable
coordinate at that point. Thus, at the origin both $\theta$ and $\varphi$ are ignorable.

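Among the $\xi$ coordinates, let $\{\xi_i\}_{i=k+1}^{m}$ be ignorable at $P$ with Cartesian
coordinates $\mathbf{a}$. This means that any function, when expressed in terms of
the $\xi$’s, will be independent of the ignorable coordinates. A reexamination of
Eq. (21.10) reveals that (see Problem 21.8)

$$\delta(\mathbf{x} - \mathbf{a}) = \frac{1}{|J_k|}\prod_{i=1}^{k}\delta(\xi_i - \alpha_i), \quad\text{where}\quad J_k \equiv \int J\,d\xi_{k+1}\cdots d\xi_m. \tag{21.11}$$

In particular, if the transformation is invertible, $k = m$ and $J_m = J$, and we
recover $J\,\delta(\mathbf{x} - \mathbf{a}) = \delta(\boldsymbol{\xi} - \boldsymbol{\alpha})$.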


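Example 21.2.1 In two dimensions the transformation between Cartesian
and polar coordinates is given by $x_1 \equiv x = r\cos\theta \equiv \xi_1\cos\xi_2$,
$x_2 \equiv y = r\sin\theta \equiv \xi_1\sin\xi_2$ with the Jacobian

$$J = \det\begin{pmatrix} \partial x_1/\partial\xi_1 & \partial x_1/\partial\xi_2 \\ \partial x_2/\partial\xi_1 & \partial x_2/\partial\xi_2 \end{pmatrix} = \det\begin{pmatrix} \cos\xi_2 & -\xi_1\sin\xi_2 \\ \sin\xi_2 & \xi_1\cos\xi_2 \end{pmatrix} = \xi_1 = r,$$

which vanishes at the origin. The angle $\theta$ is the only ignorable coordinate at
the origin. Thus, $k = 2 - 1 = 1$, and

$$J_1 = \int_0^{2\pi} J\,d\theta = \int_0^{2\pi} r\,d\theta = 2\pi r \;\Rightarrow\; \delta(\mathbf{x}) \equiv \delta(x)\,\delta(y) = \frac{\delta(r)}{2\pi r}.$$

In three dimensions, the transformation between Cartesian and spherical
coordinates yields the Jacobian $J = r^2\sin\theta$. This vanishes at the origin
regardless of the values of $\theta$ and $\varphi$. We thus have two ignorable coordinates
at the origin (therefore, $k = 3 - 2 = 1$), over which we integrate to obtain

$$J_1 = \int_0^{\pi} d\theta \int_0^{2\pi} d\varphi\, r^2\sin\theta = 4\pi r^2 \;\Rightarrow\; \delta(\mathbf{x}) = \frac{\delta(r)}{4\pi r^2}.$$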


21.2.1 Spherical Coordinates in m Dimensions 


In discussing Green’s functions in m dimensions, a particular curvilinear
coordinate system will prove useful. This system is the generalization of
spherical coordinates in three dimensions. The m-dimensional spherical
coordinate system is defined as

$$x_k = r\left(\prod_{j=1}^{m-k}\sin\theta_j\right)\cos\theta_{m-k+1}, \qquad k = 1, \dots, m, \tag{21.12}$$

where, by definition, we set $\theta_m = 0$ and $\prod_{j=1}^{0}\sin\theta_j = 1$. (Note that for
m = 3, the first two Cartesian coordinates are switched compared to their
usual definitions.)
It is not hard to show (see Example 21.2.2) that the Jacobian of the
transformation (21.12) is

$$J = r^{m-1}(\sin\theta_1)^{m-2}(\sin\theta_2)^{m-3}\cdots(\sin\theta_k)^{m-k-1}\cdots\sin\theta_{m-2} \tag{21.13}$$

and that the volume element in terms of these coordinates is

$$d^m x = J\,dr\,d\theta_1\cdots d\theta_{m-1} = r^{m-1}\,dr\,d\Omega_m, \tag{21.14}$$

where

$$d\Omega_m = (\sin\theta_1)^{m-2}(\sin\theta_2)^{m-3}\cdots\sin\theta_{m-2}\,d\theta_1\,d\theta_2\cdots d\theta_{m-1} \tag{21.15}$$

is the element of the m-dimensional solid angle.


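Example 21.2.2 For m = 4 we have

$$x_1 = r\sin\theta_1\sin\theta_2\sin\theta_3, \qquad x_2 = r\sin\theta_1\sin\theta_2\cos\theta_3,$$
$$x_3 = r\sin\theta_1\cos\theta_2, \qquad x_4 = r\cos\theta_1,$$

and the Jacobian is given by

$$J = \left|\det\begin{pmatrix}
\partial x_1/\partial r & \partial x_1/\partial\theta_1 & \partial x_1/\partial\theta_2 & \partial x_1/\partial\theta_3 \\
\partial x_2/\partial r & \partial x_2/\partial\theta_1 & \partial x_2/\partial\theta_2 & \partial x_2/\partial\theta_3 \\
\partial x_3/\partial r & \partial x_3/\partial\theta_1 & \partial x_3/\partial\theta_2 & \partial x_3/\partial\theta_3 \\
\partial x_4/\partial r & \partial x_4/\partial\theta_1 & \partial x_4/\partial\theta_2 & \partial x_4/\partial\theta_3
\end{pmatrix}\right| = r^3\sin^2\theta_1\sin\theta_2.$$

It is readily seen (one can use mathematical induction to prove it
rigorously) that the Jacobians for m = 2 ($J = r$), m = 3 ($J = r^2\sin\theta_1$), and
m = 4 ($J = r^3\sin^2\theta_1\sin\theta_2$) generalize to Eq. (21.13).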
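
Equation (21.13) can also be spot-checked numerically. The sketch below is an illustration under stated assumptions, not the book's method: all function names are invented here, the partial derivatives are approximated by central finite differences with a step `h`, and the determinant is computed by ordinary Gaussian elimination. It builds the matrix of $\partial x_i/\partial q_j$ for $q = (r, \theta_1, \dots, \theta_{m-1})$ from Eq. (21.12) and compares the absolute value of its determinant with the closed form.

```python
import math

def spherical_to_cartesian(q):
    """q = (r, theta_1, ..., theta_{m-1}) -> (x_1, ..., x_m) following Eq. (21.12)."""
    r, thetas = q[0], list(q[1:]) + [0.0]   # theta_m = 0 by definition
    m = len(q)
    x = []
    for k in range(1, m + 1):
        value = r
        for j in range(m - k):              # product of sin(theta_1 .. theta_{m-k})
            value *= math.sin(thetas[j])
        value *= math.cos(thetas[m - k])    # cos(theta_{m-k+1}); 0-based index m-k
        x.append(value)
    return x

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det                      # a row swap flips the sign
        det *= a[col][col]
        for i in range(col + 1, n):
            factor = a[i][col] / a[col][col]
            for j in range(col, n):
                a[i][j] -= factor * a[col][j]
    return det

def numeric_jacobian(q, h=1e-6):
    """|det(dx_i/dq_j)| from central finite differences."""
    cols = []
    for j in range(len(q)):
        plus, minus = list(q), list(q)
        plus[j] += h
        minus[j] -= h
        xp, xm = spherical_to_cartesian(plus), spherical_to_cartesian(minus)
        cols.append([(p - n) / (2 * h) for p, n in zip(xp, xm)])
    # cols[j][i] = dx_i/dq_j; the determinant is unchanged by transposition.
    return abs(determinant(cols))

def formula_jacobian(q):
    """Eq. (21.13): r^(m-1) * prod_k sin(theta_k)^(m-k-1), k = 1, ..., m-2."""
    m, r = len(q), q[0]
    value = r ** (m - 1)
    for k in range(1, m - 1):
        value *= math.sin(q[k]) ** (m - k - 1)
    return value
```

Calling `numeric_jacobian((1.5, 0.7, 1.1, 2.0))` and `formula_jacobian((1.5, 0.7, 1.1, 2.0))` compares the two expressions for m = 4 at an arbitrarily chosen point away from the singular set.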


Using the integral

$$\int_0^{\pi}\sin^n\theta\,d\theta = \sqrt{\pi}\,\frac{\Gamma[(n+1)/2]}{\Gamma[(n+2)/2]},$$

the total solid angle in m dimensions can be found to be

$$\Omega_m = \frac{2\pi^{m/2}}{\Gamma(m/2)}. \tag{21.16}$$
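
As a quick consistency check of Eq. (21.16), one can multiply the angular integrals of Eq. (21.15) factor by factor and compare with the closed form. The sketch below uses only Python's standard `math.gamma`; the function names are illustrative assumptions, and the last angle $\theta_{m-1}$ is taken to run over $[0, 2\pi]$ while the others run over $[0, \pi]$.

```python
import math

def solid_angle(m):
    """Total solid angle in m dimensions, Eq. (21.16)."""
    return 2.0 * math.pi ** (m / 2) / math.gamma(m / 2)

def sine_power_integral(n):
    """Integral of sin(theta)**n over [0, pi] from the Gamma-function formula."""
    return math.sqrt(math.pi) * math.gamma((n + 1) / 2) / math.gamma((n + 2) / 2)

def solid_angle_from_product(m):
    """Integrate Eq. (21.15) factor by factor; theta_{m-1} contributes 2*pi."""
    total = 2.0 * math.pi
    for k in range(1, m - 1):               # theta_k carries the weight sin^(m-k-1)
        total *= sine_power_integral(m - k - 1)
    return total
```

For m = 2 and m = 3 both routes give the familiar values $2\pi$ and $4\pi$, and for m = 4 they give $2\pi^2$.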


An interesting result that is readily obtained is an expression of the
delta function in terms of spherical coordinates at the origin. Since r = 0,
Eq. (21.12) shows that all the angles are ignorable. Thus, we have

$$J_1 = \int J\,d\theta_1\cdots d\theta_{m-1} = r^{m-1}\int d\Omega_m = r^{m-1}\,\Omega_m,$$

which yields

$$\delta(\mathbf{x}) = \delta(x_1)\cdots\delta(x_m) = \frac{\delta(r)}{\Omega_m r^{m-1}} = \frac{\Gamma(m/2)\,\delta(r)}{2\pi^{m/2}\,r^{m-1}}. \tag{21.17}$$




21.2.2 Green’s Function for the Laplacian 


With the machinery developed above, we can easily obtain the
(indefinite) Green’s function for the Laplacian in m dimensions. We will ignore
questions of BCs and simply develop a function that satisfies
$\nabla^2 G(\mathbf{x}, \mathbf{y}) = \delta(\mathbf{x} - \mathbf{y})$. Without loss of generality we let $\mathbf{y} = 0$; that is, we translate the
axes so that $\mathbf{y}$ becomes the new origin. Then we have $\nabla^2 G(\mathbf{x}) = \delta(\mathbf{x})$. In
spherical coordinates this becomes

$$\nabla^2 G(\mathbf{x}) = \frac{\delta(r)}{\Omega_m r^{m-1}} \tag{21.18}$$


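by (21.17). Since the RHS is a function of r only, we expect G to behave in
the same way. We now have to express $\nabla^2$ in terms of spherical coordinates.
In general, this is difficult; however, for a function of $r = \sqrt{x_1^2 + \cdots + x_m^2}$
alone, such as F(r), we have

$$\frac{\partial F}{\partial x_i} = \frac{\partial F}{\partial r}\frac{\partial r}{\partial x_i} = \frac{\partial F}{\partial r}\frac{x_i}{r}, \qquad \frac{\partial^2 F}{\partial x_i^2} = \frac{\partial^2 F}{\partial r^2}\frac{x_i^2}{r^2} + \frac{\partial F}{\partial r}\left(\frac{1}{r} - \frac{x_i^2}{r^3}\right),$$

so that

$$\nabla^2 F(r) = \sum_{i=1}^{m}\frac{\partial^2 F}{\partial x_i^2} = \frac{\partial^2 F}{\partial r^2} + \frac{m-1}{r}\frac{\partial F}{\partial r} = \frac{1}{r^{m-1}}\frac{\partial}{\partial r}\left(r^{m-1}\frac{\partial F}{\partial r}\right).$$

For the Green’s function, therefore, we get

$$\frac{d}{dr}\left(r^{m-1}\frac{dG}{dr}\right) = \frac{\delta(r)}{\Omega_m}. \tag{21.19}$$

The solution, for m ≥ 3, is (see Problem 21.9)

$$G(r) = -\frac{\Gamma(m/2)}{2(m-2)\pi^{m/2}}\left(\frac{1}{r^{m-2}}\right) \quad\text{for } m \geq 3. \tag{21.20}$$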


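We can restore the vector $\mathbf{y}$, at which we placed the origin, by noting that
$r = |\mathbf{r}| = |\mathbf{x} - \mathbf{y}|$. Thus, we get

$$G(\mathbf{x}, \mathbf{y}) = -\frac{\Gamma(m/2)}{2(m-2)\pi^{m/2}}\left(\frac{1}{|\mathbf{x} - \mathbf{y}|^{m-2}}\right) = -\frac{\Gamma(m/2)}{2(m-2)\pi^{m/2}}\left[\sum_{i=1}^{m}(x_i - y_i)^2\right]^{-(m-2)/2} \quad\text{for } m \geq 3. \tag{21.21}$$

Similarly, we obtain

$$G(\mathbf{x}, \mathbf{y}) = \frac{1}{2\pi}\ln|\mathbf{x} - \mathbf{y}| = \frac{1}{4\pi}\ln\bigl[(x_1 - y_1)^2 + (x_2 - y_2)^2\bigr] \quad\text{for } m = 2. \tag{21.22}$$

Having found the Green’s function for the Laplacian, we can find a
solution to the inhomogeneous equation, the Poisson equation, $\nabla^2 u = -\rho(\mathbf{x})$.
Thus, for m ≥ 3, we get

$$u(\mathbf{x}) = -\int d^m y\, G(\mathbf{x}, \mathbf{y})\,\rho(\mathbf{y}) = \frac{\Gamma(m/2)}{2(m-2)\pi^{m/2}}\int d^m y\,\frac{\rho(\mathbf{y})}{|\mathbf{x} - \mathbf{y}|^{m-2}}.$$

In particular, for m = 3, we obtain

$$u(\mathbf{x}) = \frac{1}{4\pi}\int d^3 y\,\frac{\rho(\mathbf{y})}{|\mathbf{x} - \mathbf{y}|},$$

which is the electrostatic potential due to a charge density $\rho(\mathbf{y})$.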
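
Away from the origin, integrating Eq. (21.19) once gives $r^{m-1}\,dG/dr = 1/\Omega_m$, so the "flux" $\Omega_m r^{m-1}\,dG/dr$ should equal 1 for every $r > 0$. The sketch below checks this for the radial Green's functions (21.20) and (21.22); the function names, the finite-difference step `h`, and the sample radii are assumptions made for illustration.

```python
import math

def solid_angle(m):
    """Total solid angle in m dimensions, Eq. (21.16)."""
    return 2.0 * math.pi ** (m / 2) / math.gamma(m / 2)

def green_laplacian(r, m):
    """Radial Green's function of the Laplacian, Eqs. (21.20) and (21.22)."""
    if m == 2:
        return math.log(r) / (2.0 * math.pi)
    if m < 3:
        raise ValueError("this sketch covers m >= 2 only")
    return -math.gamma(m / 2) / (2.0 * (m - 2) * math.pi ** (m / 2)) / r ** (m - 2)

def flux_through_sphere(r, m, h=1e-5):
    """Omega_m * r^(m-1) * dG/dr, expected to equal 1 for any r > 0."""
    dG = (green_laplacian(r + h, m) - green_laplacian(r - h, m)) / (2 * h)
    return solid_angle(m) * r ** (m - 1) * dG
```

For m = 3 the function reduces to $-1/(4\pi r)$, the familiar Coulomb form up to the sign convention $\nabla^2 G = \delta$ used in this section.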


21.3. Formal Development 


The preceding section was devoted to a discussion of the Green’s function 
for the Laplacian with no mention of the BCs. This section will develop a 
formalism that not only works for more general operators, but also incorpo- 
rates the BCs. 


21.3.1 General Properties 


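Basic to a study of GFs is Green’s identity, whose one-dimensional version we
encountered in Chap. 20. Here, we generalize it to m dimensions. Suppose
there exist two differential operators, $\mathbf{L}_{\mathbf{x}}$ and $\mathbf{L}_{\mathbf{x}}^{\dagger}$, which for any two functions
u and v, satisfy the following relation:³

$$v^*\,\mathbf{L}_{\mathbf{x}}[u] - u\,\bigl(\mathbf{L}_{\mathbf{x}}^{\dagger}[v]\bigr)^* = \nabla\cdot\mathbf{Q}[u, v^*] \equiv \sum_{i=1}^{m}\frac{\partial Q_i}{\partial x_i}[u, v^*]. \tag{21.23}$$

The differential operator $\mathbf{L}_{\mathbf{x}}^{\dagger}$ is, as in the one-dimensional case, called the
formal adjoint of $\mathbf{L}_{\mathbf{x}}$. Integrating (21.23) over a closed domain D in $\mathbb{R}^m$ with
boundary $\partial D$, and using the divergence theorem, we obtain

$$\int_D d^m x\,\bigl\{v^*\,\mathbf{L}_{\mathbf{x}}[u] - u\,\bigl(\mathbf{L}_{\mathbf{x}}^{\dagger}[v]\bigr)^*\bigr\} = \int_{\partial D}\mathbf{Q}\cdot\hat{\mathbf{e}}_n\,da, \tag{21.24}$$

where $\hat{\mathbf{e}}_n$ is an m-dimensional unit vector normal to $\partial D$, and da is an
element of “area” of the m-dimensional hypersurface $\partial D$. Equation (21.24) is
the generalized Green’s identity for m dimensions. Note that the weight
function is set equal to one for simplicity.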


³The notions of divergence and divergence theorem require the machinery of differential
geometry to which we shall come back later. Here, we are simply using a direct and most 
obvious generalization of the notions from three to m dimensions. 




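The differential operator $\mathbf{L}_{\mathbf{x}}$ is said to be formally self-adjoint if the RHS
of Eq. (21.24), the surface term, vanishes. In such a case, we have $\mathbf{L}_{\mathbf{x}} = \mathbf{L}_{\mathbf{x}}^{\dagger}$ as
in one dimension. This relation is a necessary condition for the surface term
to vanish because u and v are, by assumption, arbitrary. $\mathbf{L}_{\mathbf{x}}$ is called
self-adjoint (or, somewhat imprecisely, hermitian) if $\mathbf{L}_{\mathbf{x}} = \mathbf{L}_{\mathbf{x}}^{\dagger}$ and the domains of
the two operators, as determined by the vanishing of the surface term, are
identical.

We can use Eq. (21.24) to study the pair of PDEs

$$\mathbf{L}_{\mathbf{x}}[u] = f(\mathbf{x}) \quad\text{and}\quad \mathbf{L}_{\mathbf{x}}^{\dagger}[v] = h(\mathbf{x}). \tag{21.25}$$

As in one dimension, we let $G(\mathbf{x}, \mathbf{y})$ and $g(\mathbf{x}, \mathbf{y})$ denote the Green’s
functions for $\mathbf{L}_{\mathbf{x}}$ and $\mathbf{L}_{\mathbf{x}}^{\dagger}$, respectively. Let us assume that the BCs are such that
the surface term in Eq. (21.24) vanishes. Then we get Green’s identity

$$\int_D d^m x\, v^*\,\mathbf{L}_{\mathbf{x}}[u] = \int_D d^m x\, u\,\bigl(\mathbf{L}_{\mathbf{x}}^{\dagger}[v]\bigr)^*. \tag{21.26}$$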
D D 

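If in this equation we let $u = G(\mathbf{x}, \mathbf{t})$ and $v = g(\mathbf{x}, \mathbf{y})$, where $\mathbf{t}, \mathbf{y} \in D$, we
obtain

$$\int_D d^m x\, g^*(\mathbf{x}, \mathbf{y})\,\delta(\mathbf{x} - \mathbf{t}) = \int_D d^m x\, G(\mathbf{x}, \mathbf{t})\,\delta(\mathbf{x} - \mathbf{y}),$$

or $g^*(\mathbf{t}, \mathbf{y}) = G(\mathbf{y}, \mathbf{t})$. In particular, when $\mathbf{L}_{\mathbf{x}}$ is formally self-adjoint, we have
$G^*(\mathbf{t}, \mathbf{y}) = G(\mathbf{y}, \mathbf{t})$, or $G(\mathbf{t}, \mathbf{y}) = G(\mathbf{y}, \mathbf{t})$, if all the coefficient functions of $\mathbf{L}_{\mathbf{x}}$
are real. That is, the Green’s function will be symmetric.

If we let $v = g(\mathbf{x}, \mathbf{y})$ and use the first equation of (21.25) in (21.26), we
get $u(\mathbf{y}) = \int_D d^m x\, g^*(\mathbf{x}, \mathbf{y}) f(\mathbf{x})$, which, using $g^*(\mathbf{t}, \mathbf{y}) = G(\mathbf{y}, \mathbf{t})$ and
interchanging $\mathbf{x}$ and $\mathbf{y}$, becomes $u(\mathbf{x}) = \int_D d^m y\, G(\mathbf{x}, \mathbf{y}) f(\mathbf{y})$. It can similarly be
shown that $v(\mathbf{x}) = \int_D d^m y\, g(\mathbf{x}, \mathbf{y}) h(\mathbf{y})$.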
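
The symmetry $G(\mathbf{t}, \mathbf{y}) = G(\mathbf{y}, \mathbf{t})$ and the representation $u(\mathbf{x}) = \int G f$ have a simple discrete analog. The sketch below is an illustration under assumptions that are not in the text: it uses the one-dimensional operator $d^2/dx^2$ on (0, 1) with $u(0) = u(1) = 0$, a uniform grid with n interior points, the standard three-point difference formula, and hand-written Gaussian elimination. Each column of the Green's matrix is the response to a unit-area spike at one grid point.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(col + 1, n):
            factor = m[i][col] / m[col][col]
            for j in range(col, n + 1):
                m[i][j] -= factor * m[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):                     # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def discrete_green(n):
    """Green's matrix of d^2/dx^2 on (0, 1) with u(0) = u(1) = 0, n interior points."""
    h = 1.0 / (n + 1)
    lap = [[0.0] * n for _ in range(n)]
    for i in range(n):
        lap[i][i] = -2.0 / h**2
        if i > 0:
            lap[i][i - 1] = 1.0 / h**2
        if i < n - 1:
            lap[i][i + 1] = 1.0 / h**2
    # Column j solves L G[:, j] = delta_j / h, a unit-area spike at x_j.
    columns = [solve_linear(lap, [1.0 / h if i == j else 0.0 for i in range(n)])
               for j in range(n)]
    green = [[columns[j][i] for j in range(n)] for i in range(n)]
    return green, h

G, h = discrete_green(9)
# Discrete version of u(x) = int G(x, y) f(y) dy with f = 1, evaluated at x = 0.5.
u_mid = sum(G[4][j] * 1.0 * h for j in range(9))
```

Because the difference operator is a real symmetric matrix, its inverse is symmetric as well, which mirrors the statement that a formally self-adjoint operator with real coefficients has a symmetric Green's function.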


21.3.2 Fundamental (Singular) Solutions 


The inhomogeneous term of the differential equation to which G(x, y) is a 
solution is the delta function, $\delta(\mathbf{x} - \mathbf{y})$. It would be surprising if G(x, y) did
not “take notice” of this catastrophic source term and did not adapt itself 
to behave differently at x = y than at any other “ordinary” point. We noted 
the singular behavior of the Green’s function at x = y in one dimension 
when we proved Theorem 20.3.5. There we introduced h(x, y)—which was 
discontinuous at x = y—as a part of the Green’s function. Similarly, when 
we discussed the Green’s functions for the Laplacian in two and m dimen- 
sions earlier in this chapter, we noted that they behaved singularly at r= 0 
or x = y. In this section, we study similar properties of the GFs for other 
differential operators. 

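Next to the Laplacian in difficulty is the formally self-adjoint elliptic
PDO $\mathbf{L}_{\mathbf{x}} = \nabla^2 + q(\mathbf{x})$ discussed in Problem 21.10. Substituting this operator
in the generalized Green’s identity and using the expression for $\mathbf{Q}$ given in
Problem 21.10, we obtain

$$\int_D d^m x\,\bigl\{v\,\mathbf{L}_{\mathbf{x}}[u] - u\,\mathbf{L}_{\mathbf{x}}[v]\bigr\} = \int_{\partial D}\bigl(v\,\hat{\mathbf{e}}_n\cdot\nabla u - u\,\hat{\mathbf{e}}_n\cdot\nabla v\bigr)\,da.$$

Letting $v = G(\mathbf{x}, \mathbf{y})$ and denoting $\hat{\mathbf{e}}_n\cdot\nabla$ by $\partial/\partial n$ gives

$$\int_D d^m x\,\bigl[G\,\mathbf{L}_{\mathbf{x}}u - u\,\mathbf{L}_{\mathbf{x}}G\bigr] = \int_{\partial D}\left[G\frac{\partial u}{\partial n} - u\frac{\partial G}{\partial n}\right]da. \tag{21.27}$$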

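We want to use this equation to find out about the behavior of $G(\mathbf{x}, \mathbf{y})$ as
$|\mathbf{x} - \mathbf{y}| \to 0$. Therefore, assuming that $\mathbf{y} \in D$, we divide the domain D
into two parts: one part is a region $D_\epsilon$ bounded by an infinitesimal
hypersphere $S_\epsilon$ with radius $\epsilon$ and center at $\mathbf{y}$; the other is the rest of D. Instead of
D we use the region $D' \equiv D - D_\epsilon$. The following facts are easily deduced
for $D'$:


(1) $\mathbf{L}_{\mathbf{x}} G(\mathbf{x}, \mathbf{y}) = 0$ because $\mathbf{x} \neq \mathbf{y}$ in $D'$;
(2) $\int_D = \lim_{\epsilon\to 0}\int_{D'}$;
(3) $\partial D' = \partial D \cup S_\epsilon$.


Suppose that we are interested in finding a solution to

$$\mathbf{L}_{\mathbf{x}}[u] = \bigl[\nabla^2 + q(\mathbf{x})\bigr]u(\mathbf{x}) = f(\mathbf{x})$$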


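subject to certain, as yet unspecified, BCs. Using the three facts listed above,
Eq. (21.27) yields

$$\begin{aligned}
\int_D d^m x\,\bigl[G\,\mathbf{L}_{\mathbf{x}}u - u\,\mathbf{L}_{\mathbf{x}}G\bigr]
&= \lim_{\epsilon\to 0}\int_{D'} d^m x\,\bigl[G\,\underbrace{\mathbf{L}_{\mathbf{x}}u}_{=f} - u\,\underbrace{\mathbf{L}_{\mathbf{x}}G}_{=0}\bigr] \\
&= \lim_{\epsilon\to 0}\int_{D'} d^m x\, G(\mathbf{x}, \mathbf{y}) f(\mathbf{x}) = \int_D d^m x\, G(\mathbf{x}, \mathbf{y}) f(\mathbf{x}) \\
&= \int_{\partial D}\left(G\frac{\partial u}{\partial n} - u\frac{\partial G}{\partial n}\right)da + \int_{S_\epsilon}\left(G\frac{\partial u}{\partial n} - u\frac{\partial G}{\partial n}\right)da.
\end{aligned}$$

We assume that the BCs are such that the integral over $\partial D$ vanishes. This
is a generalization of the one-dimensional case (recall from Chap. 20 that
this is a necessary condition for the existence of Green’s functions).
Moreover, for an m-dimensional sphere, $da = r^{m-1}\,d\Omega_m$, which for $S_\epsilon$ reduces
to $\epsilon^{m-1}\,d\Omega_m$. Substituting in the preceding equation yields

$$\int_D d^m x\, G(\mathbf{x}, \mathbf{y}) f(\mathbf{x}) = \int_{S_\epsilon}\left(G\frac{\partial u}{\partial n} - u\frac{\partial G}{\partial n}\right)\epsilon^{m-1}\,d\Omega_m.$$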
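We would like the RHS to be $u(\mathbf{y})$. On $S_\epsilon$ the outward normal of $D'$ points
toward $\mathbf{y}$, so that $\partial/\partial n = -\partial/\partial r$ there, and the RHS will be $u(\mathbf{y})$ if

$$\lim_{\epsilon\to 0}\int_{S_\epsilon} G\,\frac{\partial u}{\partial r}\,\epsilon^{m-1}\,d\Omega_m = 0 \quad\text{and}\quad \lim_{\epsilon\to 0}\int_{S_\epsilon} u\,\frac{\partial G}{\partial r}\,\epsilon^{m-1}\,d\Omega_m = u(\mathbf{y})$$

for arbitrary u. This will happen only if

$$\lim_{r\to 0} G(\mathbf{y} + \mathbf{r}, \mathbf{y})\,r^{m-1} = 0, \qquad \lim_{r\to 0}\frac{\partial G}{\partial r}(\mathbf{y} + \mathbf{r}, \mathbf{y})\,r^{m-1} = \text{const}. \tag{21.28}$$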
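A solution to these two equations is

$$G(\mathbf{x}, \mathbf{y}) = \begin{cases}
\dfrac{F(\mathbf{x}, \mathbf{y})}{2\pi}\ln(|\mathbf{x} - \mathbf{y}|) + H(\mathbf{x}, \mathbf{y}) & \text{if } m = 2, \\[2ex]
-\dfrac{1}{(m-2)\,\Omega_m}\,\dfrac{F(\mathbf{x}, \mathbf{y})}{|\mathbf{x} - \mathbf{y}|^{m-2}} + H(\mathbf{x}, \mathbf{y}) & \text{if } m \geq 3,
\end{cases} \tag{21.29}$$

where $H(\mathbf{x}, \mathbf{y})$ and $F(\mathbf{x}, \mathbf{y})$ are well behaved at $\mathbf{x} = \mathbf{y}$. The introduction of
these functions is necessary because Eq. (21.28) determines the behavior
of $G(\mathbf{x}, \mathbf{y})$ only when $\mathbf{x} \approx \mathbf{y}$. Such behavior does not uniquely determine
$G(\mathbf{x}, \mathbf{y})$. For instance, $e^{|\mathbf{x} - \mathbf{y}|}\ln(|\mathbf{x} - \mathbf{y}|)$ and $\ln(|\mathbf{x} - \mathbf{y}|)$ behave in the same
way as $|\mathbf{x} - \mathbf{y}| \to 0$.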

Equation (21.29) shows that for $\mathbf{L}_{\mathbf{x}} = \nabla^2 + q(\mathbf{x})$, the Green’s function
consists of two parts. The first part determines the singular behavior of the
Green’s function as $\mathbf{x} \to \mathbf{y}$. The nature of this singularity (how badly the GF
“blows up” as $\mathbf{x} \to \mathbf{y}$) is extremely important, because it is a prerequisite for
our ability to write the solution in terms of an integral representation with the
Green’s function as its kernel. Due to their importance in such
representations, the first terms on the RHS of Eq. (21.29) are called the fundamental
solution of the differential equation, or the singular part of the Green’s
function.

What about the second part of the Green’s function? What role does it 
play in obtaining a solution? So far we have been avoiding consideration of 
BCs. Here H (x, y) can help. We choose H (x, y) in such a way that G(x, y) 
satisfies the appropriate BCs. Let us discuss this in greater detail and gener- 
ality. 

If BCs are ignored, the Green’s function for a SOPDO $\mathbf{L}_{\mathbf{x}}$ cannot be
determined uniquely. In particular, if $G(\mathbf{x}, \mathbf{y})$ is a Green’s function, that is, if
$\mathbf{L}_{\mathbf{x}} G(\mathbf{x}, \mathbf{y}) = \delta(\mathbf{x} - \mathbf{y})$, then so is $G(\mathbf{x}, \mathbf{y}) + H(\mathbf{x}, \mathbf{y})$ as long as $H(\mathbf{x}, \mathbf{y})$ is a
solution of the homogeneous equation $\mathbf{L}_{\mathbf{x}} H(\mathbf{x}, \mathbf{y}) = 0$. Thus, we can break
the Green’s function into two parts:

$$G = G_s + H, \quad\text{where}\quad \mathbf{L}_{\mathbf{x}} G_s(\mathbf{x}, \mathbf{y}) = \delta(\mathbf{x} - \mathbf{y}), \quad \mathbf{L}_{\mathbf{x}} H(\mathbf{x}, \mathbf{y}) = 0, \tag{21.30}$$

with $G_s$ the singular part of the Green’s function. H is called the regular
part of the Green’s function. Neither $G_s$ nor H (nor G, therefore) is unique.
However, the appropriate BCs, which depend on the type of $\mathbf{L}_{\mathbf{x}}$, will
determine G uniquely.

To be more specific, let us assume that we want to find a Green’s function
for $\mathbf{L}_{\mathbf{x}}$ that vanishes at the boundary $\partial D$. That is, we wish to find $G(\mathbf{x}, \mathbf{y})$
such that $G(\mathbf{x}_b, \mathbf{y}) = 0$, where $\mathbf{x}_b$ is an arbitrary point of the boundary. All
that is required is to find a $G_s$ and an H satisfying Eq. (21.30) with the
BC $H(\mathbf{x}_b, \mathbf{y}) = -G_s(\mathbf{x}_b, \mathbf{y})$. The latter problem, involving a homogeneous
differential equation, can be handled by the methods of Chap. 19. Since any
discussion of BCs is tied to the type of PDE, we have reserved the discussion
of such specifics for the next chapter.


21.4 Integral Equations and GFs


Integral equations are best applied in combination with Green’s functions. 
In fact, we can use a Green’s function to turn a DE into an integral equation. 
If this integral equation is compact or has a compact resolvent, then the 
problem lends itself to the methods described in Chaps. 17 and 18. 

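Let $\mathbf{L}_{\mathbf{x}}$ be a SOPDO in m variables. We are interested in solving the
SOPDE

$$\mathbf{L}_{\mathbf{x}}[u] + \lambda V(\mathbf{x})\,u(\mathbf{x}) = f(\mathbf{x})$$

subject to some BCs. Here $\lambda$ is an arbitrary constant, and $V(\mathbf{x})$ is a
well-behaved function on $\mathbb{R}^m$. Transferring the second term on the LHS to the
RHS and then treating the RHS as an inhomogeneous term, we can write
the “solution” to the PDE as

$$u(\mathbf{x}) = H(\mathbf{x}) + \int_D d^m y\, G_0(\mathbf{x}, \mathbf{y})\bigl[f(\mathbf{y}) - \lambda V(\mathbf{y})\,u(\mathbf{y})\bigr],$$

where D is the domain of $\mathbf{L}_{\mathbf{x}}$ and $G_0$ is the Green’s function for $\mathbf{L}_{\mathbf{x}}$ with
some, as yet unspecified, BCs. The function H is a solution to the
homogeneous equation, and it is present to guarantee the appropriate BCs.
Combining the first term in the integral with $H(\mathbf{x})$, we have

$$u(\mathbf{x}) = F(\mathbf{x}) - \lambda\int_D d^m y\, G_0(\mathbf{x}, \mathbf{y})\,V(\mathbf{y})\,u(\mathbf{y}). \tag{21.31}$$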


Equation (21.31) is an m-dimensional Fredholm equation whose solution 
can be obtained in the form of a Neumann series. 


Example 21.4.1 Consider the bound-state Schrödinger equation in one
dimension:

$$-\frac{\hbar^2}{2\mu}\frac{d^2\psi}{dx^2} + V(x)\,\psi(x) = E\,\psi(x), \qquad E < 0.$$

We rewrite this equation as

$$\mathbf{L}_x[\psi] \equiv \left(\frac{d^2}{dx^2} - \kappa^2\right)\psi(x) = \frac{2\mu}{\hbar^2}V(x)\,\psi(x),$$

where $\kappa^2 = -2\mu E/\hbar^2 > 0$. Equation (21.31) gives the equivalent integral
equation

$$\psi(x) = \psi_0(x) + \frac{2\mu}{\hbar^2}\int_{-\infty}^{\infty} G_0(x, y)\,V(y)\,\psi(y)\,dy,$$

where $\psi_0(x)$ is the solution of $\mathbf{L}_x[\psi_0] = 0$, which is easily found to be of
the general form $\psi_0(x) = Ae^{\kappa x} + Be^{-\kappa x}$. If we assume that $\psi_0(x)$ remains
finite as $x \to \pm\infty$, $\psi_0(x)$ will be zero. Furthermore, it can be shown that
$G_0(x, y) = -e^{-\kappa|x - y|}/(2\kappa)$ (see Problem 20.12). Therefore,

$$\psi(x) = -\frac{\mu}{\hbar^2\kappa}\int_{-\infty}^{\infty} e^{-\kappa|x - y|}\,V(y)\,\psi(y)\,dy.$$

Now consider an attractive delta-function potential with center at a:

$$V(x) = -V_0\,\delta(x - a), \qquad V_0 > 0.$$

For such a potential, the integral equation yields

$$\psi(x) = \frac{\mu V_0}{\hbar^2\kappa}\int_{-\infty}^{\infty} e^{-\kappa|x - y|}\,\delta(y - a)\,\psi(y)\,dy = \frac{\mu V_0}{\hbar^2\kappa}\,e^{-\kappa|x - a|}\,\psi(a).$$

For this equation to be consistent, i.e., to get an identity when x = a, we
must have

$$\frac{\mu V_0}{\hbar^2\kappa} = 1 \;\Rightarrow\; \kappa = \frac{\mu V_0}{\hbar^2} \;\Rightarrow\; E = -\frac{\mu V_0^2}{2\hbar^2}.$$


Therefore, there is only one bound state and one energy level for an attrac- 
tive delta-function potential. 
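
The consistency argument of Example 21.4.1 is easy to verify with numbers. The sketch below is illustrative only: the function names, the default $\hbar = 1$, and the sample values of $\mu$, $V_0$, and $a$ are assumptions. It computes $\kappa$ and $E$ from the condition $\mu V_0/(\hbar^2\kappa) = 1$ and evaluates the right-hand side of the integral equation for the trial function $\psi(x) = e^{-\kappa|x - a|}$.

```python
import math

def bound_state(mu, v0, hbar=1.0):
    """Return (kappa, E) for V(x) = -v0 * delta(x - a), from mu*v0/(hbar^2 kappa) = 1."""
    kappa = mu * v0 / hbar**2
    energy = -(hbar * kappa) ** 2 / (2.0 * mu)   # from kappa^2 = -2 mu E / hbar^2
    return kappa, energy

def rhs_of_integral_equation(x, a, mu, v0, kappa, psi_at_a, hbar=1.0):
    """Right-hand side (mu v0 / (hbar^2 kappa)) * exp(-kappa |x - a|) * psi(a)."""
    return mu * v0 / (hbar**2 * kappa) * math.exp(-kappa * abs(x - a)) * psi_at_a

# Sample values (assumptions): mu = 1, V0 = 2, a = 1, in units with hbar = 1.
kappa, energy = bound_state(mu=1.0, v0=2.0)
psi = lambda x: math.exp(-kappa * abs(x - 1.0))  # trial bound-state wave function
value = rhs_of_integral_equation(0.3, 1.0, 1.0, 2.0, kappa, psi(1.0))
```

With the consistent value of $\kappa$, the right-hand side reproduces $\psi(x)$ at every x, not only at x = a; with any other $\kappa$ it would differ from $\psi(x)$ by a constant factor.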


To find a Neumann-series solution we can substitute the expression for 
u given by the RHS of Eq. (21.31) in the integral of that equation. The 
resulting equation will have two integrals, in the second of which u appears. 
Substituting the new u in the second integral and continuing the process N 
times yields 


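$$u(\mathbf{x}) = F(\mathbf{x}) + \sum_{n=1}^{N-1}(-\lambda)^n\int_D d^m y\, K^n(\mathbf{x}, \mathbf{y})\,F(\mathbf{y}) + (-\lambda)^N\int_D d^m y\, K^N(\mathbf{x}, \mathbf{y})\,u(\mathbf{y}),$$

where

$$K(\mathbf{x}, \mathbf{y}) \equiv G_0(\mathbf{x}, \mathbf{y})\,V(\mathbf{y}), \qquad K^n(\mathbf{x}, \mathbf{y}) \equiv \int_D d^m t\, K^{n-1}(\mathbf{x}, \mathbf{t})\,K(\mathbf{t}, \mathbf{y}) \quad\text{for } n \geq 2. \tag{21.32}$$

The Neumann series is obtained by letting $N \to \infty$:

$$u(\mathbf{x}) = F(\mathbf{x}) + \sum_{n=1}^{\infty}(-\lambda)^n\int_D d^m y\, K^n(\mathbf{x}, \mathbf{y})\,F(\mathbf{y}). \tag{21.33}$$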


Except for the fact that here the integrations are in m variables, Eq. (21.33) 
is the same as the Neumann series derived in Sect. 18.1. In exact analogy, 
therefore, we abbreviate (21.33) as 


lu) =|F) +) 0(-A)"K"|F). (21.34) 
n=1 


Equations (21.33) and (21.34) have meaning only if the Neumann series 
converges, i.e., if 


1/2 
al | any | a"s|Koy? | <1. (21.35) 
D D 




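We will briefly discuss an intuitive physical interpretation of the Neumann
series due to Feynman. Although Feynman developed this diagrammatic
technique for quantum electrodynamics, it has been useful in other areas,
such as statistical and condensed matter physics. In most cases of interest,
the SOPDE is homogeneous, so $f(\mathbf{x}) = 0$. In that case, $\mathbf{L}_{\mathbf{x}}$ and $V(\mathbf{x})$ are
called the free operator and the interacting potential, respectively. The
solution to $\mathbf{L}_{\mathbf{x}}[u] = 0$ is called the free solution and denoted by $u_f(\mathbf{x})$.
Let us start with Eq. (21.31) written as

$$u(\mathbf{x}) = u_f(\mathbf{x}) - \lambda \int_{\mathbb{R}^m} d^m y\, G_0(\mathbf{x}, \mathbf{y}) V(\mathbf{y}) u(\mathbf{y}), \tag{21.36}$$

where $G_0$ stands for the Green's function for the free operator $\mathbf{L}_{\mathbf{x}}$. The full
Green's function, that is, that for $\mathbf{L}_{\mathbf{x}} + \lambda V$, will be denoted by $G$. Moreover,
as is usually the case, the region $D$ has been taken to be all of $\mathbb{R}^m$. This
implies that no boundary conditions are imposed on $u$, which in turn permits
us to use the singular part of the Green's function in the integral. Because
of the importance of the full Green's function, we are interested in finding a
series for $G$ in terms of $G_0$, which is supposed to be known. To obtain such
a series we start with the abstract operator equation and write $\mathbf{G} = \mathbf{G}_0 + \mathbf{A}$,
where $\mathbf{A}$ is to be determined. Operating on both sides with $\mathbf{L}$ ("inverse" of
$\mathbf{G}_0$), we obtain $\mathbf{L}\mathbf{G} = \mathbf{L}\mathbf{G}_0 + \mathbf{L}\mathbf{A} = \mathbf{1} + \mathbf{L}\mathbf{A}$. On the other hand, $(\mathbf{L} + \lambda\mathbf{V})\mathbf{G} = \mathbf{1}$,
or $\mathbf{L}\mathbf{G} = \mathbf{1} - \lambda\mathbf{V}\mathbf{G}$. These two equations give

$$\mathbf{L}\mathbf{A} = -\lambda\mathbf{V}\mathbf{G} \quad\Rightarrow\quad \mathbf{A} = -\lambda\mathbf{L}^{-1}\mathbf{V}\mathbf{G} = -\lambda\mathbf{G}_0\mathbf{V}\mathbf{G}.$$

Therefore,

$$\mathbf{G} = \mathbf{G}_0 - \lambda\mathbf{G}_0\mathbf{V}\mathbf{G}. \tag{21.37}$$

Sandwiching both sides between $\langle\mathbf{x}|$ and $|\mathbf{z}\rangle$, inserting $\mathbf{1} = \int |\mathbf{y}\rangle\langle\mathbf{y}|\, d^m y$ between
$\mathbf{G}_0$ and $\mathbf{V}$ and $\mathbf{1} = \int |\mathbf{t}\rangle\langle\mathbf{t}|\, d^m t$ between $\mathbf{V}$ and $\mathbf{G}$, and assuming that $\mathbf{V}$
is local [i.e., $V(\mathbf{y}, \mathbf{t}) = V(\mathbf{y})\delta(\mathbf{y} - \mathbf{t})$], we obtain

$$G(\mathbf{x}, \mathbf{z}) = G_0(\mathbf{x}, \mathbf{z}) - \lambda \int d^m y\, G_0(\mathbf{x}, \mathbf{y}) V(\mathbf{y}) G(\mathbf{y}, \mathbf{z}). \tag{21.38}$$

This equation is the analogue of (21.31) and, just like that equation, is
amenable to a Neumann series expansion. The result is

$$G(\mathbf{x}, \mathbf{y}) = G_0(\mathbf{x}, \mathbf{y}) + \sum_{n=1}^{\infty} (-\lambda)^n \int d^m z\, G_0(\mathbf{x}, \mathbf{z}) K^n(\mathbf{z}, \mathbf{y}), \tag{21.39}$$

where $K^n(\mathbf{x}, \mathbf{z})$ is as given in Eq. (21.32).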

Feynman's idea is to consider $G(\mathbf{x}, \mathbf{y})$ as an interacting propagator
between points $\mathbf{x}$ and $\mathbf{y}$ and $G_0(\mathbf{x}, \mathbf{y})$ as a free propagator. The first term on
the RHS of (21.39) is simply a free propagation from $\mathbf{x}$ to $\mathbf{y}$. Diagrammatically,
it is represented by a line joining the points $\mathbf{x}$ and $\mathbf{y}$ [see Fig. 21.1(a)].
The second term is a free propagation from $\mathbf{x}$ to $\mathbf{y}_1$ (also called a vertex),
interaction at $\mathbf{y}_1$ with a potential $-\lambda V(\mathbf{y}_1)$, and subsequent free propagation
to $\mathbf{y}$ [see Fig. 21.1(b)]. According to the third term, the particle or wave
[represented by $u_f(\mathbf{x})$] propagates freely from $\mathbf{x}$ to $\mathbf{y}_1$, interacts at $\mathbf{y}_1$ with
the potential $-\lambda V(\mathbf{y}_1)$, propagates freely from $\mathbf{y}_1$ to $\mathbf{y}_2$, interacts for a second
time with the potential $-\lambda V(\mathbf{y}_2)$, and finally propagates freely from $\mathbf{y}_2$
to $\mathbf{y}$ [Fig. 21.1(c)]. The interpretation of the rest of the series in (21.39) is
now clear: The $n$th-order term of the series has $n$ vertices between $\mathbf{x}$ and $\mathbf{y}$
with a factor $-\lambda V(\mathbf{y}_k)$ and an integration over $\mathbf{y}_k$ at vertex $k$. Between any
two consecutive vertices $\mathbf{y}_k$ and $\mathbf{y}_{k+1}$ there is a factor of the free propagator
$G_0(\mathbf{y}_k, \mathbf{y}_{k+1})$.

Fig. 21.1 Contributions to the full propagator in (a) the zeroth order, (b) the first order,
and (c) the second order. At each vertex one introduces a factor of $-\lambda V$ and integrates
over all values of the variable of that vertex
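
The vertex-by-vertex structure of the series can be illustrated in finite dimensions, where operators become matrices and integrations become matrix products. The sketch below is our own illustration, not taken from the text: the "free operator" `L` is a randomly generated invertible matrix, the local potential is a diagonal matrix, and `propagator_series` is a hypothetical helper that adds one vertex and one free propagator per order, following Eq. (21.37).

```python
import numpy as np

def propagator_series(G0, V, lam, order):
    """Sum G0 + sum_{n=1}^{order} (-lam)^n G0 (V G0)^n, one vertex per order."""
    total = G0.copy()
    term = G0.copy()
    for _ in range(order):
        term = -lam * term @ V @ G0   # attach a vertex (-lam V) and a free propagator
        total = total + term
    return total

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
L = A @ A.T + 6.0 * np.eye(6)            # assumed invertible "free operator"
V = np.diag(rng.uniform(-1.0, 1.0, 6))   # local potential: diagonal in "position"
lam = 0.3

G0 = np.linalg.inv(L)                    # free propagator
G_exact = np.linalg.inv(L + lam * V)     # full propagator
G_series = propagator_series(G0, V, lam, 40)
```

In this example the series reproduces the full propagator, and `G_exact` satisfies the matrix form of Eq. (21.37), $\mathbf{G} = \mathbf{G}_0 - \lambda\mathbf{G}_0\mathbf{V}\mathbf{G}$.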

Feynman diagrams are used extensively in relativistic quantum field
theory, for which $m = 4$, corresponding to the four-dimensional space-time.
In this context $\lambda$ is determined by the strength of the interaction.
For quantum electrodynamics, for instance, $\lambda$ is the fine-structure constant,
$e^2/\hbar c \approx 1/137$.


21.5 Perturbation Theory 


Few operator equations lend themselves to an exact solution, and due to the 
urgency of finding a solution to such equations in fundamental physics, var- 
ious techniques have been developed to approximate solutions to operator 
equations. We have already seen instances of such techniques in, for exam- 
ple, the WKB method. This section is devoted to a systematic development 
of perturbation theory, which is one of the main tools of calculation in quan- 
tum mechanics. For a thorough treatment of perturbation theory along the 
lines presented here, see [Mess 66, pp. 712-720]. 

The starting point is the resolvent (Definition 17.7.1) of a Hamiltonian
$\mathbf{H}$, which, using $z$ instead of $\lambda$, we write as $\mathbf{R}_z(\mathbf{H})$. For simplicity, we
assume that the eigenvalues of $\mathbf{H}$ are discrete. This is a valid assumption if the
Hamiltonian is compact or if we are interested in approximations close to
one of the discrete eigenvalues. Denoting the eigenvalues of $\mathbf{H}$ by $\{E_i\}_{i=0}^{\infty}$,
we have

$$\mathbf{H}\mathbf{P}_i = E_i\mathbf{P}_i, \tag{21.40}$$

where $\mathbf{P}_i$ is the projection operator to the $i$th eigenspace. We can write the
resolvent in terms of the projection operators by using Eq. (17.6):

$$\mathbf{R}_z(\mathbf{H}) = \sum_{i=0}^{\infty} \frac{\mathbf{P}_i}{E_i - z}. \tag{21.41}$$



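The projection operator $\mathbf{P}_i$ can be written as a contour integral as in
Eq. (17.11). Any sum of these operators can also be written as a contour
integral. For instance, if $\Gamma$ is a circle enclosing the first $n + 1$ eigenvalues,
then

$$\mathbf{P}_\Gamma \equiv \sum_{i=0}^{n} \mathbf{P}_i = -\frac{1}{2\pi i}\oint_\Gamma \mathbf{R}_z(\mathbf{H})\, dz. \tag{21.42}$$

Multiplying Eq. (21.42) by $\mathbf{H}$ and using the definition of the resolvent, one
can show that

$$\mathbf{H}\mathbf{P}_\Gamma = -\frac{1}{2\pi i}\oint_\Gamma z\,\mathbf{R}_z(\mathbf{H})\, dz. \tag{21.43}$$

When $\Gamma$ includes all eigenvalues of $\mathbf{H}$, $\mathbf{P}_\Gamma = \mathbf{1}$, and Eq. (21.43) reduces to
(17.10) with $\mathbf{A} \to \mathbf{T}$ and $f(x) \to x$.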
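
The contour-integral representation of a projector lends itself to a direct numerical check. The following sketch is our own illustration: it uses the operator $(z\mathbf{1} - \mathbf{H})^{-1} = -\mathbf{R}_z(\mathbf{H})$ so that the overall sign is explicit, a small symmetric sample matrix for $\mathbf{H}$, and a hypothetical helper `contour_projector` that applies the trapezoid rule on a circle. The circle is chosen, by assumption, to enclose only the two lowest eigenvalues.

```python
import numpy as np

def contour_projector(H, center, radius, n_points=400):
    """Approximate (1/(2*pi*i)) * closed integral of (z - H)^(-1) dz over a circle."""
    dim = H.shape[0]
    identity = np.eye(dim)
    P = np.zeros((dim, dim), dtype=complex)
    for theta in 2.0 * np.pi * np.arange(n_points) / n_points:
        z = center + radius * np.exp(1j * theta)
        dz = 1j * radius * np.exp(1j * theta) * (2.0 * np.pi / n_points)
        P += np.linalg.inv(z * identity - H) * dz
    return P / (2.0j * np.pi)

# Assumed sample Hamiltonian with eigenvalues close to 0, 1, and 3
H = np.array([[0.0, 0.1, 0.0],
              [0.1, 1.0, 0.1],
              [0.0, 0.1, 3.0]])

P = contour_projector(H, center=0.5, radius=1.2)   # encloses the two lowest eigenvalues

# Reference projector built from the eigenvectors of the two lowest eigenvalues
evals, evecs = np.linalg.eigh(H)
P_exact = evecs[:, :2] @ evecs[:, :2].T
```

The numerically integrated `P` is idempotent, has trace 2, and agrees with the projector built from the eigenvectors.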

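To proceed, let us assume that $\mathbf{H} = \mathbf{H}_0 + \lambda\mathbf{V}$, where $\mathbf{H}_0$ is a Hamiltonian
with known eigenvalues and eigenvectors, and $\mathbf{V}$ is a perturbing potential;
$\lambda$ is a (small) parameter that keeps track of the order of approximation. Let
us also use the abbreviations

$$\mathbf{G}(z) \equiv -\mathbf{R}_z(\mathbf{H}) \quad\text{and}\quad \mathbf{G}_0(z) \equiv -\mathbf{R}_z(\mathbf{H}_0). \tag{21.44}$$

Then a procedure very similar to that leading to Eq. (21.37) yields

$$\mathbf{G}(z) = \mathbf{G}_0(z) + \lambda\mathbf{G}_0(z)\mathbf{V}\mathbf{G}(z), \tag{21.45}$$

which can be expanded in a Neumann series by iteration:

$$\mathbf{G}(z) = \sum_{n=0}^{\infty} \lambda^n \mathbf{G}_0(z)[\mathbf{V}\mathbf{G}_0(z)]^n. \tag{21.46}$$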


Let $\{E_a^0\}$, $\{\mathcal{M}_a^0\}$, and $m_a$ denote, respectively, the eigenvalues of $\mathbf{H}_0$, their
corresponding eigenspaces, and the latter's dimensions.$^4$ In the context of
perturbation theory, $m_a$ is called the degeneracy of $E_a^0$, and $E_a^0$ is called
$m_a$-fold degenerate, with a similar terminology for the perturbed Hamiltonian.
We assume that all eigenspaces have finite dimensions.

It is clear that eigenvalues and eigenspaces of $\mathbf{H}$ will tend to those of $\mathbf{H}_0$
when $\lambda \to 0$. So, let us collect all eigenspaces of $\mathbf{H}$ that tend to $\mathcal{M}_a^0$ and
denote them by $\{\mathcal{M}_a^i\}_{i=1}^{r_a}$. Similarly, we use $E_a^i$ and $\mathbf{P}_a^i$ to denote, respectively,
the energy eigenvalue and the projector to the eigenspace $\mathcal{M}_a^i$. Since
dimension is a discrete quantity, it cannot depend on $\lambda$, and we have

$$\sum_{i=1}^{r_a} \dim \mathcal{M}_a^i = \dim \mathcal{M}_a^0 = m_a. \tag{21.47}$$

$^4$ We use the beginning letters of the Latin alphabet for the unperturbed Hamiltonian.
Furthermore, we attach a superscript "0" to emphasize that the object belongs to $\mathbf{H}_0$.

We also use the notation $\mathbf{P}_a$ for the projector onto the direct sum of the $\mathcal{M}_a^i$'s. We
thus have

$$\mathbf{P}_a = \sum_{i=1}^{r_a} \mathbf{P}_a^i \quad\text{and}\quad \lim_{\lambda\to 0} \mathbf{P}_a = \mathbf{P}_a^0, \tag{21.48}$$

where we have used an obvious notation for the projection operator onto
$\mathcal{M}_a^0$.

The main task of perturbation theory is to find the eigenvalues and eigenvectors
of the perturbed Hamiltonian in terms of a series in powers of $\lambda$
of the corresponding unperturbed quantities. Since the eigenvectors (or,
more appropriately, the projectors onto eigenspaces) and their corresponding
eigenvalues of the perturbed Hamiltonian are related via Eq. (21.40),
this task reduces to writing $\mathbf{P}_a$ as a series in powers of $\lambda$ whose coefficients
are operators expressible in terms of unperturbed quantities.

For sufficiently small $\lambda$, there exists a contour in the $z$-plane enclosing
$E_a^0$ and all $E_a^i$'s but excluding all other eigenvalues of $\mathbf{H}$ and $\mathbf{H}_0$. Denote
this contour by $\Gamma_a$ and, using Eq. (21.42), write

$$\mathbf{P}_a = \frac{1}{2\pi i}\oint_{\Gamma_a} \mathbf{G}(z)\, dz.$$

It follows from Eq. (21.46) that

$$\mathbf{P}_a = \mathbf{P}_a^0 + \sum_{n=1}^{\infty} \lambda^n \mathbf{A}^{(n)}, \quad\text{where}\quad \mathbf{A}^{(n)} \equiv \frac{1}{2\pi i}\oint_{\Gamma_a} \mathbf{G}_0(z)[\mathbf{V}\mathbf{G}_0(z)]^n\, dz. \tag{21.49}$$

This equation shows that perturbation expansion is reduced to the calculation
of $\mathbf{A}^{(n)}$, which is simply the residue of $\mathbf{G}_0(z)[\mathbf{V}\mathbf{G}_0(z)]^n$. The only singularity
of the integrand in Eq. (21.49) comes from $\mathbf{G}_0(z)$, which, by (21.44)
and (21.41), has a pole at $E_a^0$. So, to calculate this residue, we simply expand
$\mathbf{G}_0(z)$ in a Laurent series about $E_a^0$:


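$$\begin{aligned}
\mathbf{G}_0(z) &= \sum_b \frac{\mathbf{P}_b^0}{z - E_b^0} = \frac{\mathbf{P}_a^0}{z - E_a^0} + \sum_{b \ne a} \frac{\mathbf{P}_b^0}{z - E_b^0} \\
&= \frac{\mathbf{P}_a^0}{z - E_a^0} + \sum_{b \ne a} \frac{\mathbf{P}_b^0}{(E_a^0 - E_b^0)\left(1 + \dfrac{z - E_a^0}{E_a^0 - E_b^0}\right)} \\
&= \frac{\mathbf{P}_a^0}{z - E_a^0} + \sum_{b \ne a} \sum_{k=0}^{\infty} (-1)^k \frac{(z - E_a^0)^k\, \mathbf{P}_b^0}{(E_a^0 - E_b^0)^{k+1}}.
\end{aligned}$$

Switching the order of the two sums, and noting that our space is the Hilbert
space of $\mathbf{H}_0$ whose basis can be chosen to consist of eigenstates of $\mathbf{H}_0$, we
can write $\mathbf{H}_0$ instead of $E_b^0$ in the denominator to obtain

$$\begin{aligned}
\sum_{b \ne a} \frac{\mathbf{P}_b^0}{(E_a^0 - E_b^0)^{k+1}} &= \frac{1}{(E_a^0 - \mathbf{H}_0)^{k+1}} \underbrace{\sum_{b \ne a} \mathbf{P}_b^0}_{\equiv\, \mathbf{Q}_a^0} = \frac{1}{(E_a^0 - \mathbf{H}_0)^{k+1}}\,(\mathbf{1} - \mathbf{P}_a^0) \\
&= \frac{\mathbf{Q}_a^0}{(E_a^0 - \mathbf{H}_0)^{k+1}} \equiv \mathbf{G}_0^{k+1}(E_a^0)\,\mathbf{Q}_a^0 = \mathbf{Q}_a^0\,\mathbf{G}_0^{k+1}(E_a^0)\,\mathbf{Q}_a^0,
\end{aligned}$$

where we have used the completeness relation for the $\mathbf{P}_b^0$'s, the fact that $\mathbf{Q}_a^0$
commutes with $\mathbf{H}_0$ [and, therefore, with $\mathbf{G}_0^{k+1}(E_a^0)$], and, in the last equality,
the fact that $\mathbf{Q}_a^0$ is a projection operator. It follows that

$$\mathbf{G}_0(z) = \frac{\mathbf{P}_a^0}{z - E_a^0} + \sum_{k=0}^{\infty} (-1)^k (z - E_a^0)^k\, \mathbf{Q}_a^0 \mathbf{G}_0^{k+1}(E_a^0) \mathbf{Q}_a^0 = \sum_{k=0}^{\infty} (-1)^k (z - E_a^0)^{k-1}\, \mathbf{S}^k, \tag{21.50}$$

where we have introduced the notation

$$\mathbf{S}^k \equiv \begin{cases} \mathbf{P}_a^0 & \text{if } k = 0, \\ -\mathbf{Q}_a^0 \mathbf{G}_0^k(E_a^0) \mathbf{Q}_a^0 & \text{if } k \ge 1. \end{cases}$$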


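By substituting Eq. (21.50) in $\mathbf{G}_0(z)[\mathbf{V}\mathbf{G}_0(z)]^n$ we obtain a Laurent expansion
whose coefficient of $(z - E_a^0)^{-1}$ is $\mathbf{A}^{(n)}$. Each of the $n + 1$ factors of $\mathbf{G}_0(z)$
contributes a power $(z - E_a^0)^{k_i - 1}$ and a sign $(-1)^{k_i}$, so the residue collects the
terms with $\sum_i k_i = n$. The reader may check that such a procedure yields

$$\mathbf{A}^{(n)} = (-1)^{n} \sum_{(n)} \mathbf{S}^{k_1} \mathbf{V} \mathbf{S}^{k_2} \mathbf{V} \cdots \mathbf{V} \mathbf{S}^{k_{n+1}}, \tag{21.51}$$

where by definition, $\sum_{(p)}$ extends over all nonnegative integers $\{k_i\}_{i=1}^{n+1}$
such that

$$\sum_{i=1}^{n+1} k_i = p \qquad \forall\, p \ge 0.$$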
i=1 


Note that although $\mathbf{G}_0(z)$ has a pole at $E_a^0$, the expressions in the last line of the equation
above make sense because $\mathbf{Q}_a^0$ annihilates all states with eigenvalue $E_a^0$. The reason for the
introduction of $\mathbf{Q}_a^0$ on both sides is to ensure that $\mathbf{G}_0^{k+1}(E_a^0)$ will not act on an eigenstate
of $E_a^0$ on either side.




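It turns out that for perturbation expansion, not only do we need the expansion
of $\mathbf{P}_a$ [Eqs. (21.49) and (21.51)], but also an expansion for $\mathbf{H}\mathbf{P}_a$. Using
Eqs. (21.43) and (21.44), with $\Gamma$ replaced by $\Gamma_a$, we have

$$\begin{aligned}
\mathbf{H}\mathbf{P}_a &= \frac{1}{2\pi i}\oint_{\Gamma_a} z\,\mathbf{G}(z)\, dz = \frac{1}{2\pi i}\oint_{\Gamma_a} (z - E_a^0 + E_a^0)\,\mathbf{G}(z)\, dz \\
&= \frac{1}{2\pi i}\oint_{\Gamma_a} (z - E_a^0)\,\mathbf{G}(z)\, dz + E_a^0\mathbf{P}_a.
\end{aligned}$$

Substituting for $\mathbf{G}(z)$ from Eq. (21.46), we can rewrite this equation as

$$(\mathbf{H} - E_a^0)\mathbf{P}_a = \sum_{n=1}^{\infty} \lambda^n \mathbf{B}^{(n)}, \tag{21.52}$$

where

$$\mathbf{B}^{(n)} \equiv (-1)^{n-1} \sum_{(n-1)} \mathbf{S}^{k_1} \mathbf{V} \mathbf{S}^{k_2} \mathbf{V} \cdots \mathbf{V} \mathbf{S}^{k_{n+1}}. \tag{21.53}$$

Equations (21.52) and (21.53) can be used to approximate the eigenvectors
and eigenvalues of the perturbed Hamiltonian in terms of those of the
unperturbed Hamiltonian. It is convenient to consider two cases: the nondegenerate
case in which $m_a = 1$, and the degenerate case in which $m_a \ge 2$.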


21.5.1 The Nondegenerate Case 


In the nondegenerate case, we let $|a^0\rangle$ denote the original unperturbed eigenstate,
and use Eq. (21.47) to conclude that the perturbed eigenstate is also
one-dimensional. In fact, it follows from (21.40) that $\mathbf{P}_a|a^0\rangle$ is the desired
eigenstate. Denoting the latter by $|\psi\rangle$ and using Eq. (21.49), we have

$$|\psi\rangle = \mathbf{P}_a|a^0\rangle = \mathbf{P}_a^0|a^0\rangle + \sum_{n=1}^{\infty} \lambda^n \mathbf{A}^{(n)}|a^0\rangle = |a^0\rangle + \sum_{n=1}^{\infty} \lambda^n \mathbf{A}^{(n)}|a^0\rangle \tag{21.54}$$

because $\mathbf{P}_a^0$ is the projection operator onto $|a^0\rangle$.

More desirable is the energy of the perturbed state $E_a$, which obeys the
relation $\mathbf{H}\mathbf{P}_a = E_a\mathbf{P}_a$. Taking the trace of this relation and noting that
$\operatorname{tr}\mathbf{P}_a = \operatorname{tr}\mathbf{P}_a^0 = 1$, we obtain

$$E_a = \operatorname{tr}(\mathbf{H}\mathbf{P}_a) = \operatorname{tr}\left(E_a^0\mathbf{P}_a + \sum_{n=1}^{\infty} \lambda^n \mathbf{B}^{(n)}\right) = E_a^0 + \sum_{n=1}^{\infty} \lambda^n \operatorname{tr}\mathbf{B}^{(n)} \equiv E_a^0 + \sum_{n=1}^{\infty} \lambda^n \varepsilon_n, \tag{21.55}$$

where we used Eq. (21.52). Since $\lambda$ is simply a parameter to keep track
of the order of perturbation, one usually includes it in the definition of the
perturbing potential $\mathbf{V}$. The $n$th-order correction to the energy is then written
as

$$\varepsilon_n = \operatorname{tr}\mathbf{B}^{(n)}. \tag{21.56}$$


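Since each term of $\mathbf{B}^{(n)}$ contains $\mathbf{P}_a^0$ at least once, and since

$$\operatorname{tr}(\mathbf{U}\mathbf{P}_a^0\mathbf{T}) = \operatorname{tr}(\mathbf{T}\mathbf{U}\mathbf{P}_a^0)$$

for any pair of operators $\mathbf{U}$ and $\mathbf{T}$ (or products thereof), one can cast $\varepsilon_n$
into the form of an expectation value of some product of operators in the
unperturbed state $|a^0\rangle$. For example,

$$\varepsilon_1 = \operatorname{tr}\mathbf{B}^{(1)} = \sum_b \langle b^0|\mathbf{P}_a^0\mathbf{V}\mathbf{P}_a^0|b^0\rangle = \langle a^0|\mathbf{V}|a^0\rangle \tag{21.57}$$

because $\mathbf{P}_a^0|b^0\rangle = 0$ unless $b = a$. This is the familiar expression for the
first-order correction to the energy in nondegenerate perturbation theory. Similarly,

$$\begin{aligned}
\varepsilon_2 = \operatorname{tr}\mathbf{B}^{(2)} = -\operatorname{tr}\big(&\mathbf{P}_a^0\mathbf{V}\mathbf{P}_a^0\mathbf{V}[-\mathbf{Q}_a^0\mathbf{G}_0(E_a^0)\mathbf{Q}_a^0] + \mathbf{P}_a^0\mathbf{V}[-\mathbf{Q}_a^0\mathbf{G}_0(E_a^0)\mathbf{Q}_a^0]\mathbf{V}\mathbf{P}_a^0 \\
&+ [-\mathbf{Q}_a^0\mathbf{G}_0(E_a^0)\mathbf{Q}_a^0]\mathbf{V}\mathbf{P}_a^0\mathbf{V}\mathbf{P}_a^0\big) = \langle a^0|\mathbf{V}\mathbf{Q}_a^0\mathbf{G}_0(E_a^0)\mathbf{Q}_a^0\mathbf{V}|a^0\rangle.
\end{aligned}$$

The first and the last terms in parentheses give zero because in the trace sum,
$\mathbf{P}_a^0$ gives a nonzero contribution only if the state is $|a^0\rangle$, which is precisely
the state annihilated by $\mathbf{Q}_a^0$. Using the completeness relation $\sum_b |b^0\rangle\langle b^0| =
\mathbf{1} = \sum_c |c^0\rangle\langle c^0|$ for the eigenstates of the unperturbed Hamiltonian, we can
rewrite $\varepsilon_2$ as

$$\varepsilon_2 = \sum_{b,c} \langle a^0|\mathbf{V}|b^0\rangle \langle b^0|\mathbf{Q}_a^0\mathbf{G}_0(E_a^0)\mathbf{Q}_a^0|c^0\rangle \langle c^0|\mathbf{V}|a^0\rangle = \sum_{b \ne a} \frac{|\langle b^0|\mathbf{V}|a^0\rangle|^2}{E_a^0 - E_b^0},$$

where the middle matrix element equals $\delta_{bc}/(E_a^0 - E_b^0)$ and vanishes if $b = a$ or $c = a$.

This is the familiar expression for the second-order correction to the energy
in nondegenerate perturbation theory.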
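
The first- and second-order formulas are easily tested against exact diagonalization of a small matrix. The sketch below is our own illustration: the unperturbed levels `E0`, the symmetric matrix `V`, the value of $\lambda$, and the helper name `energy_corrections` are assumed sample choices, with $\lambda$ kept explicit rather than absorbed in $\mathbf{V}$.

```python
import numpy as np

def energy_corrections(E0, V, a):
    """First- and second-order energy corrections for the nondegenerate level a."""
    eps1 = V[a, a]                                   # <a|V|a>, Eq. (21.57)
    eps2 = sum(abs(V[b, a])**2 / (E0[a] - E0[b])     # sum over b != a
               for b in range(len(E0)) if b != a)
    return eps1, eps2

# Assumed sample data: H0 is diagonal in the basis of its own eigenstates
E0 = np.array([0.0, 1.0, 2.5, 4.0])
V = np.array([[0.2, 0.1, 0.0, 0.3],
              [0.1, -0.1, 0.2, 0.0],
              [0.0, 0.2, 0.3, 0.1],
              [0.3, 0.0, 0.1, 0.0]])
lam = 0.05
a = 1

eps1, eps2 = energy_corrections(E0, V, a)
E_first = E0[a] + lam * eps1
E_second = E_first + lam**2 * eps2
E_exact = np.linalg.eigvalsh(np.diag(E0) + lam * V)[a]   # levels stay ordered for small lam
```

For this small $\lambda$ the second-order estimate lies closer to the exact eigenvalue than the first-order one, and the remaining discrepancy is of order $\lambda^3$.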


21.5.2 The Degenerate Case 


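The degenerate case can also start with Eqs. (21.54) and (21.55). The difference
is that $\varepsilon_n$ cannot be determined as conveniently as in the nondegenerate
case. For example, the expression for $\varepsilon_1$ will involve a sum over a basis of
$\mathcal{M}_a^0$ because $\mathbf{P}_a^0|b^0\rangle$ is no longer just $|a^0\rangle$, but some general vector in $\mathcal{M}_a^0$.
Instead of pursuing this line of approach, we present a more common method,
which concentrates on the way $\mathcal{M}_a^0$ and the corresponding eigenspaces of the
perturbed Hamiltonian, denoted by $\mathcal{M}_a$, enter in the calculation of eigenvalues
and eigenvectors.

The projector $\mathbf{P}_a^0$ acts as a unit operator when restricted to $\mathcal{M}_a^0$. In particular,
it is invertible. In the limit of small $\lambda$, the projection operator $\mathbf{P}_a$ is close
to $\mathbf{P}_a^0$; therefore, it too must be invertible, i.e., $\mathbf{P}_a : \mathcal{M}_a^0 \to \mathcal{M}_a$ is an isomorphism.
Similarly, $\mathbf{P}_a^0 : \mathcal{M}_a \to \mathcal{M}_a^0$ is also an isomorphism, though not necessarily
the inverse of the first one. It follows that for each vector in $\mathcal{M}_a^0$ there is a
unique vector in $\mathcal{M}_a$ and vice versa.

The eigenvalue equation $\mathbf{H}|E_a\rangle = E_a|E_a\rangle$ can thus be written as

$$\mathbf{H}\mathbf{P}_a|E_a^0\rangle = E_a\mathbf{P}_a|E_a^0\rangle,$$

where $|E_a^0\rangle$ is the unique vector mapped onto $|E_a\rangle$ by $\mathbf{P}_a$. Multiplying both
sides by $\mathbf{P}_a^0$, we obtain

$$\mathbf{P}_a^0\mathbf{H}\mathbf{P}_a|E_a^0\rangle = E_a\mathbf{P}_a^0\mathbf{P}_a|E_a^0\rangle,$$

which is completely equivalent to the previous equation because $\mathbf{P}_a^0$ is invertible.
If we define

$$\mathbf{H}_a \equiv \mathbf{P}_a^0\mathbf{H}\mathbf{P}_a\mathbf{P}_a^0 : \mathcal{M}_a^0 \to \mathcal{M}_a^0, \qquad \mathbf{K}_a \equiv \mathbf{P}_a^0\mathbf{P}_a\mathbf{P}_a^0 : \mathcal{M}_a^0 \to \mathcal{M}_a^0, \tag{21.58}$$

the preceding equation becomes

$$\mathbf{H}_a|E_a^0\rangle = E_a\mathbf{K}_a|E_a^0\rangle. \tag{21.59}$$

As operators on $\mathcal{M}_a^0$ both $\mathbf{H}_a$ and $\mathbf{K}_a$ are hermitian. In fact, $\mathbf{K}_a$, which can be
written as the product of $\mathbf{P}_a^0\mathbf{P}_a$ and its hermitian conjugate, is a positive definite
operator. Equation (21.59) is a generalized eigenvalue equation whose
eigenvalues $E_a$ are solutions of the equation

$$\det(\mathbf{H}_a - x\mathbf{K}_a) = 0. \tag{21.60}$$

The eigenvectors of this equation, once projected onto $\mathcal{M}_a$ by $\mathbf{P}_a$, give the
desired eigenvectors of $\mathbf{H}$.

The expansions of $\mathbf{H}_a$ and $\mathbf{K}_a$ are readily obtained from those of $\mathbf{H}\mathbf{P}_a$ and
$\mathbf{P}_a$ as given in Eqs. (21.49) and (21.52). We give the first few terms of each
expansion:

$$\begin{aligned}
\mathbf{K}_a &= \mathbf{P}_a^0 - \lambda^2 \mathbf{P}_a^0\mathbf{V}\mathbf{Q}_a^0\mathbf{G}_0^2(E_a^0)\mathbf{Q}_a^0\mathbf{V}\mathbf{P}_a^0 + \cdots, \\
\mathbf{H}_a &= E_a^0\mathbf{K}_a + \lambda\mathbf{P}_a^0\mathbf{V}\mathbf{P}_a^0 + \lambda^2 \mathbf{P}_a^0\mathbf{V}\mathbf{Q}_a^0\mathbf{G}_0(E_a^0)\mathbf{Q}_a^0\mathbf{V}\mathbf{P}_a^0 + \cdots.
\end{aligned} \tag{21.61}$$

To any given order of approximation, the eigenvalues $E_a$ are obtained by
terminating the series in (21.61) at that order, plugging the resulting finite
sum in Eq. (21.60), and solving the determinant equation.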
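
The procedure just described can be tried on a small matrix with a two-fold degenerate unperturbed level. The following sketch is an illustration under our own assumptions (sample `E0`, a real symmetric `V`, and $\lambda = 0.05$): it builds the second-order truncations of $\mathbf{K}_a$ and $\mathbf{H}_a$ from Eq. (21.61) as $2 \times 2$ matrices on $\mathcal{M}_a^0$, and solves $\det(\mathbf{H}_a - x\mathbf{K}_a) = 0$ by finding the eigenvalues of $\mathbf{K}_a^{-1}\mathbf{H}_a$.

```python
import numpy as np

# Assumed sample data: E_a^0 = 1 is two-fold degenerate
E0 = np.array([1.0, 1.0, 3.0, 4.0])
H0 = np.diag(E0)
V = np.array([[0.0, 0.5, 0.3, 0.1],
              [0.5, 0.2, 0.0, 0.4],
              [0.3, 0.0, 0.1, 0.2],
              [0.1, 0.4, 0.2, 0.0]])
lam = 0.05
Ea0 = 1.0
deg = [0, 1]     # indices spanning M_a^0 (range of P_a^0)
rest = [2, 3]    # indices spanning the complement (range of Q_a^0)

PVP = V[np.ix_(deg, deg)]      # P V P restricted to M_a^0
QVP = V[np.ix_(rest, deg)]     # Q V P; its transpose is P V Q for real symmetric V
g = 1.0 / (Ea0 - E0[rest])     # G_0(E_a^0) = 1/(E_a^0 - H_0) on the complement

# Second-order truncation of Eq. (21.61)
K2 = np.eye(2) - lam**2 * QVP.T @ np.diag(g**2) @ QVP
H2 = Ea0 * K2 + lam * PVP + lam**2 * QVP.T @ np.diag(g) @ QVP

# det(H_a - x K_a) = 0 is equivalent to x being an eigenvalue of K_a^{-1} H_a
E_pert = np.sort(np.linalg.eigvals(np.linalg.solve(K2, H2)).real)
E_exact = np.linalg.eigvalsh(H0 + lam * V)[:2]   # the two levels that tend to E_a^0
```

The two roots split the degenerate level, and they agree with the exact eigenvalues up to terms of order $\lambda^3$.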


21.6 Problems 


21.1 Show that the definitions of the three types of SOPDEs discussed in
Example 21.1.6 are equivalent to the definitions based on Eq. (21.5). Hint:
Diagonalize the matrix of coefficients of the SOPDE:

$$a\frac{\partial^2 u}{\partial x^2} + 2b\frac{\partial^2 u}{\partial x\,\partial y} + c\frac{\partial^2 u}{\partial y^2} + F\left(x, y, u, \frac{\partial u}{\partial x}, \frac{\partial u}{\partial y}\right) = 0,$$

where $a$, $b$, and $c$ are functions of $x$ and $y$. Write the eigenvalues as
$(a + c \pm \Delta)/2$ and consider the three cases $|\Delta| < |a + c|$, $|\Delta| > |a + c|$, and
$|\Delta| = |a + c|$.


21.2 Find the characteristic curves for $L_x[u] = \partial u/\partial x$.


21.3 Find the characteristic curves for the two-dimensional wave equation 
and the two-dimensional diffusion equation. 


21.4 Solve the Cauchy problem for the two-dimensional Laplace equation
subject to the Cauchy data $u(0, y) = 0$, $(\partial u/\partial x)(0, y) = \epsilon \sin ky$, where $\epsilon$
and $k$ are constants. Show that the solution does not vary continuously as the
Cauchy data vary. In particular, show that for any $\epsilon \ne 0$ and any preassigned
$x > 0$, the solution $u(x, y)$ can be made arbitrarily large by choosing $k$ large
enough.


21.5 Show that the $x_i$ in Eq. (21.12) describe an $m$-dimensional sphere of
radius $r$, that is, $\sum_{i=1}^{m} x_i^2 = r^2$.


21.6 Use $J\delta(\mathbf{x} - \mathbf{a}) = \delta(\boldsymbol{\xi} - \boldsymbol{\alpha})$ and the coordinate transformation from
the spherical coordinate system to Cartesian coordinates to express the 3D
Cartesian delta function in terms of the corresponding spherical delta function
at a point $P = (x_0, y_0, z_0) = (r_0, \theta_0, \varphi_0)$ where the Jacobian $J$ is nonvanishing.


21.7 Find the volume of an m-dimensional sphere. 


21.8 Prove Eq. (21.11). First, note that the RHS of Eq. (21.10) is a function
of only $k$ of the $\alpha$'s. This means that

$$H(\boldsymbol{\xi})\big|_{\boldsymbol{\xi}=\boldsymbol{\alpha}} \equiv H(\alpha_1, \ldots, \alpha_k).$$

(a) Rewrite Eq. (21.10) by separating the integral into two parts, one involving
$\{\xi_i\}_{i=1}^{k}$ and the other involving $\{\xi_i\}_{i=k+1}^{m}$. Compare the RHS
with the LHS and show that

$$\int J\, d\xi_{k+1} \cdots d\xi_m\, \delta(\mathbf{x} - \mathbf{a}) = \prod_{i=1}^{k} \delta(\xi_i - \alpha_i).$$

(b) Show that this equation implies that $\delta(\mathbf{x} - \mathbf{a})$ is independent of
$\{\xi_i\}_{i=k+1}^{m}$. Thus, one can take the delta function out of the integral.
21.9 Find the $m$-dimensional Green's function for the Laplacian as follows.

(a) Solve Eq. (21.19) assuming that $r \ne 0$ and demanding that $G(r) \to 0$
as $r \to \infty$ (this can be done only for $m \ge 3$).

(b) Use the divergence theorem in $m$ dimensions and (21.18) to show that

$$\iint_S \frac{dG}{dr}\, da = 1,$$

where $S$ is a spherical hypersurface of radius $r$. Now use this and the
result of part (a) to find the remaining constant of integration.


21.10 Consider the operator $\mathbf{L}_{\mathbf{x}} = \nabla^2 + \mathbf{b} \cdot \nabla + c$ for which $\{b_i\}_{i=1}^{m}$ and $c$
are functions of $\{x_i\}_{i=1}^{m}$.

(a) Show that $\mathbf{L}_{\mathbf{x}}^{\dagger}[v] = \nabla^2 v - \nabla \cdot (\mathbf{b}v) + cv$, and

$$\mathbf{Q}[u, v^*] = \mathbf{Q}[u, v] = v\nabla u - u\nabla v + \mathbf{b}uv.$$

(b) Show that a necessary condition for $\mathbf{L}_{\mathbf{x}}$ to be self-adjoint is $2\mathbf{b} \cdot \nabla u +
u(\nabla \cdot \mathbf{b}) = 0$ for arbitrary $u$.

(c) By choosing some $u$'s judiciously, show that (b) implies that $b_i = 0$.
Conclude that $\mathbf{L}_{\mathbf{x}} = \nabla^2 + c(\mathbf{x})$ is formally self-adjoint.


21.11 Solve the integral form of the Schrédinger equation for an attractive 
double delta-function potential 


$$V(x) = -V_0[\delta(x - a_1) + \delta(x - a_2)], \qquad V_0 > 0.$$


Find the eigenfunctions and obtain a transcendental equation for the eigen- 
values (see Example 21.4.1). 


21.12 Show that the integral equation associated with the damped harmonic
oscillator DE $\ddot{x} + 2\gamma\dot{x} + \omega_0^2 x = 0$, having the BCs $x(0) = x_0$,
$(dx/dt)_{t=0} = 0$, can be written in either of the following forms.

(a) $$x(t) = x_0 - \frac{\omega_0^2}{2\gamma}\int_0^t \left[1 - e^{-2\gamma(t-t')}\right] x(t')\, dt'.$$

(b) $$x(t) = x_0\cos\omega_0 t + \frac{2\gamma x_0}{\omega_0}\sin\omega_0 t - 2\gamma\int_0^t \cos[\omega_0(t - t')]\, x(t')\, dt'.$$

Hint: Take $\omega_0^2 x$ or $2\gamma\dot{x}$, respectively, as the inhomogeneous term.


21.13 Show that for scattering problems ($E > 0$)
(a) the integral form of the Schrödinger equation in one dimension is

$$\Psi(x) = e^{ikx} - \frac{i\mu}{\hbar^2 k}\int_{-\infty}^{\infty} e^{ik|x-y|} V(y)\Psi(y)\, dy.$$

(b) Divide $(-\infty, +\infty)$ into three regions $R_1 = (-\infty, -a)$, $R_2 = (-a, +a)$,
and $R_3 = (a, \infty)$. Let $\Psi_i(x)$ be $\Psi(x)$ in region $R_i$. Assume that the
potential $V(x)$ vanishes in $R_1$ and $R_3$. Show that

$$\begin{aligned}
\Psi_1(x) &= e^{ikx} - \frac{i\mu}{\hbar^2 k}\, e^{-ikx}\int_{-a}^{a} e^{iky} V(y)\Psi_2(y)\, dy, \\
\Psi_2(x) &= e^{ikx} - \frac{i\mu}{\hbar^2 k}\int_{-a}^{a} e^{ik|x-y|} V(y)\Psi_2(y)\, dy, \\
\Psi_3(x) &= e^{ikx} - \frac{i\mu}{\hbar^2 k}\, e^{ikx}\int_{-a}^{a} e^{-iky} V(y)\Psi_2(y)\, dy.
\end{aligned}$$

This shows that determining the wave function in regions where there
is no potential requires the wave function in the region where the potential
acts.

(c) Let

$$V(x) = \begin{cases} V_0 & \text{if } |x| < a, \\ 0 & \text{if } |x| > a, \end{cases}$$

and find $\Psi_2(x)$ by the method of successive approximations. Show
that the $n$th term is less than $(2\mu V_0 a/\hbar^2 k)^{n-1}$ (so the Neumann series
will converge) if $(2V_0 a/\hbar v) < 1$, where $v$ is the velocity and $\mu v = \hbar k$
is the momentum of the wave. Therefore, for large velocities, the
Neumann series expansion is valid.


21.14 (a) Show that $\mathbf{H}\mathbf{R}_z(\mathbf{H}) = \mathbf{1} + z\mathbf{R}_z(\mathbf{H})$. (b) Use (a) to prove Eq. (21.43).


22 Multidimensional Green's Functions: Applications


The previous chapter gathered together some general properties of the GFs 
and their companion, the Dirac delta function. This chapter considers the 
Green’s functions for elliptic, parabolic, and hyperbolic equations that sat- 
isfy the BCs appropriate for each type of PDE. 


22.1 Elliptic Equations


The most general linear PDE in $m$ variables of the elliptic type was discussed
in Sect. 21.1.2. We will not discuss this general case, because all
elliptic PDOs encountered in mathematical physics are of a much simpler
nature. In fact, the self-adjoint elliptic PDO of the form $\mathbf{L}_{\mathbf{x}} = \nabla^2 + q(\mathbf{x})$ is
sufficiently general for purposes of this discussion. Recall from Sect. 21.1.2
that the BCs associated with an elliptic PDE are of two types, Dirichlet and
Neumann. Let us consider these separately.


22.1.1 The Dirichlet Boundary Value Problem 


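A Dirichlet BVP consists of an elliptic PDE together with a Dirichlet BC,
such as

$$\begin{aligned}
\mathbf{L}_{\mathbf{x}}[u] \equiv \nabla^2 u + q(\mathbf{x})u &= f(\mathbf{x}) \quad \text{for } \mathbf{x} \in D, \\
u(\mathbf{x}_b) &= g(\mathbf{x}_b) \quad \text{for } \mathbf{x}_b \in \partial D,
\end{aligned} \tag{22.1}$$

where $g(\mathbf{x}_b)$ is a given function defined on the closed hypersurface $\partial D$.

The Green's function for the Dirichlet BVP must satisfy the homogeneous
BC, for the same reason as in the one-dimensional Green's function.
Thus, the Dirichlet Green's function, denoted by $G_D(\mathbf{x}, \mathbf{y})$, must satisfy

$$\mathbf{L}_{\mathbf{x}}[G_D(\mathbf{x}, \mathbf{y})] = \delta(\mathbf{x} - \mathbf{y}), \qquad G_D(\mathbf{x}_b, \mathbf{y}) = 0 \quad \text{for } \mathbf{x}_b \in \partial D.$$

As discussed in Sect. 21.3.2, we can separate $G_D$ into a singular part $G_D^{(s)}$
and a regular part $H$, where $G_D^{(s)}$ satisfies the same DE as $G_D$ and $H$ satisfies
the corresponding homogeneous DE and the BC $H(\mathbf{x}_b, \mathbf{y}) = -G_D^{(s)}(\mathbf{x}_b, \mathbf{y})$.

Using Eq. (22.1) and the properties of $G_D(\mathbf{x}, \mathbf{y})$ in Eq. (21.27), we obtain

$$u(\mathbf{x}) = \int_D d^m y\, G_D(\mathbf{x}, \mathbf{y}) f(\mathbf{y}) + \int_{\partial D} g(\mathbf{y}_b)\, \frac{\partial G_D}{\partial n_y}(\mathbf{x}, \mathbf{y}_b)\, da, \tag{22.2}$$

where $\partial/\partial n_y$ indicates normal differentiation with respect to the second
argument.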


Historical Notes 

Gustav Peter Lejeune Dirichlet (1805-1859), the son of a postmaster, first attended 
public school, then a private school that emphasized Latin. He was precociously interested 
in mathematics; it is said that before the age of twelve he used his pocket money to buy 
mathematical books. In 1817 he entered the gymnasium in Bonn. He is reported to have 
been an unusually attentive and well-behaved pupil who was particularly interested in 
modern history as well as in mathematics. 

After two years in Bonn, Dirichlet was sent to a Jesuit college in Cologne that his parents 
preferred. Among his teachers was the physicist Georg Simon Ohm, who gave him a 
thorough grounding in theoretical physics. Dirichlet completed his Abitur examination 
at the very early age of sixteen. His parents wanted him to study law, but mathematics 
was already his chosen field. At the time the level of pure mathematics in the German 
universities was at a low ebb: Except for the formidable Carl Gauss, in Göttingen, there
were no outstanding mathematicians, while in Paris the firmament was studded by such 
luminaries as P.-S. Laplace, Adrien Legendre, Joseph Fourier, and Siméon Poisson. 
Dirichlet arrived in Paris in May 1822. In the summer of 1823 he was fortunate in be- 
ing appointed to a well-paid and pleasant position as tutor to the children of General 
Maximilien Fay, a national hero of the Napoleonic wars and then the liberal leader of the 
opposition in the Chamber of Deputies. Dirichlet was treated as a member of the family 
and met many of the most prominent figures in French intellectual life. Among the math- 
ematicians, he was particularly attracted to Fourier, whose ideas had a strong influence 
upon his later works on trigonometric series and mathematical physics. 

General Fay died in November 1825, and the next year Dirichlet decided to return to 
Germany, a plan strongly supported by Alexander von Humboldt, who was working for 
the strengthening of the natural sciences in Germany. Dirichlet was permitted to qualify 
for habilitation as Privatdozent at the University of Breslau; since he did not have the 
required doctorate, this was awarded honoris causa by the University of Cologne. His ha- 
bilitation thesis dealt with polynomials whose prime divisors belong to special arithmetic 
series. A second paper from this period was inspired by Gauss’s announcements on the 
biquadratic law of reciprocity. 

Dirichlet was appointed extraordinary professor in Breslau, but the conditions for sci- 
entific work were not inspiring. In 1828 he moved to Berlin, again with the assistance 
of Humboldt, to become a teacher of mathematics at the military academy. Shortly after- 
ward, at the age of twenty-three, he was appointed extraordinary (later ordinary) professor 
at the University of Berlin. In 1831 he became a member of the Berlin Academy of Sci- 
ences, and in the same year he married Rebecca Mendelssohn-Bartholdy, sister of Felix 
Mendelssohn, the composer. 

Dirichlet spent twenty-seven years as a professor in Berlin and exerted a strong influ- 
ence on the development of German mathematics through his lectures, through his many 
pupils, and through a series of scientific papers of the highest quality that he published 
during this period. He was an excellent teacher, always expressing himself with great 
clarity. His manner was modest; in his later years he was shy and at times reserved. He 
seldom spoke at meetings and was reluctant to make public appearances. In many ways 
he was a direct contrast to his lifelong friend, the mathematician Karl Gustav Jacobi. 
One of Dirichlet’s most important papers, published in 1850, deals with the boundary 
value problem, now known as Dirichlet’s boundary value problem, in which one wishes 
to determine a potential function satisfying Laplace’s equation and having prescribed 
values on a given surface, in Dirichlet’s case a sphere. 

In 1855, when Gauss died, the University of Göttingen was anxious to seek a successor
of great distinction, and the choice fell upon Dirichlet. Dirichlet moved to Göttingen in
the fall of 1855, bought a house with a garden, and seemed to enjoy the quieter life of
a prominent university in a small city. He had a number of excellent pupils and relished
the increased leisure for research. His work in this period was centered on general problems
of mechanics. This new life, however, was not to last long. In the summer of 1858
Dirichlet traveled to a meeting in Montreux, Switzerland, to deliver a memorial speech in
honor of Gauss. While there, he suffered a heart attack and was barely able to return to
his family in Göttingen. During his illness his wife died of a stroke, and Dirichlet himself
died the following spring.


Some special cases of (22.2) are worthy of mention. 


1. The first is $u(\mathbf{x}_b) = 0$, the solution to an inhomogeneous DE satisfying
the homogeneous BC. We obtain this by substituting zero for $g(\mathbf{x}_b)$ in
(22.2) so that only the integration over $D$ remains.

2. The second special case is when the DE is homogeneous, that is, when
$f(\mathbf{x}) = 0$ but the BC is inhomogeneous. This yields an integration over
the boundary $\partial D$ alone.

3. Finally, the solution to the homogeneous DE with the homogeneous BC 
is simply u = 0, referred to as the zero solution. This is consistent with 
physical intuition: If the function is zero on the boundary and there is 
no source f(x) to produce any “disturbance,” we expect no nontrivial 
solution. 


Example 22.1.1 (Method of Images and Dirichlet BVP) Let us find the
Green's function for the three-dimensional Laplacian $\mathbf{L}_{\mathbf{x}} = \nabla^2$ satisfying
the Dirichlet BC $G_D(\boldsymbol{\rho}, \mathbf{y}) = 0$ for $\boldsymbol{\rho}$ on the $xy$-plane. Here $D$ is the upper
half-space ($z \ge 0$) and $\partial D$ is the $xy$-plane.

It is more convenient to use $\mathbf{r} = (x, y, z)$ and $\mathbf{r}' = (x', y', z')$ instead of $\mathbf{x}$
and $\mathbf{y}$, respectively. Using (21.21) as $G_D^{(s)}$, we can write

$$G_D(\mathbf{r}, \mathbf{r}') = -\frac{1}{4\pi|\mathbf{r} - \mathbf{r}'|} + H(\mathbf{r}, \mathbf{r}') = -\frac{1}{4\pi}\frac{1}{\sqrt{(x - x')^2 + (y - y')^2 + (z - z')^2}} + H(x, y, z; x', y', z').$$

The requirement that $G_D$ vanish on the $xy$-plane gives

$$H(x, y, 0; x', y', z') = \frac{1}{4\pi}\frac{1}{\sqrt{(x - x')^2 + (y - y')^2 + z'^2}}.$$

This fixes the dependence of $H$ on all variables except $z$. On the other hand,
$\nabla^2 H = 0$ in $D$ implies that the form of $H$ must be the same as that of $G_D^{(s)}$
because except at $\mathbf{r} = \mathbf{r}'$, the latter does satisfy Laplace's equation. Thus,
because of the symmetry of $G_D$ in $\mathbf{r}$ and $\mathbf{r}'$ [$G_D(\mathbf{r}, \mathbf{r}') = G_D(\mathbf{r}', \mathbf{r})$] and the
evenness of the Laplacian in $z$ (as well as $x$ and $y$), we have two choices for
the $z$-dependence: $(z - z')^2$ and $(z + z')^2$. The first gives $G_D = 0$, which is
a trivial solution. Thus, we must choose

$$H(x, y, z; x', y', z') = \frac{1}{4\pi}\frac{1}{\sqrt{(x - x')^2 + (y - y')^2 + (z + z')^2}}.$$

Note that with $\mathbf{r}'' = (x', y', -z')$, this equation satisfies $\nabla^2 H = -\delta(\mathbf{r} - \mathbf{r}'')$,
and it may appear that $H$ does not satisfy the homogeneous DE, as it should.
However, $\mathbf{r}''$ is outside $D$, and $\mathbf{r} \ne \mathbf{r}''$ as long as $\mathbf{r} \in D$. So $H$ does satisfy
the homogeneous DE in $D$. The Green's function for the given Dirichlet BC
is therefore

$$G_D(\mathbf{r}, \mathbf{r}') = -\frac{1}{4\pi}\left(\frac{1}{|\mathbf{r} - \mathbf{r}'|} - \frac{1}{|\mathbf{r} - \mathbf{r}''|}\right),$$

where $\mathbf{r}''$ is the reflection of $\mathbf{r}'$ in the $xy$-plane.

This result has a direct physical interpretation. If determining the solution
of the Laplace equation is considered a problem in electrostatics, then
$G_D^{(s)}(\mathbf{r}, \mathbf{r}')$ is simply the potential at $\mathbf{r}$ of a unit point charge located at $\mathbf{r}'$,
and $G_D(\mathbf{r}, \mathbf{r}')$ is the potential of two point charges of opposite signs, one at
$\mathbf{r}'$ and the other at the mirror image of $\mathbf{r}'$. The fact that the two charges are
equidistant from the $xy$-plane ensures the vanishing of the potential in that
plane. The introduction of image charges to ensure the vanishing of $G_D$ at
$\partial D$ is common in electrostatics and is known as the method of images. This
method reduces the Dirichlet problem for the Laplacian to finding appropriate
point charges outside $D$ that guarantee the vanishing of the potential on
$\partial D$. For simple geometries, such as the one discussed in this example,
determination of the magnitudes and locations of such image charges is easy,
rendering the method extremely useful.
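
The two defining properties of this Green's function, its vanishing on the $xy$-plane and its symmetry in $\mathbf{r}$ and $\mathbf{r}'$, are easy to confirm numerically. The sketch below is our own illustration; the function name `green_half_space` and the sample points are assumptions, not part of the text.

```python
import numpy as np

def green_half_space(r, r_src):
    """Dirichlet Green's function of the Laplacian for the half-space z >= 0."""
    r = np.asarray(r, dtype=float)
    r_src = np.asarray(r_src, dtype=float)
    r_img = r_src * np.array([1.0, 1.0, -1.0])   # mirror image r'' of the source r'
    direct = 1.0 / np.linalg.norm(r - r_src)
    image = 1.0 / np.linalg.norm(r - r_img)
    return -(direct - image) / (4.0 * np.pi)

src = [0.3, -0.2, 1.5]                                   # assumed source point in D
on_plane = green_half_space([2.0, 1.0, 0.0], src)        # field point on the boundary z = 0
sym_a = green_half_space([1.0, 0.5, 0.7], src)
sym_b = green_half_space(src, [1.0, 0.5, 0.7])           # arguments interchanged
```

The value on the plane vanishes because the source and its image are equidistant from every boundary point, and interchanging the two arguments leaves the result unchanged.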

Having found the Green’s function, we can pose the general Dirichlet 
BVP: 


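$$\nabla^2 u = -\rho(\mathbf{r}) \quad\text{and}\quad u(x, y, 0) = g(x, y), \quad \text{for } z > 0.$$

The solution is

$$\begin{aligned}
u(\mathbf{r}) = {}& \frac{1}{4\pi}\int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy' \int_0^{\infty} dz'\, \rho(\mathbf{r}')\left(\frac{1}{|\mathbf{r} - \mathbf{r}'|} - \frac{1}{|\mathbf{r} - \mathbf{r}''|}\right) \\
& + \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\, g(x', y') \left.\frac{\partial G_D}{\partial n'}\right|_{z'=0},
\end{aligned} \tag{22.3}$$

where $\mathbf{r} = (x, y, z)$, $\mathbf{r}' = (x', y', z')$, and $\mathbf{r}'' = (x', y', -z')$, and where
$\partial/\partial n' = -\partial/\partial z'$ is the derivative along the outward normal of $D$ on the plane $z' = 0$.

A typical application consists in introducing a number of charges in the
vicinity of an infinite conducting sheet, which is held at a constant potential
$V_0$. If there are $N$ charges, $\{q_i\}_{i=1}^{N}$, located at $\{\mathbf{r}_i\}_{i=1}^{N}$, then $\rho(\mathbf{r}) =
\sum_{i=1}^{N} q_i\delta(\mathbf{r} - \mathbf{r}_i)$, $g(x, y) = \text{const} = V_0$, and we get

$$u(\mathbf{r}) = \frac{1}{4\pi}\sum_{i=1}^{N}\left(\frac{q_i}{|\mathbf{r} - \mathbf{r}_i|} - \frac{q_i}{|\mathbf{r} - \mathbf{r}_i'|}\right) + V_0 \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy' \left.\frac{\partial G_D}{\partial n'}\right|_{z'=0}, \tag{22.4}$$

where $\mathbf{r}_i = (x_i, y_i, z_i)$ and $\mathbf{r}_i' = (x_i, y_i, -z_i)$. That the double integral in
Eq. (22.4) is unity can be seen by direct integration or by noting that the
sum vanishes when $z = 0$. On the other hand, $u(x, y, 0) = V_0$. Thus, the
solution becomes

$$u(\mathbf{r}) = \frac{1}{4\pi}\sum_{i=1}^{N}\left(\frac{q_i}{|\mathbf{r} - \mathbf{r}_i|} - \frac{q_i}{|\mathbf{r} - \mathbf{r}_i'|}\right) + V_0.$$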


Example 22.1.2 (Dirichlet BVP for a Sphere) The method of images is also 
applicable when the boundary is a sphere. Inside a sphere of radius a with 
center at the origin, we wish to solve this Dirichlet BVP: 


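\[
\nabla^2 u = -\rho(r,\theta,\varphi) \quad\text{for } r<a, \quad\text{and}\quad u(a,\theta,\varphi)=g(\theta,\varphi).
\]

The GF satisfies

\[
\begin{aligned}
\nabla^2 G_D(r,\theta,\varphi;r',\theta',\varphi') &= \delta(\mathbf{r}-\mathbf{r}') \quad\text{for } r<a,\\
G_D(a,\theta,\varphi;r',\theta',\varphi') &= 0.
\end{aligned}
\tag{22.5}
\]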
Thus, $G_D$ can again be interpreted as the potential of point charges, of which
one is in the sphere and the others are outside.
We write $G_D = G_s + H$ and choose H in such a way that the second
equation in (22.5) is satisfied. As in the case of the xy-plane, let¹
$H(\mathbf{r},\mathbf{r}'') = \dfrac{k}{4\pi|\mathbf{r}-\mathbf{r}''|}$, where k is a constant to be determined. If $\mathbf{r}''$ is
outside the sphere, $\nabla^2 H$ will vanish everywhere inside the sphere. The problem
has been reduced to finding k and $\mathbf{r}''$ (the location of the image charge). We
want to choose $\mathbf{r}''$ such that

\[
\frac{1}{|\mathbf{r}-\mathbf{r}'|}\bigg|_{r=a} = \frac{k}{|\mathbf{r}-\mathbf{r}''|}\bigg|_{r=a}
\quad\Longrightarrow\quad
k\,\bigl(|\mathbf{r}-\mathbf{r}'|\bigr)_{r=a} = \bigl(|\mathbf{r}-\mathbf{r}''|\bigr)_{r=a}.
\]

This shows that k must be positive. Squaring both sides and expanding the
result yields

\[
k^2\left(a^2+r'^2-2ar'\cos\gamma\right) = a^2+r''^2-2ar''\cos\gamma,
\]

where $\gamma$ is the angle between $\mathbf{r}$ and $\mathbf{r}'$, and we have assumed that $\mathbf{r}'$ and
$\mathbf{r}''$ are in the same direction. If this equation is to hold for arbitrary $\gamma$, we
must have $k^2r' = r''$ and $k^2(a^2+r'^2) = a^2+r''^2$. Combining these two
equations yields $k^4r'^2 - k^2(a^2+r'^2) + a^2 = 0$, whose positive solutions are
$k = 1$ and $k = a/r'$. The first choice implies that $r'' = r'$, which is impossible
because $\mathbf{r}''$ must be outside the sphere. We thus choose $k = a/r'$, which
gives $\mathbf{r}'' = (a^2/r'^2)\,\mathbf{r}'$. We then have

\[
G_D(\mathbf{r},\mathbf{r}') = -\frac{1}{4\pi}\left[\frac{1}{|\mathbf{r}-\mathbf{r}'|}-\frac{a/r'}{\left|\mathbf{r}-(a^2/r'^2)\,\mathbf{r}'\right|}\right].
\tag{22.6}
\]
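As a quick numerical check of the image construction for the sphere, the sketch below evaluates the Dirichlet GF with image strength $k=a/r'$ and image location $\mathbf{r}''=(a^2/r'^2)\mathbf{r}'$ at a point on the surface $r=a$. The function names, the radius, and the sample points are assumptions made for illustration only.

```python
import math

def image_point(rp, a):
    """Return the image location (a^2/r'^2) r' and the image strength k = a/r'."""
    rp_norm = math.hypot(*rp)
    scale = a * a / rp_norm**2
    return tuple(scale * c for c in rp), a / rp_norm

def g_dirichlet_sphere(r, rp, a):
    """Dirichlet GF inside a sphere of radius a, built from one image charge."""
    rpp, k = image_point(rp, a)
    return -(1.0 / math.dist(r, rp) - k / math.dist(r, rpp)) / (4.0 * math.pi)

a = 2.0
source = (0.3, -0.4, 1.0)    # r', an arbitrary point with |r'| < a
theta, phi = 1.1, 2.3        # arbitrary angles of a point on the sphere
surface = (a * math.sin(theta) * math.cos(phi),
           a * math.sin(theta) * math.sin(phi),
           a * math.cos(theta))

print(g_dirichlet_sphere(surface, source, a))          # should vanish up to rounding
print(g_dirichlet_sphere((0.1, 0.2, 0.3), source, a))  # nonzero inside the sphere
```

The image point lies outside the sphere whenever the source lies inside it, so the added term is harmonic throughout the interior.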


¹ Actually, to be general, we must add an arbitrary function f(r”) to this. However, as the
reader can easily verify, the following argument will show that f(r”) = 0. Besides, we 
are only interested in a solution, not the most general one. All simplifying assumptions 
that follow are made for the same reason. 
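Substituting this in Eq. (22.2), and noting that $\partial G/\partial n_y = (\partial G/\partial r')_{r'=a}$,
yields

\[
\begin{aligned}
u(\mathbf{r}) ={}& \frac{1}{4\pi}\int_0^a r'^2\,dr'\int_0^{\pi}\sin\theta'\,d\theta'\int_0^{2\pi}d\varphi'
\left(\frac{1}{|\mathbf{r}-\mathbf{r}'|}-\frac{a/r'}{\left|\mathbf{r}-(a^2/r'^2)\,\mathbf{r}'\right|}\right)\rho(\mathbf{r}')\\
&+\frac{a(a^2-r^2)}{4\pi}\int_0^{2\pi}d\varphi'\int_0^{\pi}\sin\theta'\,d\theta'\,\frac{g(\theta',\varphi')}{|\mathbf{r}-\mathbf{a}|^3},
\end{aligned}
\tag{22.7}
\]

where $\mathbf{a} = (a,\theta',\varphi')$ is a vector from the origin to a point on the sphere. For
the Laplace equation $\rho(\mathbf{r}') = 0$, and only the double integral in Eq. (22.7)
will contribute.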

It can be shown that if $g(\theta',\varphi') = \text{const} = V_0$, then $u(\mathbf{r}) = V_0$. This is
the familiar fact shown in electromagnetism: If the potential on a sphere is 
kept constant, the potential inside the sphere will be constant and equal to 
the potential at the surface. 


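Example 22.1.3 (Dirichlet BVP for a Circle) In this example we find the
Dirichlet GF for a circle of radius a centered at the origin. The GF is
logarithmic [see Eq. (21.22)]. Therefore, H is also logarithmic, and its most
general form is

\[
H(\mathbf{r},\mathbf{r}'') = -\frac{1}{2\pi}\ln\bigl(|\mathbf{r}-\mathbf{r}''|\bigr)-\frac{1}{2\pi}\ln\bigl[f(\mathbf{r}'')\bigr]
= -\frac{1}{2\pi}\ln\bigl(|\mathbf{r}-\mathbf{r}''|\,f(\mathbf{r}'')\bigr),
\]

so that

\[
G_D(\mathbf{r},\mathbf{r}') = \frac{1}{2\pi}\ln\bigl(|\mathbf{r}-\mathbf{r}'|\bigr)-\frac{1}{2\pi}\ln\bigl(|\mathbf{r}-\mathbf{r}''|\,f(\mathbf{r}'')\bigr)
= \frac{1}{2\pi}\ln\left|\frac{\mathbf{r}-\mathbf{r}'}{(\mathbf{r}-\mathbf{r}'')\,f(\mathbf{r}'')}\right|.
\]

For $G_D$ to vanish at all points on the circle, we must have

\[
\left|\frac{\mathbf{a}-\mathbf{r}'}{(\mathbf{a}-\mathbf{r}'')\,f(\mathbf{r}'')}\right| = 1
\quad\Longrightarrow\quad
|\mathbf{a}-\mathbf{r}'| = \bigl|(\mathbf{a}-\mathbf{r}'')\,f(\mathbf{r}'')\bigr|,
\]

where $\mathbf{a}$ is a vector from the origin to a point on the circle. Assuming that $\mathbf{r}'$
and $\mathbf{r}''$ are in the same direction, squaring both sides of the last equation and
expanding the result, we obtain

\[
\left(a^2+r''^2-2ar''\cos\gamma\right)f^2(\mathbf{r}'') = a^2+r'^2-2ar'\cos\gamma,
\]

where $\gamma$ is the angle between $\mathbf{a}$ and $\mathbf{r}'$ (or $\mathbf{r}''$). This equation must hold for
arbitrary $\gamma$. Hence, we have $f^2(\mathbf{r}'')\,r'' = r'$ and $f^2(\mathbf{r}'')(a^2+r''^2) = a^2+r'^2$.
These can be solved for $f(\mathbf{r}'')$ and $r''$. The result is

\[
\mathbf{r}'' = \frac{a^2}{r'^2}\,\mathbf{r}', \qquad f(\mathbf{r}'') = \frac{r'}{a}.
\]

Substituting these formulas in the expression for $G_D$, we obtain

\[
G_D(\mathbf{r},\mathbf{r}') = \frac{1}{2\pi}\ln\bigl(|\mathbf{r}-\mathbf{r}'|\bigr)-\frac{1}{2\pi}\ln\left|\frac{r'}{a}\,\mathbf{r}-\frac{a}{r'}\,\mathbf{r}'\right|.
\]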


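To write the solution to the Dirichlet BVP, we also need $\partial G_D/\partial n = \partial G_D/\partial r'$
evaluated at $r' = a$. Using polar coordinates, we express $G_D$ as

\[
G_D(\mathbf{r},\mathbf{r}') = \frac{1}{4\pi}\ln\left[\frac{r^2+r'^2-2rr'\cos(\theta-\theta')}{r^2r'^2/a^2+a^2-2rr'\cos(\theta-\theta')}\right].
\]

Differentiation with respect to $r'$ yields

\[
\frac{\partial G_D}{\partial n} = \frac{\partial G_D}{\partial r'}\bigg|_{r'=a}
= \frac{1}{2\pi a}\,\frac{a^2-r^2}{r^2+a^2-2ra\cos(\theta-\theta')},
\]

from which we can immediately write the solution to the two-dimensional
Dirichlet BVP $\nabla^2u = \rho$, $u(r=a) = g(\theta')$ as

\[
u(\mathbf{r}) = \int_0^{2\pi}d\theta'\int_0^a r'\,G_D(\mathbf{r},\mathbf{r}')\,\rho(\mathbf{r}')\,dr'
+\frac{a^2-r^2}{2\pi a}\int_0^{2\pi}a\,d\theta'\,\frac{g(\theta')}{r^2+a^2-2ra\cos(\theta-\theta')}.
\]

In particular, for Laplace's equation $\rho(\mathbf{r}') = 0$, and we get

\[
u(r,\theta) = \frac{a^2-r^2}{2\pi a}\int_0^{2\pi}a\,d\theta'\,\frac{g(\theta')}{r^2+a^2-2ra\cos(\theta-\theta')}.
\tag{22.8}
\]

Equation (22.8) is called the Poisson integral formula.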
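The Poisson integral is straightforward to evaluate numerically. The sketch below is an illustration under stated assumptions (the function name, the trapezoidal rule, and the test data are not from the text). It uses the equivalent prefactor $(a^2-r^2)/(2\pi)$ with measure $d\theta'$ and compares the result with the known harmonic function $(r/a)\cos\theta$, whose boundary value is $\cos\theta'$.

```python
import math

def poisson_disk(g, r, theta, a=1.0, n=400):
    """Trapezoidal-rule estimate of the Poisson integral for the disk r < a.

    g is the boundary function g(theta'); n is the number of quadrature nodes.
    """
    h = 2.0 * math.pi / n
    total = 0.0
    for j in range(n):
        tp = j * h
        total += g(tp) / (r * r + a * a - 2.0 * r * a * math.cos(theta - tp))
    return (a * a - r * r) / (2.0 * math.pi) * total * h

# Boundary data cos(theta') should reproduce u = (r/a) cos(theta) inside the disk.
print(poisson_disk(math.cos, 0.5, 0.7), 0.5 * math.cos(0.7))
# Constant boundary data should reproduce the same constant inside the disk.
print(poisson_disk(lambda t: 3.0, 0.9, 0.2))
```

The trapezoidal rule is a natural choice here because the integrand is smooth and periodic in $\theta'$, for which that rule converges very quickly.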


22.1.2 The Neumann Boundary Value Problem 


The Neumann BVP is not as simple as the Dirichlet BVP because it requires 
the normal derivative of the solution. But the normal derivative is related to 
the Laplacian through the divergence theorem. Thus, the BC and the DE are 
tied together, and unless we impose some solvability conditions, we may 
have no solution at all. These points are illustrated clearly if we consider the 
Laplacian operator. 


Historical Notes 

Carl Gottfried Neumann (1832-1925) was the son of Franz Ernst Neumann, a professor
of physics and mineralogy at Königsberg; his mother, Luise Florentine Hagen, was a
sister-in-law of the astronomer Bessel. Neumann received his primary and secondary
education in Königsberg, attended the university, and formed particularly close friendships
with the analyst F.J. Richelot and the geometer L.O. Hesse. After passing the examination
for secondary-school teaching, he obtained his doctorate in 1855; in 1858 he qualified for
lecturing in mathematics at Halle, where he became Privatdozent and, in 1863, assistant
professor. In the latter year he was called to Basel, and in 1865 to Tübingen. From the
autumn of 1868 until his retirement in 1911 he was at the University of Leipzig. In 1864
he married Hermine Mathilde Elise Kloss; she died in 1875. 

Neumann, who led a quiet life, was a successful university teacher and a productive re- 
searcher. More than two generations of future gymnasium teachers received their basic 
mathematical education from him. As a researcher he was especially prominent in the 
field of potential theory. His investigations into boundary value problems resulted in pio- 
neering achievements; in 1870 he began to develop the method of the arithmetical mean 
for their solution. He also coined the term “logarithmic potential.” The second boundary 
value problem of potential theory still bears his name; a generalization of it was later 
provided by H. Poincaré. 

Neumann was a member of the Berlin Academy and the Societies of Göttingen, Munich,
and Leipzig. He performed a valuable service in founding and editing the important
German mathematics periodical Mathematische Annalen. 


Consider the Neumann BVP 
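\[
\nabla^2u = f(\mathbf{x}) \quad\text{for } \mathbf{x}\in D, \quad\text{and}\quad \frac{\partial u}{\partial n} = g(\mathbf{x}) \quad\text{for } \mathbf{x}\in\partial D.
\]

Integrating the first equation over D and using the divergence theorem, we
obtain

\[
\int_D f(\mathbf{x})\,d^mx = \int_D \nabla\cdot\nabla u\,d^mx = \int_{\partial D}\hat{\mathbf{e}}_n\cdot\nabla u\,da = \int_{\partial D}\frac{\partial u}{\partial n}\,da.
\]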


It follows that we cannot arbitrarily assign values of $\partial u/\partial n$ on the boundary.
In particular, if the BC is homogeneous, as in the case of Green's functions,
the RHS is zero, and we must have $\int_D f(\mathbf{x})\,d^mx = 0$. This relation is a
restriction on the DE, and is a solvability condition, as mentioned above. To
satisfy this condition, it is necessary to subtract from the inhomogeneous
term its average value over the region D. Thus, if $V_D$ is the volume of the
region D, then

\[
\nabla^2u = f(\mathbf{x})-\bar{f} \quad\text{where}\quad \bar{f} = \frac{1}{V_D}\int_D f(\mathbf{x})\,d^mx
\]

ensures that the Neumann BVP is solvable. In particular, the inhomogeneous
term for the Green's function is not simply $\delta(\mathbf{x}-\mathbf{y})$ but $\delta(\mathbf{x}-\mathbf{y})-\bar{\delta}$, where

\[
\bar{\delta} = \frac{1}{V_D}\int_D\delta(\mathbf{x}-\mathbf{y})\,d^mx = \frac{1}{V_D} \quad\text{if } \mathbf{y}\in D.
\]


Thus, the Green's function for the Neumann BVP, $G_N(\mathbf{x},\mathbf{y})$, satisfies

\[
\nabla^2G_N(\mathbf{x},\mathbf{y}) = \delta(\mathbf{x}-\mathbf{y})-\frac{1}{V_D},
\qquad
\frac{\partial G_N}{\partial n}(\mathbf{x},\mathbf{y}) = 0 \quad\text{for } \mathbf{x}\in\partial D.
\]

Applying Green's identity, Eq. (21.27), we get

\[
u(\mathbf{x}) = \int_D d^my\,G_N(\mathbf{x},\mathbf{y})f(\mathbf{y})-\int_{\partial D}G_N(\mathbf{x},\mathbf{y})\frac{\partial u}{\partial n}\,da+\bar{u},
\tag{22.9}
\]

where $\bar{u} = (1/V_D)\int_D u(\mathbf{x})\,d^mx$ is the average value of u in D.
Equation (22.9) is valid only for the Laplacian operator, although a similar result
can be obtained for a general self-adjoint SOLPDO with constant coefficients.
We will not pursue that result, however, since it is of little practical
use.

Throughout the discussion so far we have assumed that D is bounded;
that is, we have considered points inside D with BCs on the boundary $\partial D$
specified. This is called an interior BVP. In many physical situations we are
interested in points outside D. We are then dealing with an exterior BVP.
In dealing with such a problem, we must specify the behavior of the Green's
function at infinity. In most cases, the physics of the problem dictates such
behavior. For instance, for the case of an exterior Dirichlet BVP, where

\[
u(\mathbf{x}) = \int_D d^my\,G_D(\mathbf{x},\mathbf{y})f(\mathbf{y})+\int_{\partial D}u(\mathbf{y}_b)\frac{\partial G_D}{\partial n_y}(\mathbf{x},\mathbf{y}_b)\,da
\]

and it is desired that $u(\mathbf{x})\to 0$ as $|\mathbf{x}|\to\infty$, the vanishing of $G_D(\mathbf{x},\mathbf{y})$ at
infinity guarantees that the second integral vanishes, as long as $\partial D$ is a finite
hypersurface. To guarantee the disappearance of the first integral, we must
demand that $G_D(\mathbf{x},\mathbf{y})$ tend to zero faster than $f(\mathbf{y})\,d^my$ tends to infinity. For
most cases of physical interest, the calculation of the exterior Green’s func- 
tions is not conceptually different from that of the interior ones. However, 
the algebra may be more involved. 

Later we will develop general methods for finding the Green’s functions 
for certain partial differential operators that satisfy appropriate BCs. At this 
point, let us simply mention what are called mixed BCs for elliptic PDEs. 
A general mixed BC is of the form 


\[
\alpha(\mathbf{x})u(\mathbf{x})+\beta(\mathbf{x})\frac{\partial u}{\partial n}(\mathbf{x}) = \gamma(\mathbf{x}).
\tag{22.10}
\]


Problem 22.6 examines the conditions that the GF must satisfy in such a 
case. 


22.2 Parabolic Equations 


Elliptic partial differential equations arise in static problems, where the
solution is independent of time. Of the two major time-dependent equations, the
wave equation and the heat (or diffusion) equation,² the latter is a parabolic
PDE and the former a hyperbolic PDE. This section examines the heat
equation, which is of the form $\nabla^2u = a^2\,\partial u/\partial t$. By changing t to $t/a^2$, we can
write the equation as $L_{\mathbf{x},t}[u] \equiv (\partial/\partial t-\nabla^2)u(\mathbf{x},t) = 0$. We wish to
calculate the Green's function associated with $L_{\mathbf{x},t}$ and the homogeneous BCs.
Because of the time variable, we must also specify the solution at t = 0.


² The heat equation turns into the Schrödinger equation if t is changed to $\sqrt{-1}\,t$; thus, the
following discussion incorporates the Schrödinger equation as well.




Thus, we consider the BVP 


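\[
\begin{aligned}
&L_{\mathbf{x},t}[u] \equiv \left(\frac{\partial}{\partial t}-\nabla^2\right)u(\mathbf{x},t) = 0 \quad\text{for } \mathbf{x}\in D,\\
&u(\mathbf{x}_b,t) = 0, \quad u(\mathbf{x},0) = h(\mathbf{x}) \quad\text{for } \mathbf{x}_b\in\partial D,\ \mathbf{x}\in D.
\end{aligned}
\tag{22.11}
\]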


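To find a solution to (22.11), we can use a method that turns out to be useful
for evaluating Green's functions in general: the method of eigenfunctions.
Let $\{u_n\}_{n=1}^{\infty}$ be the eigenfunctions of $\nabla^2$ with eigenvalues $\{-\lambda_n\}_{n=1}^{\infty}$.
Let the BC be $u_n(\mathbf{x}_b) = 0$ for $\mathbf{x}_b\in\partial D$. Then

\[
\begin{aligned}
&\nabla^2u_n(\mathbf{x})+\lambda_nu_n(\mathbf{x}) = 0 \quad\text{for } n = 1,2,\ldots,\ \mathbf{x}\in D,\\
&u_n(\mathbf{x}_b) = 0 \quad\text{for } \mathbf{x}_b\in\partial D.
\end{aligned}
\tag{22.12}
\]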


Equation (22.12) constitutes a Sturm-Liouville problem in m dimensions,
which we assume to have a solution with $\{u_n\}_{n=1}^{\infty}$ as a complete orthonormal
set. We can therefore write

\[
u(\mathbf{x},t) = \sum_{n=1}^{\infty}C_n(t)\,u_n(\mathbf{x}).
\tag{22.13}
\]

This is possible because at each specific value of t, $u(\mathbf{x},t)$ is a function
of $\mathbf{x}$ and therefore can be written as a linear combination of the same set,
$\{u_n\}_{n=1}^{\infty}$. The coefficients $C_n(t)$ are given by

\[
C_n(t) = \int_D u(\mathbf{x},t)\,u_n(\mathbf{x})\,d^mx.
\tag{22.14}
\]


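To calculate $C_n(t)$, we differentiate (22.14) with respect to time and use
(22.11) to obtain

\[
\dot{C}_n(t) \equiv \frac{dC_n}{dt} = \int_D\frac{\partial u}{\partial t}(\mathbf{x},t)\,u_n(\mathbf{x})\,d^mx = \int_D\left[\nabla^2u(\mathbf{x},t)\right]u_n(\mathbf{x})\,d^mx.
\]

Using Green's identity for the operator $\nabla^2$ yields

\[
\int_D\left[u_n\nabla^2u-u\nabla^2u_n\right]d^mx = \int_{\partial D}\left(u_n\frac{\partial u}{\partial n}-u\frac{\partial u_n}{\partial n}\right)da.
\]

Since both u and $u_n$ vanish on $\partial D$, the RHS is zero, and we get

\[
\dot{C}_n(t) = \int_D u\,\nabla^2u_n\,d^mx = -\lambda_n\int_D u(\mathbf{x},t)\,u_n(\mathbf{x})\,d^mx = -\lambda_nC_n.
\]

This has the solution $C_n(t) = C_n(0)\,e^{-\lambda_nt}$, where

\[
C_n(0) = \int_D u(\mathbf{y},0)\,u_n(\mathbf{y})\,d^my = \int_D h(\mathbf{y})\,u_n(\mathbf{y})\,d^my,
\]

so that

\[
C_n(t) = e^{-\lambda_nt}\int_D h(\mathbf{y})\,u_n(\mathbf{y})\,d^my.
\]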


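Substituting this in (22.13) and switching the order of integration and
summation, we get

\[
u(\mathbf{x},t) = \int_D\left[\sum_{n=1}^{\infty}e^{-\lambda_nt}\,u_n(\mathbf{x})\,u_n(\mathbf{y})\right]h(\mathbf{y})\,d^my
\]

and read off the GF as $\sum_{n=1}^{\infty}e^{-\lambda_nt}\,u_n(\mathbf{x})\,u_n(\mathbf{y})\,\theta(t)$, where we also introduced
the theta function to ensure that the solution vanishes for t < 0. More
generally, we have

\[
G(\mathbf{x},\mathbf{y};t-\tau) = \sum_{n=1}^{\infty}e^{-\lambda_n(t-\tau)}\,u_n(\mathbf{x})\,u_n(\mathbf{y})\,\theta(t-\tau).
\tag{22.15}
\]

Note the property

\[
\lim_{\tau\to t}G(\mathbf{x},\mathbf{y};t-\tau) = \sum_{n=1}^{\infty}u_n(\mathbf{x})\,u_n(\mathbf{y}) = \delta(\mathbf{x}-\mathbf{y}),
\]

which is usually written as

\[
G(\mathbf{x},\mathbf{y};0^+) = \delta(\mathbf{x}-\mathbf{y}).
\tag{22.16}
\]

The reader may also check that

\[
L_{\mathbf{x},t}\,G(\mathbf{x},\mathbf{y};t-\tau) = \delta(\mathbf{x}-\mathbf{y})\,\delta(t-\tau).
\tag{22.17}
\]

This is precisely what we expect for the Green's function of an operator in
the variables $\mathbf{x}$ and t. Another property of $G(\mathbf{x},\mathbf{y};t-\tau)$ is that it vanishes
on $\partial D$, as it should.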
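To make the eigenfunction expansion concrete, the sketch below specializes it to one dimension. It is an illustration only: the interval $D=[0,L]$, the eigenfunctions $u_n(x)=\sqrt{2/L}\,\sin(n\pi x/L)$ with $\lambda_n=(n\pi/L)^2$, the truncation at 50 terms, and the midpoint quadrature are all assumptions introduced here, not part of the general argument.

```python
import math

L = 1.0  # length of the interval D = [0, L] (assumed for this illustration)

def u_n(n, x):
    """Orthonormal Dirichlet eigenfunction of d^2/dx^2 on [0, L]."""
    return math.sqrt(2.0 / L) * math.sin(n * math.pi * x / L)

def lam(n):
    """Eigenvalue lambda_n, so that u_n'' = -lambda_n u_n."""
    return (n * math.pi / L) ** 2

def green(x, y, t, terms=50):
    """Truncated series for G(x, y; t) with tau = 0; taken as zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    return sum(math.exp(-lam(n) * t) * u_n(n, x) * u_n(n, y)
               for n in range(1, terms + 1))

def evolve(h, x, t, points=400):
    """u(x, t) as the integral of G(x, y; t) h(y) over D, by the midpoint rule."""
    dy = L / points
    return sum(green(x, (j + 0.5) * dy, t) * h((j + 0.5) * dy)
               for j in range(points)) * dy

x, t = 0.3, 0.05
initial = lambda y: math.sin(math.pi * y / L)
exact = math.exp(-lam(1) * t) * math.sin(math.pi * x / L)
print(evolve(initial, x, t), exact)  # the two values should agree closely
```

With the initial shape equal to the first eigenfunction, only the $n=1$ term survives, so the integral operator simply damps that shape by $e^{-\lambda_1t}$.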

Having found the Green's function and noted its properties, we are in
a position to solve the inhomogeneous analogue of Eq. (22.11), in which
the RHS of the first equation is $f(\mathbf{x},t)$, and the zero on the RHS of the
second equation is replaced by $g(\mathbf{x}_b,t)$. Experience with similar but simpler
problems indicates that to make any progress toward a solution, we must
come up with a form of Green's identity involving $L_{\mathbf{x},t}$ and its adjoint. It is
easy to show that

\[
vL_{\mathbf{x},t}[u]-uL^{\dagger}_{\mathbf{x},t}[v] = \frac{\partial}{\partial t}(uv)-\nabla\cdot(v\nabla u-u\nabla v),
\tag{22.18}
\]

where $L^{\dagger}_{\mathbf{x},t} = -\partial/\partial t-\nabla^2$.

Now consider the (m + 1)-dimensional "cylinder" one of whose bases
is at $t = \epsilon$, where $\epsilon$ is a small positive number. This base is barely above
the m-dimensional hyperplane $\mathbb{R}^m$. The other base is at $t = \tau-\epsilon$ and is a
duplicate of $D\subset\mathbb{R}^m$ (see Fig. 22.1). Let $a^{\mu}$, where $\mu = 0,1,\ldots,m$, be the
components of an (m + 1)-dimensional vector $a = (a^0,a^1,\ldots,a^m)$. Define
an inner product by

\[
a\cdot b \equiv \sum_{\mu=0}^{m}a^{\mu}b_{\mu} \equiv a^0b^0-a^1b^1-\cdots-a^mb^m \equiv a^0b^0-\mathbf{a}\cdot\mathbf{b}
\]

and the (m + 1)-dimensional vector $Q \equiv (Q^0,\mathbf{Q})$ by $Q^0 = uv$, $\mathbf{Q} = v\nabla u-u\nabla v$.
Then (22.18) can be expressed as

\[
vL_{\mathbf{x},t}[u]-uL^{\dagger}_{\mathbf{x},t}[v] = \frac{\partial Q^0}{\partial x^0}-\frac{\partial Q^1}{\partial x^1}-\cdots-\frac{\partial Q^m}{\partial x^m},
\tag{22.19}
\]

where $x^0 = t$.

Fig. 22.1 The "cylinder" used in evaluating the GF for the diffusion and wave equations.
Note that the bases are not planes, but hyperplanes (that is, spaces such as $\mathbb{R}^m$)


We recognize the RHS as a divergence in (m + 1)-dimensional space. Denoting
the volume of the (m + 1)-dimensional cylinder by $\mathcal{D}$ and its boundary
by $\partial\mathcal{D}$ and integrating (22.19) over $\mathcal{D}$, we obtain

\[
\int_{\mathcal{D}}\left(vL_{\mathbf{x},t}[u]-uL^{\dagger}_{\mathbf{x},t}[v]\right)d^{m+1}x = \int_{\partial\mathcal{D}}\sum_{\mu=0}^{m}Q^{\mu}n_{\mu}\,dS,
\tag{22.20}
\]

where dS is an element of "area" of $\partial\mathcal{D}$. Note that the divergence theorem
was used in the last step. The LHS is an integration over t and $\mathbf{x}$, which can
be written as

\[
\int_{\mathcal{D}}\left(vL_{\mathbf{x},t}[u]-uL^{\dagger}_{\mathbf{x},t}[v]\right)d^{m+1}x = \int_{\epsilon}^{\tau-\epsilon}dt\int_Dd^mx\left(vL_{\mathbf{x},t}[u]-uL^{\dagger}_{\mathbf{x},t}[v]\right).
\]

The RHS of (22.20), on the other hand, can be split into three parts: a
base at $t = \epsilon$, a base at $t = \tau-\epsilon$, and the lateral surface. The base at $t = \epsilon$
is simply the region D, whose outward-pointing normal is in the negative t
direction. Thus, $n_0 = -1$, and $n_i = 0$ for $i = 1,2,\ldots,m$. The base at
$t = \tau-\epsilon$ is also the region D; however, its normal is in the positive t direction.
Thus, $n_0 = 1$, and $n_i = 0$ for $i = 1,2,\ldots,m$. The element of "area" for
these two bases is simply $d^mx$. The unit normal to the lateral surface has no
time component and is simply the unit normal to the boundary of D. The
element of "area" for the lateral surface is dt da, where da is an element of
"area" for $\partial D$. Putting everything together, we can write (22.20) as

\[
\int_{\epsilon}^{\tau-\epsilon}dt\int_Dd^mx\left(vL_{\mathbf{x},t}[u]-uL^{\dagger}_{\mathbf{x},t}[v]\right)
= \int_D\left(-Q^0\right)\Big|_{t=\epsilon}d^mx+\int_DQ^0\Big|_{t=\tau-\epsilon}d^mx-\int_{\partial D}da\int_{\epsilon}^{\tau-\epsilon}dt\,\mathbf{Q}\cdot\hat{\mathbf{e}}_n.
\]


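The minus sign for the last term is due to the definition of the inner product.
Substituting for Q yields

\[
\begin{aligned}
\int_{\epsilon}^{\tau-\epsilon}dt\int_Dd^mx\left(vL_{\mathbf{x},t}[u]-uL^{\dagger}_{\mathbf{x},t}[v]\right)
={}& -\int_Du(\mathbf{x},\epsilon)\,v(\mathbf{x},\epsilon)\,d^mx+\int_Du(\mathbf{x},\tau-\epsilon)\,v(\mathbf{x},\tau-\epsilon)\,d^mx\\
&-\int_{\partial D}da\int_{\epsilon}^{\tau-\epsilon}dt\left(v\frac{\partial u}{\partial n}-u\frac{\partial v}{\partial n}\right).
\end{aligned}
\tag{22.21}
\]

Let v be $g(\mathbf{x},\mathbf{y};t-\tau)$, the GF associated with the adjoint operator. Then
Eq. (22.21) gives

\[
\begin{aligned}
\int_{\epsilon}^{\tau-\epsilon}dt&\int_Dd^mx\left[g(\mathbf{x},\mathbf{y};t-\tau)f(\mathbf{x},t)-u(\mathbf{x},t)\,\delta(\mathbf{x}-\mathbf{y})\,\delta(t-\tau)\right]\\
={}& -\int_Du(\mathbf{x},\epsilon)\,g(\mathbf{x},\mathbf{y};\epsilon-\tau)\,d^mx+\int_Du(\mathbf{x},\tau-\epsilon)\,g(\mathbf{x},\mathbf{y};-\epsilon)\,d^mx\\
&-\int_{\partial D}da\int_{\epsilon}^{\tau-\epsilon}dt\left[g(\mathbf{x}_b,\mathbf{y};t-\tau)\frac{\partial u}{\partial n}-u(\mathbf{x}_b,t)\frac{\partial g}{\partial n}\right].
\end{aligned}
\tag{22.22}
\]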


We now use the following facts: 


1. $\delta(t-\tau) = 0$ in the second integral on the LHS of Eq. (22.22), because
t can never be equal to $\tau$ in the range of integration.

2. Using the symmetry property of the Green's function and the fact that
$L_{\mathbf{x},t}$ is real, we have $g(\mathbf{x},\mathbf{y};t-\tau) = G(\mathbf{y},\mathbf{x};\tau-t)$, where we have used
the fact that t and $\tau$ are the time components of $\mathbf{x}$ and $\mathbf{y}$, respectively.
In particular, by (22.16), $g(\mathbf{x},\mathbf{y};-\epsilon) = G(\mathbf{y},\mathbf{x};\epsilon)\to\delta(\mathbf{x}-\mathbf{y})$.

3. The function $g(\mathbf{x},\mathbf{y};t-\tau)$ satisfies the same homogeneous BC as
$G(\mathbf{x},\mathbf{y};t-\tau)$. Thus, $g(\mathbf{x}_b,\mathbf{y};t-\tau) = 0$ for $\mathbf{x}_b\in\partial D$.


Substituting all the above in (22.22), taking the limit $\epsilon\to 0$, and switching
$\mathbf{x}$ and $\mathbf{y}$ and t and $\tau$, we obtain

\[
\begin{aligned}
u(\mathbf{x},t) ={}& \int_0^td\tau\int_Dd^my\,G(\mathbf{x},\mathbf{y};t-\tau)f(\mathbf{y},\tau)+\int_Du(\mathbf{y},0)\,G(\mathbf{x},\mathbf{y};t)\,d^my\\
&-\int_0^td\tau\int_{\partial D}u(\mathbf{y}_b,\tau)\frac{\partial G}{\partial n_y}(\mathbf{x},\mathbf{y}_b;t-\tau)\,da,
\end{aligned}
\tag{22.23}
\]

where $\partial/\partial n_y$ in the last integral means normal differentiation with respect
to the second argument of the Green's function.
Equation (22.23) gives the complete solution to the BVP associated with
a parabolic PDE. If $f(\mathbf{y},\tau) = 0$ and u vanishes on the hypersurface $\partial D$,
then Eq. (22.23) gives

\[
u(\mathbf{x},t) = \int_Du(\mathbf{y},0)\,G(\mathbf{x},\mathbf{y};t)\,d^my,
\tag{22.24}
\]

which is the solution to the BVP of Eq. (22.11), which led to the general
Green's function of (22.15). Equation (22.24) lends itself nicely to a physical
interpretation. The RHS can be thought of as an integral operator with
kernel $G(\mathbf{x},\mathbf{y};t)$. This integral operator acts on $u(\mathbf{y},0)$ and gives $u(\mathbf{x},t)$;
that is, given the shape of the solution at t = 0, the integral operator produces
the shape for all subsequent time. That is why $G(\mathbf{x},\mathbf{y};t)$ is called the
evolution operator, or propagator.


22.3 Hyperbolic Equations


The hyperbolic equation we will discuss is the wave equation

\[
L_{\mathbf{x},t}[u] \equiv \left(\frac{\partial^2}{\partial t^2}-\nabla^2\right)u(\mathbf{x},t) = 0,
\tag{22.25}
\]

where we have set the speed of the wave equal to unity.

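We wish to calculate the Green's function for $L_{\mathbf{x},t}$ subject to appropriate
BCs. Let us proceed as we did for the parabolic equation and write

\[
G(\mathbf{x},\mathbf{y};t) = \sum_{n=1}^{\infty}C_n(\mathbf{y};t)\,u_n(\mathbf{x}),
\qquad
C_n(\mathbf{y};t) = \int_DG(\mathbf{x},\mathbf{y};t)\,u_n(\mathbf{x})\,d^mx,
\tag{22.26}
\]

where $u_n(\mathbf{x})$ are orthonormal eigenfunctions of $\nabla^2$ with eigenvalues $-\lambda_n$,
satisfying certain, as yet unspecified, BCs. As usual, we expect G to satisfy

\[
L_{\mathbf{x},t}[G] = \left(\frac{\partial^2}{\partial t^2}-\nabla^2\right)G(\mathbf{x},\mathbf{y};t-\tau) = \delta(\mathbf{x}-\mathbf{y})\,\delta(t-\tau).
\tag{22.27}
\]

Substituting (22.26) in (22.27) with $\tau = 0$ and using $\nabla^2u_n = -\lambda_nu_n$ gives

\[
\sum_{n=1}^{\infty}\left[\frac{\partial^2}{\partial t^2}C_n(\mathbf{y};t)+\lambda_nC_n(\mathbf{y};t)\right]u_n(\mathbf{x}) = \sum_{n=1}^{\infty}\left[u_n(\mathbf{y})\,\delta(t)\right]u_n(\mathbf{x}),
\]

where we used $\delta(\mathbf{x}-\mathbf{y}) = \sum_{n=1}^{\infty}u_n(\mathbf{x})\,u_n(\mathbf{y})$ on the RHS. The orthonormality
of $u_n$ now gives $\ddot{C}_n(\mathbf{y};t)+\lambda_nC_n(\mathbf{y};t) = u_n(\mathbf{y})\,\delta(t)$. It follows that
$C_n(\mathbf{y};t)$ is separable. In fact,

\[
C_n(\mathbf{y};t) = u_n(\mathbf{y})\,T_n(t) \quad\text{where}\quad \left(\frac{d^2}{dt^2}+\lambda_n\right)T_n(t) = \delta(t).
\]

This equation describes a one-dimensional Green's function and can be
solved using the methods of Chap. 20. Assuming that $T_n(t) = 0$ for $t\le 0$,
we obtain $T_n(t) = (\sin\omega_nt/\omega_n)\,\theta(t)$, where $\omega_n^2 = \lambda_n$. Substituting all the
above results in (22.26), we obtain

\[
G(\mathbf{x},\mathbf{y};t) = \sum_{n=1}^{\infty}u_n(\mathbf{x})\,u_n(\mathbf{y})\,\frac{\sin\omega_nt}{\omega_n}\,\theta(t),
\]

or, more generally,

\[
G(\mathbf{x},\mathbf{y};t-\tau) = \sum_{n=1}^{\infty}u_n(\mathbf{x})\,u_n(\mathbf{y})\,\frac{\sin\omega_n(t-\tau)}{\omega_n}\,\theta(t-\tau).
\tag{22.28}
\]

We note that

\[
G(\mathbf{x},\mathbf{y};0^+) = 0 \quad\text{and}\quad \frac{\partial G}{\partial t}(\mathbf{x},\mathbf{y};t)\bigg|_{t\to0^+} = \delta(\mathbf{x}-\mathbf{y}),
\tag{22.29}
\]

as can easily be verified.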
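The two defining properties of the time factor $T_n(t)=(\sin\omega_nt/\omega_n)\,\theta(t)$ can be checked by finite differences: it satisfies the homogeneous oscillator equation for $t>0$, and its slope jumps by one unit at $t=0$, which is what the delta function on the RHS demands. The sketch below is illustrative only; the function names, step sizes, and the sample value of $\omega$ are assumptions.

```python
import math

def T(t, omega):
    """Retarded solution of T'' + omega^2 T = delta(t): sin(omega t)/omega for t > 0."""
    return math.sin(omega * t) / omega if t > 0.0 else 0.0

def ode_residual(t, omega, h=1e-4):
    """Central-difference estimate of T'' + omega^2 T at a time t > h."""
    second = (T(t + h, omega) - 2.0 * T(t, omega) + T(t - h, omega)) / h**2
    return second + omega**2 * T(t, omega)

def slope_jump(omega, h=1e-6):
    """Difference of the one-sided slopes just after and just before t = 0."""
    after = (T(h, omega) - T(0.0, omega)) / h
    before = (T(0.0, omega) - T(-h, omega)) / h
    return after - before

print(ode_residual(0.8, 3.0))  # should be close to zero
print(slope_jump(3.0))         # should be close to one
```

The unit jump in slope is the one-dimensional counterpart of the second property in (22.29).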
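With the Green's function for the operator $L_{\mathbf{x},t}$ of Eq. (22.25) at our
disposal, we can attack the BVP given by

\[
\begin{aligned}
&\left(\frac{\partial^2}{\partial t^2}-\nabla^2\right)u(\mathbf{x},t) = f(\mathbf{x},t) \quad\text{for } \mathbf{x}\in D,\\
&u(\mathbf{x}_b,t) = h(\mathbf{x}_b,t), \quad u(\mathbf{x},0) = \phi(\mathbf{x}) \quad\text{for } \mathbf{x}_b\in\partial D,\ \mathbf{x}\in D,\\
&\frac{\partial u}{\partial t}(\mathbf{x},t)\bigg|_{t=0} = \psi(\mathbf{x}) \quad\text{for } \mathbf{x}\in D.
\end{aligned}
\tag{22.30}
\]

As in the case of the parabolic equation, we first derive an appropriate
expression of Green's identity. This can be done by noting that

\[
uL_{\mathbf{x},t}[v]-vL_{\mathbf{x},t}[u] = \frac{\partial}{\partial t}\left(u\frac{\partial v}{\partial t}-v\frac{\partial u}{\partial t}\right)-\nabla\cdot(u\nabla v-v\nabla u).
\]

Thus, $L_{\mathbf{x},t}$ is formally self-adjoint. Furthermore, we can identify

\[
Q^0 = u\frac{\partial v}{\partial t}-v\frac{\partial u}{\partial t} \quad\text{and}\quad \mathbf{Q} = u\nabla v-v\nabla u.
\]

Following the procedure used for the parabolic case step by step, we can
easily derive a Green's identity and show that

\[
\begin{aligned}
u(\mathbf{x},t) ={}& \int_0^td\tau\int_Dd^my\,G(\mathbf{x},\mathbf{y};t-\tau)f(\mathbf{y},\tau)\\
&+\int_D\left[\psi(\mathbf{y})\,G(\mathbf{x},\mathbf{y};t)-\phi(\mathbf{y})\frac{\partial G}{\partial t}(\mathbf{x},\mathbf{y};t)\right]d^my\\
&-\int_0^td\tau\int_{\partial D}h(\mathbf{y}_b,\tau)\frac{\partial G}{\partial n_y}(\mathbf{x},\mathbf{y}_b;t-\tau)\,da.
\end{aligned}
\tag{22.31}
\]

The details are left as Problem 22.11.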
For the homogeneous PDE with the homogeneous BC $h = 0 = \psi$, we get

\[
u(\mathbf{x},t) = -\int_D\phi(\mathbf{y})\frac{\partial G}{\partial t}(\mathbf{x},\mathbf{y};t)\,d^my.
\]

Note the difference between this equation and Eq. (22.24). Here the propagator
is the time derivative of the Green's function. There is another difference
between hyperbolic and parabolic equations. When the solution to
a parabolic equation vanishes on the boundary and is initially zero, and the
PDE is homogeneous [$f(\mathbf{x},t) = 0$], the solution must be zero. This is clear
from Eq. (22.23). On the other hand, Eq. (22.31) indicates that under the
same circumstance, there may be a nonzero solution for a hyperbolic equation
if $\psi$ is nonzero. In such a case we obtain

\[
u(\mathbf{x},t) = \int_D\psi(\mathbf{y})\,G(\mathbf{x},\mathbf{y};t)\,d^my.
\]


This difference in the two types of equations is due to the fact that hyperbolic 
equations have second-order time derivatives. Thus, the initial shape of a 
solution is not enough to uniquely specify it. The initial velocity profile is 
also essential. We saw examples of this in Chap. 19. 

The discussion of Green’s functions has so far been formal. The main 
purpose of the remaining sections is to bridge the gap between formalism 
and concrete applications. Several powerful techniques are used in obtaining 
Green’s functions, but we will focus only on two: the Fourier transform 
technique, and the eigenfunction expansion technique. 


22.4 The Fourier Transform Technique 


Recall that any Green's function can be written as a sum of a singular part
and a regular part: $G = G_s + H$. Since we have already discussed homogeneous
equations in detail in Chap. 19, we will not evaluate H in this section
but will concentrate on the singular parts of various Green's functions.

The BCs play no role in evaluating $G_s$. Therefore, the Fourier transform
technique (FTT), which involves integration over all space, can be utilized. 
The FTT has a drawback—it does not work if the coefficient functions are 
not constants. For most physical applications treated in this book, however, 
this will not be a shortcoming. 

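Let us consider the most general SOLPDO with constant coefficients,

\[
L_{\mathbf{x}} = a_0+\sum_{j=1}^{m}a_j\frac{\partial}{\partial x_j}+\sum_{j,k=1}^{m}b_{jk}\frac{\partial^2}{\partial x_j\,\partial x_k},
\tag{22.32}
\]

where $a_0$, $a_j$, and $b_{jk}$ are constants. The corresponding Green's function
has a singular part that satisfies the usual PDE with the delta function on the
RHS. The FTT starts with assuming a Fourier integral representation in the
variable $\mathbf{x}$ for the singular part and for the delta function:

\[
G_s(\mathbf{x},\mathbf{y}) = \frac{1}{(2\pi)^{m/2}}\int d^mk\,\tilde{G}_s(\mathbf{k},\mathbf{y})\,e^{i\mathbf{k}\cdot\mathbf{x}},
\qquad
\delta(\mathbf{x}-\mathbf{y}) = \frac{1}{(2\pi)^m}\int d^mk\,e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}.
\]

Substituting these equations in the PDE for the GF, we get

\[
\tilde{G}_s(\mathbf{k},\mathbf{y}) = \frac{1}{(2\pi)^{m/2}}\left(\frac{e^{-i\mathbf{k}\cdot\mathbf{y}}}{a_0+i\sum_{j=1}^{m}a_jk_j-\sum_{j,l=1}^{m}b_{jl}k_jk_l}\right)
\]

and

\[
G_s(\mathbf{x},\mathbf{y}) = \frac{1}{(2\pi)^m}\int d^mk\,\frac{e^{i\mathbf{k}\cdot(\mathbf{x}-\mathbf{y})}}{a_0+i\sum_{j=1}^{m}a_jk_j-\sum_{j,l=1}^{m}b_{jl}k_jk_l}.
\tag{22.33}
\]

If we can evaluate the integral in (22.33), we can find $G_s$.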

The following examples apply Eq. (22.33) to specific problems. Note that 
(22.33) indicates that $G_s$ depends only on $\mathbf{x}-\mathbf{y}$. This point was mentioned
in Chap. 20, where it was noted that such dependence occurs when the BCs 
play no part in an evaluation of the singular part of the Green’s function of 
a DE with constant coefficients; and this is exactly the situation here. 


22.4.1 GF for the m-Dimensional Laplacian 


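We calculated the GF for the m-dimensional Laplacian in Sect. 21.2.2 using
a different method. With $a_0 = 0 = a_j$, $b_{jl} = \delta_{jl}$, and $\mathbf{r} = \mathbf{x}-\mathbf{y}$, Eq. (22.33)
reduces to

\[
G_s(\mathbf{r}) = -\frac{1}{(2\pi)^m}\int d^mk\,\frac{e^{i\mathbf{k}\cdot\mathbf{r}}}{k^2},
\tag{22.34}
\]

where $k^2 = k_1^2+\cdots+k_m^2 = \mathbf{k}\cdot\mathbf{k}$. To integrate (22.34), we choose spherical
coordinates in the m-dimensional k-space. Furthermore, to simplify
calculations we let the $k_m$-axis lie along $\mathbf{r}$ so that $\mathbf{r} = (0,0,\ldots,|\mathbf{r}|)$ and
$\mathbf{k}\cdot\mathbf{r} = k|\mathbf{r}|\cos\theta_1$ [see Eq. (21.12)]. Substituting this in (22.34) and writing
$d^mk$ in spherical coordinates yields

\[
G_s(\mathbf{r}) = -\frac{1}{(2\pi)^m}\int\frac{e^{ik|\mathbf{r}|\cos\theta_1}}{k^2}\,k^{m-1}(\sin\theta_1)^{m-2}\cdots\sin\theta_{m-2}\,dk\,d\theta_1\cdots d\theta_{m-1}.
\tag{22.35}
\]

From Eq. (21.15) we note that $d\Omega_m = (\sin\theta_1)^{m-2}\,d\theta_1\,d\Omega_{m-1}$. Thus, after
integrating over the angles $\theta_2,\ldots,\theta_{m-1}$, Eq. (22.35) becomes

\[
G_s(\mathbf{r}) = -\frac{\Omega_{m-1}}{(2\pi)^m}\int_0^{\infty}k^{m-3}\,dk\int_0^{\pi}(\sin\theta_1)^{m-2}e^{ik|\mathbf{r}|\cos\theta_1}\,d\theta_1.
\]

The inner integral can be looked up in an integral table (see [Grad 65,
p. 482]):

\[
\int_0^{\pi}(\sin\theta_1)^{m-2}e^{ikr\cos\theta_1}\,d\theta_1 = \sqrt{\pi}\left(\frac{2}{kr}\right)^{m/2-1}\Gamma\!\left(\frac{m-1}{2}\right)J_{m/2-1}(kr).
\]

Substituting this and (21.16) in the preceding equation and using the result
(see [Grad 65, p. 684])

\[
\int_0^{\infty}x^{\mu}J_{\nu}(ax)\,dx = 2^{\mu}a^{-\mu-1}\,\frac{\Gamma\!\left(\frac{1+\mu+\nu}{2}\right)}{\Gamma\!\left(\frac{1+\nu-\mu}{2}\right)},
\]

we obtain

\[
G_s(\mathbf{r}) = -\frac{\Gamma(m/2-1)}{4\pi^{m/2}}\left(\frac{1}{r^{m-2}}\right) \quad\text{for } m>2,
\]

which agrees with (21.20) since $\Gamma(m/2) = (m/2-1)\,\Gamma(m/2-1)$.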
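As a small sanity check, the closed form $-\Gamma(m/2-1)/(4\pi^{m/2}r^{m-2})$ can be evaluated directly. The sketch below uses only the standard library; the function name is an assumption made for illustration. For m = 3 it should reproduce the familiar Coulomb form $-1/(4\pi r)$.

```python
import math

def laplacian_gf(r, m):
    """Singular part -Gamma(m/2 - 1) / (4 pi^(m/2) r^(m-2)), valid only for m > 2."""
    if m <= 2:
        raise ValueError("the closed form holds only for m > 2")
    return -math.gamma(m / 2.0 - 1.0) / (4.0 * math.pi ** (m / 2.0) * r ** (m - 2))

r = 2.0
print(laplacian_gf(r, 3), -1.0 / (4.0 * math.pi * r))            # three dimensions
print(laplacian_gf(r, 4), -1.0 / (4.0 * math.pi**2 * r**2))      # four dimensions
```

The case m = 2 is excluded because the Gamma function has a pole there, consistent with the logarithmic GF in two dimensions.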


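22.4.2 GF for the m-Dimensional Helmholtz Operator


For the Helmholtz operator $\nabla^2-\mu^2$, Eq. (22.33) reduces to

\[
G_s(\mathbf{r}) = -\frac{1}{(2\pi)^m}\int d^mk\,\frac{e^{i\mathbf{k}\cdot\mathbf{r}}}{\mu^2+k^2}.
\]

Following the same procedure as in the previous subsection, we find

\[
\begin{aligned}
G_s(\mathbf{r}) &= -\frac{\Omega_{m-1}}{(2\pi)^m}\int_0^{\infty}\frac{k^{m-1}\,dk}{\mu^2+k^2}\int_0^{\pi}(\sin\theta_1)^{m-2}e^{ikr\cos\theta_1}\,d\theta_1\\
&= -\frac{\Omega_{m-1}}{(2\pi)^m}\sqrt{\pi}\left(\frac{2}{r}\right)^{m/2-1}\Gamma\!\left(\frac{m-1}{2}\right)\int_0^{\infty}\frac{k^{m/2}}{\mu^2+k^2}\,J_{m/2-1}(kr)\,dk.
\end{aligned}
\]

Here we can use the integral formula (see [Grad 65, pp. 686 and 952])

\[
\int_0^{\infty}\frac{J_{\nu}(bx)\,x^{\nu+1}}{(x^2+a^2)^{\mu+1}}\,dx = \frac{a^{\nu-\mu}\,b^{\mu}}{2^{\mu}\,\Gamma(\mu+1)}\,K_{\nu-\mu}(ab),
\]

where

\[
K_{\nu}(z) = \frac{i\pi}{2}\,e^{i\nu\pi/2}\,H^{(1)}_{\nu}(iz),
\]

(applied with $\nu = m/2-1$, $b = r$, a equal to the Helmholtz parameter $\mu$, and the index $\mu$ of the formula set to zero) to obtain

\[
G_s(\mathbf{r}) = -\frac{\Omega_{m-1}}{(2\pi)^m}\sqrt{\pi}\left(\frac{2}{r}\right)^{m/2-1}\Gamma\!\left(\frac{m-1}{2}\right)\mu^{m/2-1}K_{m/2-1}(\mu r),
\]

which simplifies to

\[
G_s(\mathbf{r}) = -\frac{\pi/2}{(2\pi)^{m/2}}\left(\frac{\mu}{r}\right)^{m/2-1}i^{m/2}\,H^{(1)}_{m/2-1}(i\mu r).
\tag{22.36}
\]

It can be shown (see Problem 22.8) that for m = 3 this reduces to
$G_s(\mathbf{r}) = -\dfrac{e^{-\mu r}}{4\pi r}$, which is the Yukawa potential due to a unit charge.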
We can easily obtain the GF for $\nabla^2+\mu^2$ by substituting $\pm i\mu$ for $\mu$ in
Eq. (22.36). The result is

\[
G_s(\mathbf{r}) = i^{m+1}\,\frac{\pi/2}{(2\pi)^{m/2}}\left(\frac{\mu}{r}\right)^{m/2-1}H^{(1)}_{m/2-1}(\pm\mu r).
\tag{22.37}
\]

For m = 3 this yields $G_s(\mathbf{r}) = -e^{\pm i\mu r}/(4\pi r)$. The two signs in the exponent
correspond to the so-called incoming and outgoing "waves".


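Example 22.4.1 For a non-local potential, the time-independent Schrödinger
equation is

\[
-\frac{\hbar^2}{2\mu}\nabla^2\Psi(\mathbf{r})+\int_{\mathbb{R}^3}V(\mathbf{r},\mathbf{r}')\,\Psi(\mathbf{r}')\,d^3r' = E\,\Psi(\mathbf{r}).
\]

(Non-local potentials depend not only on the observation point, but also on
some other "non-local" variables.) Then, the integral equation associated with
this differential equation is (see Sect. 21.4)

\[
\Psi(\mathbf{r}) = Ae^{i\mathbf{k}\cdot\mathbf{r}}-\frac{\mu}{2\pi\hbar^2}\int_{\mathbb{R}^3}d^3r'\,\frac{e^{ik|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\int_{\mathbb{R}^3}d^3r''\,V(\mathbf{r}',\mathbf{r}'')\,\Psi(\mathbf{r}'').
\tag{22.38}
\]

For a separable potential, for which $V(\mathbf{r}',\mathbf{r}'') = -g^2U(\mathbf{r}')U(\mathbf{r}'')$, we can
solve (22.38) exactly. We substitute for $V(\mathbf{r}',\mathbf{r}'')$ in (22.38) to obtain

\[
\Psi(\mathbf{r}) = Ae^{i\mathbf{k}\cdot\mathbf{r}}+\frac{\mu g^2}{2\pi\hbar^2}\int_{\mathbb{R}^3}d^3r'\,\frac{e^{ik|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\,U(\mathbf{r}')\int_{\mathbb{R}^3}d^3r''\,U(\mathbf{r}'')\,\Psi(\mathbf{r}'').
\tag{22.39}
\]

Defining the quantities

\[
Q(\mathbf{r}) \equiv \frac{\mu g^2}{2\pi\hbar^2}\int_{\mathbb{R}^3}d^3r'\,\frac{e^{ik|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\,U(\mathbf{r}'),
\qquad
C \equiv \int_{\mathbb{R}^3}d^3r''\,U(\mathbf{r}'')\,\Psi(\mathbf{r}'')
\tag{22.40}
\]

and substituting them in (22.39) yields $\Psi(\mathbf{r}) = Ae^{i\mathbf{k}\cdot\mathbf{r}}+C\,Q(\mathbf{r})$. Multiplying
both sides of this equation by $U(\mathbf{r})$ and integrating over $\mathbb{R}^3$, we get

\[
C = A\int_{\mathbb{R}^3}e^{i\mathbf{k}\cdot\mathbf{r}}\,U(\mathbf{r})\,d^3r+C\int_{\mathbb{R}^3}U(\mathbf{r})\,Q(\mathbf{r})\,d^3r
= (2\pi)^{3/2}A\,\tilde{U}(-\mathbf{k})+C\int_{\mathbb{R}^3}U(\mathbf{r})\,Q(\mathbf{r})\,d^3r,
\]

from which we obtain

\[
C = \frac{(2\pi)^{3/2}A\,\tilde{U}(-\mathbf{k})}{1-\int_{\mathbb{R}^3}U(\mathbf{r}')\,Q(\mathbf{r}')\,d^3r'},
\]

leading to the solution

\[
\Psi(\mathbf{r}) = Ae^{i\mathbf{k}\cdot\mathbf{r}}+\frac{(2\pi)^{3/2}A\,\tilde{U}(-\mathbf{k})}{1-\int_{\mathbb{R}^3}U(\mathbf{r}')\,Q(\mathbf{r}')\,d^3r'}\,Q(\mathbf{r}).
\tag{22.41}
\]

In principle, $\tilde{U}(-\mathbf{k})$ [the Fourier transform of $U(\mathbf{r})$] and $Q(\mathbf{r})$ can be
calculated once the functional form of $U(\mathbf{r})$ is known. Equations (22.40) and
(22.41) give the solution to the Schrödinger equation in closed form.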
22.4.3 GF for the m-Dimensional Diffusion Operator


When dealing with parabolic and hyperbolic equations, we will find it
convenient to consider the "different" variable (usually t) as the zeroth
coordinate. In the Fourier transform we then use $\omega = -k_0$ and write

\[
\begin{aligned}
G_s(\mathbf{r},t) &= \frac{1}{(2\pi)^{(m+1)/2}}\int_{-\infty}^{\infty}d\omega\int d^mk\,\tilde{G}_s(\mathbf{k},\omega)\,e^{i(\mathbf{k}\cdot\mathbf{r}-\omega t)},\\
\delta(\mathbf{r})\,\delta(t) &= \frac{1}{(2\pi)^{m+1}}\int_{-\infty}^{\infty}d\omega\int d^mk\,e^{i(\mathbf{k}\cdot\mathbf{r}-\omega t)},
\end{aligned}
\tag{22.42}
\]

where $\mathbf{r}$ is the m-dimensional position vector.
We substitute (22.42) in $(\partial/\partial t-\nabla^2)G_s(\mathbf{r},t) = \delta(\mathbf{r})\,\delta(t)$ to obtain

\[
G_s(\mathbf{r},t) = \frac{i}{(2\pi)^{m+1}}\int d^mk\,e^{i\mathbf{k}\cdot\mathbf{r}}\int_{-\infty}^{\infty}d\omega\,\frac{e^{-i\omega t}}{\omega+ik^2},
\tag{22.43}
\]

where as usual, $k^2 = k_1^2+\cdots+k_m^2$. The $\omega$ integration can be done using the
calculus of residues. The integrand has a simple pole at $\omega = -ik^2$, that is, in
the lower half of the complex $\omega$-plane (LHP). To integrate, we must know
the sign of t. If t > 0, the exponential factor dictates that the contour be
closed in the LHP, where there is a pole and, therefore, a contribution to
the residues. On the other hand, if t < 0, the contour must be closed in
the UHP. The integral is then zero because there are no poles in the UHP.
We must therefore introduce a step function $\theta(t)$ in the Green's function.
Evaluating the residue, the $\omega$ integration yields $-2\pi i\,e^{-k^2t}$. (The minus sign
arises because of clockwise contour integration in the LHP.) Substituting
this in Eq. (22.43), using spherical coordinates in which the last k-axis is
along $\mathbf{r}$, and integrating over all angles except $\theta_1$, we obtain

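\[
\begin{aligned}
G_s(\mathbf{r},t) &= \theta(t)\,\frac{\Omega_{m-1}}{(2\pi)^m}\int_0^{\infty}k^{m-1}\,dk\,e^{-k^2t}\int_0^{\pi}(\sin\theta_1)^{m-2}e^{ikr\cos\theta_1}\,d\theta_1\\
&= \theta(t)\,\frac{\Omega_{m-1}}{(2\pi)^m}\sqrt{\pi}\left(\frac{2}{r}\right)^{m/2-1}\Gamma\!\left(\frac{m-1}{2}\right)\int_0^{\infty}k^{m/2}\,e^{-k^2t}\,J_{m/2-1}(kr)\,dk.
\end{aligned}
\]

For the $\theta_1$ integration, we used the result quoted in Sect. 22.4.1.
Using the integral formula (see [Grad 65, pp. 716 and 1058])

\[
\int_0^{\infty}x^{\mu}e^{-\alpha x^2}J_{\nu}(\beta x)\,dx
= \frac{\beta^{\nu}\,\Gamma\!\left(\frac{\mu+\nu+1}{2}\right)}{2^{\nu+1}\,\alpha^{(\mu+\nu+1)/2}\,\Gamma(\nu+1)}\;\Phi\!\left(\frac{\mu+\nu+1}{2},\,\nu+1;\,-\frac{\beta^2}{4\alpha}\right),
\]

where $\Phi$ is the confluent hypergeometric function, we obtain

\[
G_s(\mathbf{r},t) = \theta(t)\,\frac{2\pi^{(m-1)/2}}{(2\pi)^m}\sqrt{\pi}\left(\frac{2}{r}\right)^{m/2-1}\frac{r^{m/2-1}}{2^{m/2}\,t^{m/2}}\;\Phi\!\left(\frac{m}{2},\frac{m}{2};-\frac{r^2}{4t}\right).
\tag{22.44}
\]

The power-series expansion for the confluent hypergeometric function $\Phi$
shows that $\Phi(\alpha,\alpha;z) = e^z$. Substituting this result in (22.44) and simplifying,
we finally obtain

\[
G_s(\mathbf{r},t) = \frac{e^{-r^2/4t}}{(4\pi t)^{m/2}}\,\theta(t).
\tag{22.45}
\]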
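Two properties of the diffusion kernel are easy to confirm numerically in one dimension: it carries unit total weight at every positive time, and for t > 0 it satisfies the homogeneous heat equation. The sketch below is a rough illustration; the function names, the quadrature window, and the finite-difference step sizes are assumptions chosen for this example.

```python
import math

def heat_kernel(r, t, m):
    """exp(-r^2 / 4t) / (4 pi t)^(m/2) for t > 0, and zero otherwise."""
    if t <= 0.0:
        return 0.0
    return math.exp(-r * r / (4.0 * t)) / (4.0 * math.pi * t) ** (m / 2.0)

def total_weight_1d(t, half_width=20.0, points=4000):
    """Midpoint-rule integral of the m = 1 kernel over [-half_width, half_width]."""
    dx = 2.0 * half_width / points
    return sum(heat_kernel(-half_width + (j + 0.5) * dx, t, 1)
               for j in range(points)) * dx

def heat_residual_1d(x, t, h=1e-3):
    """Finite-difference estimate of (d/dt - d^2/dx^2) applied to the m = 1 kernel."""
    dG_dt = (heat_kernel(x, t + h, 1) - heat_kernel(x, t - h, 1)) / (2.0 * h)
    d2G_dx2 = (heat_kernel(x + h, t, 1) - 2.0 * heat_kernel(x, t, 1)
               + heat_kernel(x - h, t, 1)) / h**2
    return dG_dt - d2G_dx2

print(total_weight_1d(0.5))        # should be close to one
print(heat_residual_1d(0.7, 0.5))  # should be close to zero
```

As t decreases toward zero the kernel narrows while keeping unit weight, which is the numerical counterpart of the delta-function initial condition.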
22.4.4 GF for the m-Dimensional Wave Equation


The difference between this example and the preceding one is that here the
time derivative is of second order. Thus, instead of Eq. (22.43), we start with

\[
G_s(\mathbf{r},t) = -\frac{1}{(2\pi)^{m+1}}\int d^mk\,e^{i\mathbf{k}\cdot\mathbf{r}}\int_{-\infty}^{\infty}d\omega\,\frac{e^{-i\omega t}}{\omega^2-k^2}.
\tag{22.46}
\]

The $\omega$ integration can be done using the method of residues. Since the
singularities of the integrand, $\omega = \pm k$, are on the real axis, it seems reasonable
to use the principal value as the value of the integral. This, in turn, depends
on the sign of t. If t > 0 (t < 0), we have to close the contour in the LHP
(UHP) to avoid the explosion of the exponential. If one also insists on not
including the poles inside the contour,³ then one can show that

\[
P\int_{-\infty}^{\infty}d\omega\,\frac{e^{-i\omega t}}{\omega^2-k^2} = -\pi\,\frac{\sin kt}{k}\,\epsilon(t),
\]

where

\[
\epsilon(t) \equiv
\begin{cases}
1 & \text{if } t>0,\\
-1 & \text{if } t<0.
\end{cases}
\]

Substituting this in (22.46) and integrating over all angles as done in the
previous examples yields

\[
G_s(\mathbf{r},t) = \frac{\epsilon(t)}{2(2\pi)^{m/2}\,r^{m/2-1}}\int_0^{\infty}k^{m/2-1}J_{m/2-1}(kr)\,\sin kt\,dk.
\tag{22.47}
\]

As Problem 22.25 shows, the Green's function given by Eq. (22.47) satisfies
only the homogeneous wave equation with no delta function on the
RHS. The reason for this is that the principal value of an integral chooses
a specific contour that may not reflect the physical situation. In fact, the
Green's function in (22.47) contains two pieces corresponding to the two
different contours of integration, and it turns out that the physically interesting
Green's functions are obtained, not from the principal value, but from
giving small imaginary parts to the poles. Thus, replacing the $\omega$ integral with
a contour integral for which the two poles are pushed in the LHP and using
the method of residues, we obtain

\[
I_{\text{up}} \equiv \int_{-\infty}^{\infty}\frac{e^{-i\omega t}}{\omega^2-k^2}\,d\omega = \oint_{C_-}\frac{e^{-izt}}{z^2-k^2}\,dz = -\frac{2\pi}{k}\,\theta(t)\,\sin kt.
\]

The integral is zero for t < 0 because for negative values of t, the contour
must be closed in the UHP, where there are no poles inside $C_+$. Substituting
this in (22.46) and working through as before, we obtain what is called the
retarded Green's function:

\[
G_s^{(\text{ret})}(\mathbf{r},t) = \frac{\theta(t)}{(2\pi)^{m/2}\,r^{m/2-1}}\int_0^{\infty}k^{m/2-1}J_{m/2-1}(kr)\,\sin kt\,dk.
\tag{22.48}
\]

If the poles are pushed in the UHP we obtain the advanced Green's
function:

\[
G_s^{(\text{adv})}(\mathbf{r},t) = -\frac{\theta(-t)}{(2\pi)^{m/2}\,r^{m/2-1}}\int_0^{\infty}k^{m/2-1}J_{m/2-1}(kr)\,\sin kt\,dk.
\tag{22.49}
\]

³ This will determine how to (semi)circle around the poles.

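Unlike the elliptic and parabolic equations discussed earlier, the integral
over k is not a function but a distribution, as will become clear below. To
find the retarded and advanced Green's functions, we write the sine term in
the integral in terms of exponentials and use the following (see [Grad 65,
p. 712]):

\[
\int_0^{\infty}x^{\nu}e^{-\alpha x}J_{\nu}(\beta x)\,dx=\frac{(2\beta)^{\nu}\,\Gamma(\nu+1/2)}{\sqrt{\pi}\,(\alpha^2+\beta^2)^{\nu+1/2}}\quad\text{for }\operatorname{Re}(\alpha)>|\operatorname{Im}(\beta)|.
\]

To ensure convergence at infinity, we add a small negative number to the
exponential and define the integral

\[
I_{\pm}^{(\nu)}\equiv\int_0^{\infty}k^{\nu}e^{-(\mp it+\epsilon)k}J_{\nu}(kr)\,dk
=\frac{(2r)^{\nu}\,\Gamma(\nu+1/2)}{\sqrt{\pi}}\left[(\mp it+\epsilon)^2+r^2\right]^{-(\nu+1/2)}.
\]

For the GFs, we need to evaluate the (common) integral in (22.48) and
(22.49). With ν = m/2 − 1, we have

\[
I^{(\nu)}\equiv\int_0^{\infty}k^{\nu}J_{\nu}(kr)\sin kt\,dk=\frac{1}{2i}\lim_{\epsilon\to 0}\left(I_{+}^{(\nu)}-I_{-}^{(\nu)}\right)
\]
\[
=\frac{(2r)^{\nu}\,\Gamma(\nu+1/2)}{2i\sqrt{\pi}}
\lim_{\epsilon\to 0}\left\{\frac{1}{[r^2+(-it+\epsilon)^2]^{\nu+1/2}}-\frac{1}{[r^2+(it+\epsilon)^2]^{\nu+1/2}}\right\}.
\]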


At this point, it is convenient to discuss separately the two cases of m odd
and m even. Let us derive the expression for odd m (the even case is left for
Problem 22.26). Define the integer n = (m − 1)/2 = ν + 1/2 and write $I^{(\nu)}$ as

\[
I^{(\nu)}=\frac{(2r)^{n-1/2}\,\Gamma(n)}{2i\sqrt{\pi}}
\lim_{\epsilon\to 0}\left\{\frac{1}{[r^2+(-it+\epsilon)^2]^{n}}-\frac{1}{[r^2+(it+\epsilon)^2]^{n}}\right\}. \tag{22.50}
\]


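Define $u=r^2+(-it+\epsilon)^2$. Then using the identity

\[
\frac{1}{u^{n}}=\frac{(-1)^{n-1}}{(n-1)!}\,\frac{d^{\,n-1}}{du^{\,n-1}}\left(\frac{1}{u}\right)
\]

and the chain rule, $\partial f/\partial u=(1/2r)\,\partial f/\partial r$, we obtain $\partial/\partial u=(1/2r)\,\partial/\partial r$
and

\[
\frac{1}{[r^2+(\pm it+\epsilon)^2]^{n}}=\frac{1}{(n-1)!}\left(-\frac{1}{2r}\frac{\partial}{\partial r}\right)^{n-1}\left[\frac{1}{r^2+(\pm it+\epsilon)^2}\right].
\]

Therefore, Eq. (22.50) can be written as

\[
\int_0^{\infty}k^{n-1/2}J_{n-1/2}(kr)\sin kt\,dk
=\frac{(2r)^{n-1/2}\,\Gamma(n)}{2i\sqrt{\pi}}\,\frac{1}{(n-1)!}
\left(-\frac{1}{2r}\frac{\partial}{\partial r}\right)^{n-1}
\left\{\lim_{\epsilon\to 0}\left[\frac{1}{r^2+(-it+\epsilon)^2}-\frac{1}{r^2+(it+\epsilon)^2}\right]\right\}. \tag{22.51}
\]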


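The limit in (22.51) is found in Problem 22.27. Using the result of that
problem and Γ(n) = (n − 1)!, we get

\[
\int_0^{\infty}k^{n-1/2}J_{n-1/2}(kr)\sin kt\,dk
=\sqrt{\frac{\pi}{2}}\;r^{n-1/2}\left(-\frac{1}{r}\frac{\partial}{\partial r}\right)^{n-1}
\left\{\frac{1}{r}\bigl[\delta(t-r)-\delta(t+r)\bigr]\right\}. \tag{22.52}
\]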


Employing this result in (22.48) and (22.49) yields 


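\[
\begin{aligned}
G_s^{(\mathrm{ret})}(\mathbf{r},t)&=\frac{1}{4\pi}\left(-\frac{1}{2\pi r}\frac{\partial}{\partial r}\right)^{n-1}\left[\frac{\delta(t-r)}{r}\right],\\
G_s^{(\mathrm{adv})}(\mathbf{r},t)&=\frac{1}{4\pi}\left(-\frac{1}{2\pi r}\frac{\partial}{\partial r}\right)^{n-1}\left[\frac{\delta(t+r)}{r}\right],
\end{aligned}
\qquad\text{for } n=\frac{m-1}{2}. \tag{22.53}
\]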


The theta functions are not needed in (22.53) because the arguments of the 
delta functions already meet the restrictions imposed by the theta functions. 

The two functions in (22.53) have an interesting physical interpretation.
Green's functions are propagators (of signals of some sort), and $G_s^{(\mathrm{ret})}(\mathbf{r},t)$
is capable of propagating signals only for positive times. On the other hand,
$G_s^{(\mathrm{adv})}(\mathbf{r},t)$ can propagate only in the negative time direction. Thus, if
initially (t = 0) a signal is produced (by appropriate BCs), both $G_s^{(\mathrm{ret})}(\mathbf{r},t)$
and $G_s^{(\mathrm{adv})}(\mathbf{r},t)$ work to propagate it in their respective time directions. It
may seem that $G_s^{(\mathrm{adv})}(\mathbf{r},t)$ is useless because every signal propagates
forward in time. This is true, however, only for classical events. In relativistic
quantum field theory antiparticles are interpreted mathematically as moving
in the negative time direction! Thus, we cannot simply ignore $G_s^{(\mathrm{adv})}(\mathbf{r},t)$.
In fact, the correct propagator to choose in this theory is a linear combination
of $G_s^{(\mathrm{adv})}(\mathbf{r},t)$ and $G_s^{(\mathrm{ret})}(\mathbf{r},t)$, called the Feynman propagator (see
[Wein 95, pp. 274-280]).

The preceding example shows a subtle difference between Green’s func- 
tions for second-order differential operators in one dimension and in higher 
dimensions. We saw in Chap. 20 that the former are continuous functions in 
the interval on which they are defined. Here, we see that higher dimensional 
Green’s functions are not only discontinuous, but that they are not even func- 
tions in the ordinary sense; they contain a delta function. Thus, in general, 
Green’s functions in higher dimensions ought to be treated as distributions 
(generalized functions). 


22.5 The Eigenfunction Expansion Technique 


Suppose that the differential operator $\mathbf{L}_{\mathbf{x}}$, defined in a domain D with boundary
∂D, has discrete eigenvalues $\{\lambda_n\}_{n=1}^{\infty}$ with corresponding orthonormal
eigenfunctions $\{u_m(\mathbf{x})\}_{m=1}^{\infty}$. These two sets may not be in one-to-one
correspondence. Assume that the $u_n(\mathbf{x})$'s satisfy the same BCs as the Green's
function to be defined below.

Now consider the operator $\mathbf{L}_{\mathbf{x}}-\lambda\mathbf{1}$, where λ is different from all $\lambda_n$'s.
Then, as in the one-dimensional case, this operator is invertible, and we can
define its Green's function by $(\mathbf{L}_{\mathbf{x}}-\lambda)G_{\lambda}(\mathbf{x},\mathbf{y})=\delta(\mathbf{x}-\mathbf{y})$, where the weight
function is set equal to one. The completeness of $\{u_n(\mathbf{x})\}_{n=1}^{\infty}$ implies that


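\[
\delta(\mathbf{x}-\mathbf{y})=\sum_{n=1}^{\infty}u_n(\mathbf{x})u_n^{*}(\mathbf{y})
\quad\text{and}\quad
G_{\lambda}(\mathbf{x},\mathbf{y})=\sum_{n=1}^{\infty}a_n(\mathbf{y})u_n(\mathbf{x}).
\]

Substituting these two expansions in the differential equation for GF yields

\[
\sum_{n=1}^{\infty}(\lambda_n-\lambda)\,a_n(\mathbf{y})u_n(\mathbf{x})=\sum_{n=1}^{\infty}u_n(\mathbf{x})u_n^{*}(\mathbf{y}).
\]

The orthonormality of the $u_n$'s gives $a_n(\mathbf{y})=u_n^{*}(\mathbf{y})/(\lambda_n-\lambda)$. Therefore,

\[
G_{\lambda}(\mathbf{x},\mathbf{y})=\sum_{n=1}^{\infty}\frac{u_n(\mathbf{x})u_n^{*}(\mathbf{y})}{\lambda_n-\lambda}. \tag{22.54}
\]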
n=1 
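In particular, if zero is not an eigenvalue of $\mathbf{L}_{\mathbf{x}}$, its Green's function can be
written as

\[
G(\mathbf{x},\mathbf{y})=\sum_{n=1}^{\infty}\frac{u_n(\mathbf{x})u_n^{*}(\mathbf{y})}{\lambda_n}. \tag{22.55}
\]

This is an expansion of the Green's function in terms of the eigenfunctions
of $\mathbf{L}_{\mathbf{x}}$.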

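It is instructive to consider a formal interpretation of Eq. (22.55). Recall
that the spectral decomposition theorem permits us to write
$f(\mathbf{A})=\sum_i f(\lambda_i)\mathbf{P}_i$ for an operator $\mathbf{A}$ with (distinct) eigenvalues $\lambda_i$ and projection
operators $\mathbf{P}_i$. Allowing repetition of eigenvalues in the sum, we may
write $f(\mathbf{A})=\sum_n f(\lambda_n)|u_n\rangle\langle u_n|$, where n counts all the eigenfunctions
corresponding to eigenvalues. Now, let $f(\mathbf{A})=\mathbf{A}^{-1}$. Then

\[
\mathbf{G}=\mathbf{A}^{-1}=\sum_n\lambda_n^{-1}|u_n\rangle\langle u_n|=\sum_n\frac{|u_n\rangle\langle u_n|}{\lambda_n},
\]

or, in "matrix element" form,

\[
G(\mathbf{x},\mathbf{y})=\langle\mathbf{x}|\mathbf{G}|\mathbf{y}\rangle=\sum_n\frac{\langle\mathbf{x}|u_n\rangle\langle u_n|\mathbf{y}\rangle}{\lambda_n}=\sum_n\frac{u_n(\mathbf{x})u_n^{*}(\mathbf{y})}{\lambda_n}.
\]

This last expression coincides with the RHS of Eq. (22.55).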
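As a sanity check of the expansion (22.55), one can compare a truncated eigenfunction sum with a Green's function known in closed form. The sketch below uses the simplest case, which is an assumption made here for illustration and not an example from the text: the operator $d^2/dx^2$ on [0, 1] with Dirichlet BCs, whose eigenfunctions are $\sqrt{2}\sin n\pi x$ with eigenvalues $-(n\pi)^2$, and whose GF is $x_<(x_>-1)$. The function names are hypothetical.

```python
import math

def gf_eigen_sum(x, y, n_max=2000):
    """Truncated expansion (22.55) for d^2/dx^2 on [0, 1], Dirichlet BCs."""
    total = 0.0
    for n in range(1, n_max + 1):
        lam = -(n * math.pi) ** 2                      # eigenvalue
        u_x = math.sqrt(2.0) * math.sin(n * math.pi * x)  # u_n(x)
        u_y = math.sqrt(2.0) * math.sin(n * math.pi * y)  # u_n(y), real
        total += u_x * u_y / lam
    return total

def gf_closed_form(x, y):
    """Closed-form GF: G(x, y) = x_< (x_> - 1)."""
    lo, hi = min(x, y), max(x, y)
    return lo * (hi - 1.0)

if __name__ == "__main__":
    for x, y in [(0.3, 0.7), (0.5, 0.5), (0.9, 0.2)]:
        print(x, y, gf_eigen_sum(x, y), gf_closed_form(x, y))
```

The series converges only like 1/n², so a few thousand terms are needed for three or four digits. This slow convergence is typical of eigenfunction expansions near the singular point.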

Equations (22.54) and (22.55) demand that the $u_n(\mathbf{x})$ form a complete
discrete orthonormal set. We encountered many examples of such eigen- 
functions in discussing Sturm-Liouville systems in Chap. 19. All the S- 
L systems there were, of course, one-dimensional. Here we are general- 
izing the S-L system to m dimensions. This is not a limitation, however, 
because—for the PDEs of interest—the separation of variables reduces an 
m-dimensional PDE to m one-dimensional ODEs. If the BCs are appropri- 
ate, the m ODEs will all be S-L systems. A review of Chap. 19 will reveal 
that homogeneous BCs always lead to S-L systems. In fact, Theorem 19.4.1 
guarantees this claim. Since Green’s functions must also satisfy homoge- 
neous BCs, expansions such as those of (22.54) and (22.55) become possi- 
ble. 


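Example 22.5.1 As a concrete example, let us obtain an eigenfunction
expansion of the GF of the two-dimensional Laplacian, $\nabla^2=\partial^2/\partial x^2+\partial^2/\partial y^2$,
inside the rectangular region 0 ≤ x ≤ a, 0 ≤ y ≤ b with Dirichlet
BCs. Since the GF vanishes at the boundary, the eigenvalue problem
becomes $\nabla^2u=\lambda u$ with u(0, y) = u(a, y) = u(x, 0) = u(x, b) = 0. The
method of separation of variables gives the orthonormal eigenfunctions⁴

\[
u_{mn}(x,y)=\frac{2}{\sqrt{ab}}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)\quad\text{for } m,n=1,2,\dots,
\]

whose corresponding eigenvalues are $\lambda_{mn}=-\left[\left(\frac{n\pi}{a}\right)^2+\left(\frac{m\pi}{b}\right)^2\right]$.
Inserting the eigenfunctions and the eigenvalues in Eq. (22.55), we obtain

\[
G(\mathbf{r},\mathbf{r}')=G(x,y;x',y')=\sum_{m,n=1}^{\infty}\frac{u_{mn}(x,y)\,u_{mn}(x',y')}{\lambda_{mn}}
\]
\[
=-\frac{4}{ab}\sum_{m,n=1}^{\infty}\frac{\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{m\pi}{b}y\right)\sin\left(\frac{n\pi}{a}x'\right)\sin\left(\frac{m\pi}{b}y'\right)}{\left(\frac{n\pi}{a}\right)^2+\left(\frac{m\pi}{b}\right)^2},
\]

where we changed x to r and y to r'. Note that the eigenvalues are never
zero; thus, G(r, r') is well-defined.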


⁴The inner product is defined as a double integral over the rectangle.
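To see the expansion of Example 22.5.1 at work, the following sketch applies it to a source whose solution is known. Everything specific here is an assumption chosen for illustration: the unit square, the test solution $u=x(a-x)\,y(b-y)$ with source $f=\nabla^2u$, the number of modes, and the midpoint quadrature. Because the double integral of G against f reduces, by orthonormality, to a sum of projections divided by eigenvalues, the code never needs to tabulate G itself.

```python
import math

def sine_mode(n, x, length):
    """Normalized 1D Dirichlet eigenfunction sqrt(2/L) sin(n pi x / L)."""
    return math.sqrt(2.0 / length) * math.sin(n * math.pi * x / length)

def project(func, n, length, pts=400):
    """Midpoint-rule inner product of sine_mode(n, .) with func on [0, length]."""
    h = length / pts
    return h * sum(sine_mode(n, (j + 0.5) * h, length) * func((j + 0.5) * h)
                   for j in range(pts))

def solve_poisson(x, y, a=1.0, b=1.0, modes=40):
    """u(x, y) = integral of G f, with f the Laplacian of x(a-x) y(b-y)."""
    # f(x, y) = -2 [ y(b - y) + x(a - x) ] is a sum of two separable pieces,
    # so each 2D projection factorizes into products of 1D projections.
    one_x = [project(lambda s: 1.0, n, a) for n in range(1, modes + 1)]
    par_x = [project(lambda s: s * (a - s), n, a) for n in range(1, modes + 1)]
    one_y = [project(lambda s: 1.0, m, b) for m in range(1, modes + 1)]
    par_y = [project(lambda s: s * (b - s), m, b) for m in range(1, modes + 1)]
    u = 0.0
    for n in range(1, modes + 1):
        for m in range(1, modes + 1):
            lam = -((n * math.pi / a) ** 2 + (m * math.pi / b) ** 2)
            f_nm = -2.0 * (one_x[n - 1] * par_y[m - 1]
                           + par_x[n - 1] * one_y[m - 1])
            u += sine_mode(n, x, a) * sine_mode(m, y, b) * f_nm / lam
    return u

if __name__ == "__main__":
    print(solve_poisson(0.5, 0.5), 0.5 * 0.5 * 0.5 * 0.5)
```

The truncated sum should reproduce $x(a-x)\,y(b-y)$ at interior points to several digits, which checks both the normalization $2/\sqrt{ab}$ and the sign of the eigenvalues.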




In the preceding example, zero was not an eigenvalue of $\mathbf{L}_{\mathbf{x}}$. This condition
must hold when a Green's function is expanded in terms of eigenfunctions.
In physical applications, certain conditions (which have nothing to do
with the BCs) exclude the zero eigenvalue automatically when they are applied
to the Green's function. For instance, the condition that the Green's
function remain finite at the origin is severe enough to exclude the zero
eigenvalue.


Example 22.5.2 Let us consider the two-dimensional Dirichlet BVP $\nabla^2u=f$,
with u = 0 on a circle of radius a. If we consider only the BCs and ask
whether zero is an eigenvalue of $\nabla^2$, the answer will be yes, as the following
argument shows.

The most general solution to the zero-eigenvalue equation, $\nabla^2u=0$, in
polar coordinates can be obtained by the method of separation of variables:

\[
u(\rho,\varphi)=A+B\ln\rho+\sum_{n=1}^{\infty}\left(b_n\rho^{n}+b_n'\rho^{-n}\right)\cos n\varphi
+\sum_{n=1}^{\infty}\left(c_n\rho^{n}+c_n'\rho^{-n}\right)\sin n\varphi. \tag{22.56}
\]

Invoking the BC gives

\[
0=u(a,\varphi)=A+B\ln a+\sum_{n=1}^{\infty}\left(b_na^{n}+b_n'a^{-n}\right)\cos n\varphi
+\sum_{n=1}^{\infty}\left(c_na^{n}+c_n'a^{-n}\right)\sin n\varphi,
\]

which holds for arbitrary φ if and only if

\[
A=-B\ln a,\qquad b_n'=-b_na^{2n},\qquad c_n'=-c_na^{2n}.
\]

Substituting in (22.56) gives

\[
u(\rho,\varphi)=B\ln\left(\frac{\rho}{a}\right)+\sum_{n=1}^{\infty}\left(\rho^{n}-\frac{a^{2n}}{\rho^{n}}\right)\left(b_n\cos n\varphi+c_n\sin n\varphi\right). \tag{22.57}
\]

Thus, if we demand nothing beyond the BCs, $\nabla^2$ will have a nontrivial
eigen-solution corresponding to the zero eigenvalue, given by Eq. (22.57).

Physical reality, however, demands that u(ρ, φ) be well-behaved at the
origin. This condition sets B, $b_n'$, and $c_n'$ of Eq. (22.56) equal to zero. The
BCs then make the remaining coefficients in (22.56) vanish. Thus, the demand
that u(ρ, φ) be well-behaved at ρ = 0 turns the situation completely
around and ensures the nonexistence of a zero eigenvalue for the Laplacian,
which in turn guarantees the existence of a GF.


In many cases the operator $\mathbf{L}_{\mathbf{x}}$ as a whole is not amenable to a full Sturm-
Liouville treatment, and as such will not yield orthonormal eigenvectors
in terms of which the GF can be expanded. However, it may happen that
$\mathbf{L}_{\mathbf{x}}$ can be broken up into two pieces one of which is an S-L operator. In
such a case, the GF can be found as follows: Suppose that $\mathbf{L}_1$ and $\mathbf{L}_2$ are
two commuting operators with $\mathbf{L}_2$ an S-L operator whose eigenvalues and
eigenfunctions are known. Since $\mathbf{L}_2$ commutes with $\mathbf{L}_1$, it can be regarded as
a constant as far as operations with (and on) $\mathbf{L}_1$ are concerned. In particular,
$(\mathbf{L}_1+\mathbf{L}_2)\mathbf{G}=\mathbf{1}$ can be regarded as an operator equation in $\mathbf{L}_1$ alone with $\mathbf{L}_2$
treated as a constant. Let $\mathbf{x}_1$ denote the subset of the variables on which $\mathbf{L}_1$
acts, and let $\mathbf{x}_2$ denote the remainder of the coordinates. Then we can write
$\delta(\mathbf{x}-\mathbf{y})=\delta(\mathbf{x}_1-\mathbf{y}_1)\,\delta(\mathbf{x}_2-\mathbf{y}_2)$. Now let $G_1(\mathbf{x}_1,\mathbf{y}_1;k)$ denote the Green's
function for $\mathbf{L}_1+k$, where k is a constant. Then it is easily verified that

\[
G(\mathbf{x},\mathbf{y})=G_1(\mathbf{x}_1,\mathbf{y}_1;\mathbf{L}_2)\,\delta(\mathbf{x}_2-\mathbf{y}_2). \tag{22.58}
\]

In fact,

\[
(\mathbf{L}_1+\mathbf{L}_2)G(\mathbf{x},\mathbf{y})=\underbrace{\bigl[(\mathbf{L}_1+\mathbf{L}_2)G_1(\mathbf{x}_1,\mathbf{y}_1;\mathbf{L}_2)\bigr]}_{=\,\delta(\mathbf{x}_1-\mathbf{y}_1)\text{ by definition of }G_1}\delta(\mathbf{x}_2-\mathbf{y}_2).
\]

Once $G_1$ is found as a function of $\mathbf{L}_2$, it can operate on $\delta(\mathbf{x}_2-\mathbf{y}_2)$ to yield the
desired Green's function. The following example illustrates the technique.


Example 22.5.3 Let us evaluate the Dirichlet GF for the two-dimensional
Helmholtz operator $\nabla^2-k^2$ in the infinite strip 0 ≤ x ≤ a, −∞ < y < ∞.
Let $\mathbf{L}_1=\partial^2/\partial y^2-k^2$ and $\mathbf{L}_2=\partial^2/\partial x^2$. Then,

\[
G(\mathbf{r},\mathbf{r}')\equiv G(x,x',y,y')=G_1(y,y';\mathbf{L}_2)\,\delta(x-x'),
\]

where $(d^2/dy^2-\mu^2)G_1=\delta(y-y')$, $\mu^2\equiv k^2-\mathbf{L}_2$, and $G_1(y=-\infty)=G_1(y=\infty)=0$.
The GF $G_1$ can be readily found (see Problem 20.12):

\[
G_1(y,y';\mathbf{L}_2)=-\frac{e^{-\mu|y-y'|}}{2\mu}=-\frac{e^{-\sqrt{k^2-\mathbf{L}_2}\,|y-y'|}}{2\sqrt{k^2-\mathbf{L}_2}}.
\]

The full GF is then

\[
G(\mathbf{r},\mathbf{r}')=-\left(\frac{e^{-\sqrt{k^2-\mathbf{L}_2}\,|y-y'|}}{2\sqrt{k^2-\mathbf{L}_2}}\right)\delta(x-x'). \tag{22.59}
\]

The operator $\mathbf{L}_2$ constitutes an S-L system with eigenvalues $\lambda_n=-(n\pi/a)^2$
and eigenfunctions $u_n(x)=\sqrt{2/a}\,\sin(n\pi x/a)$ where n = 1, 2, .... Therefore,
the delta function δ(x − x') can be expanded in terms of these eigenfunctions:

\[
\delta(x-x')=\frac{2}{a}\sum_{n=1}^{\infty}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{n\pi}{a}x'\right).
\]

As $\mu=\sqrt{k^2-\mathbf{L}_2}$ acts on the delta function, $\mathbf{L}_2$ operates on the first factor in
the above expansion and gives $\lambda_n$. Thus, $\mathbf{L}_2$ in Eq. (22.59) can be replaced
by $-(n\pi/a)^2$, and we have

\[
G(\mathbf{r},\mathbf{r}')=-\frac{1}{a}\sum_{n=1}^{\infty}\frac{e^{-\sqrt{k^2+(n\pi/a)^2}\,|y-y'|}}{\sqrt{k^2+(n\pi/a)^2}}\sin\left(\frac{n\pi}{a}x\right)\sin\left(\frac{n\pi}{a}x'\right).
\]
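The series at the end of Example 22.5.3 can be probed numerically. The sketch below is an illustration under stated assumptions (k = 1, a = 1, a 200-term truncation, and a five-point finite-difference stencil; the function names are hypothetical). It evaluates the series with an overall factor $-1/a$ and checks three properties any Dirichlet GF of $\nabla^2-k^2$ in the strip should have: it vanishes on the walls x = 0 and x = a, it is symmetric under exchange of field and source points, and it satisfies the homogeneous equation away from the source.

```python
import math

def strip_gf(x, y, xp, yp, k=1.0, a=1.0, n_max=200):
    """Truncated series for the Dirichlet GF of (Laplacian - k^2) in 0 < x < a."""
    total = 0.0
    for n in range(1, n_max + 1):
        q = n * math.pi / a                  # x-wavenumber of mode n
        kappa = math.sqrt(k * k + q * q)     # decay rate in |y - y'|
        total += (math.exp(-kappa * abs(y - yp)) / kappa
                  * math.sin(q * x) * math.sin(q * xp))
    return -total / a

def helmholtz_residual(x, y, xp, yp, k=1.0, a=1.0, h=1e-3):
    """Five-point estimate of (Laplacian - k^2) G at (x, y), away from the source."""
    def g(u, v):
        return strip_gf(u, v, xp, yp, k, a)
    lap = (g(x + h, y) + g(x - h, y) + g(x, y + h) + g(x, y - h)
           - 4.0 * g(x, y)) / (h * h)
    return lap - k * k * g(x, y)

if __name__ == "__main__":
    print(strip_gf(0.3, 0.5, 0.6, 0.0))
    print(helmholtz_residual(0.3, 0.5, 0.6, 0.0))
```

Because each term carries the factor $e^{-\sqrt{k^2+(n\pi/a)^2}\,|y-y'|}$, the series converges exponentially fast once |y − y'| is not small, which is the practical advantage of expanding in the x eigenfunctions when the field point is far from the source in y.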


Sometimes it is convenient to break an operator into more than two parts.
In fact, in some cases it may be advantageous to define a set of commuting
self-adjoint (differential) operators $\{\mathbf{M}_j\}$ such that the full operator $\mathbf{L}$
can be written as $\mathbf{L}=\sum_j\mathbf{L}_j\mathbf{M}_j$, where the differential operators $\{\mathbf{L}_j\}$ act on
variables on which the $\mathbf{M}_j$ have no action. Since the $\mathbf{M}_j$'s commute among
themselves, one can find simultaneous eigenfunctions for all of them. Then
one expands part of the delta function in terms of these eigenfunctions in the
hope that the ensuing problem becomes more manageable. The best way to
appreciate this approach is via an example.


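Example 22.5.4 Let us consider the Laplacian in spherical coordinates,

\[
\nabla^2u=\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial u}{\partial r}\right)
+\frac{1}{r^2\sin\theta}\left[\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial u}{\partial\theta}\right)+\frac{1}{\sin\theta}\frac{\partial^2u}{\partial\varphi^2}\right].
\]

If we introduce

\[
\mathbf{M}_1u=u,\qquad \mathbf{L}_1u=\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial u}{\partial r}\right),
\]
\[
\mathbf{M}_2u=\frac{1}{\sin\theta}\left[\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial u}{\partial\theta}\right)+\frac{1}{\sin\theta}\frac{\partial^2u}{\partial\varphi^2}\right],\qquad \mathbf{L}_2u=\frac{1}{r^2}u,
\]

the Laplacian becomes $\nabla^2=\mathbf{L}_1\mathbf{M}_1+\mathbf{L}_2\mathbf{M}_2$. The mutual eigenfunctions of
$\mathbf{M}_1$ and $\mathbf{M}_2$ are simply those of $\mathbf{M}_2$, which is (the negative of) the angular
momentum operator discussed in Chap. 13, whose eigenfunctions are the
spherical harmonics. We thus have $\mathbf{M}_2Y_{lm}(\theta,\varphi)=-l(l+1)Y_{lm}(\theta,\varphi)$.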

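Let us expand the Green's function in terms of the spherical harmonics:

\[
G(\mathbf{r},\mathbf{r}')=\sum_{l,m}g_{lm}(r;r',\theta',\varphi')\,Y_{lm}(\theta,\varphi). \tag{22.60}
\]

We also write the delta function as

\[
\delta(\mathbf{r}-\mathbf{r}')=\frac{\delta(r-r')\,\delta(\theta-\theta')\,\delta(\varphi-\varphi')}{r'^2\sin\theta'}
=\frac{\delta(r-r')}{r'^2}\sum_{l,m}Y_{lm}(\theta,\varphi)\,Y_{lm}^{*}(\theta',\varphi'),
\]

where we have used the completeness of the spherical harmonics. Substituting
all of the above in $\nabla^2G(\mathbf{r},\mathbf{r}')=\delta(\mathbf{r}-\mathbf{r}')$, we obtain

\[
\nabla^2G(\mathbf{r},\mathbf{r}')=(\mathbf{L}_1\mathbf{M}_1+\mathbf{L}_2\mathbf{M}_2)\sum_{l,m}g_{lm}(r;r',\theta',\varphi')\,Y_{lm}(\theta,\varphi)
\]
\[
=\sum_{l,m}\bigl[\mathbf{L}_1-l(l+1)\mathbf{L}_2\bigr]g_{lm}(r;r',\theta',\varphi')\,Y_{lm}(\theta,\varphi)
=\frac{\delta(r-r')}{r'^2}\sum_{l,m}Y_{lm}^{*}(\theta',\varphi')\,Y_{lm}(\theta,\varphi).
\]

The orthogonality of the $Y_{lm}(\theta,\varphi)$ yields

\[
\bigl[\mathbf{L}_1-l(l+1)\mathbf{L}_2\bigr]g_{lm}(r;r',\theta',\varphi')=\frac{\delta(r-r')}{r'^2}\,Y_{lm}^{*}(\theta',\varphi').
\]

This shows that the angular part of $g_{lm}$ is simply $Y_{lm}^{*}(\theta',\varphi')$. Separating this
from the dependence on r and r' and substituting for $\mathbf{L}_1$ and $\mathbf{L}_2$, we obtain

\[
\frac{1}{r^2}\frac{d}{dr}\left(r^2\frac{dg_{lm}}{dr}\right)-\frac{l(l+1)}{r^2}\,g_{lm}=\frac{\delta(r-r')}{r'^2}, \tag{22.61}
\]

where this last $g_{lm}$ is a function of r and r' only. The techniques of Chap. 20
can be employed to solve Eq. (22.61) (see Problem 22.29).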


The separation of the full operator into two "smaller" operators can also
be used for cases in which both operators have eigenvalues and eigenvectors.
The result of such an approach will, of course, be equivalent to the
eigenfunction-expansion approach. However, there will be an arbitrariness
in the operator approach: Which operator are we to choose as our $\mathbf{L}_1$? While
in Example 22.5.3 the choice was clear (the operator that had no eigenfunctions),
here either operator can be chosen as $\mathbf{L}_1$. The ensuing GFs will be
equivalent, and the series representing them will be convergent, of course.
However, the rate of convergence may be different for the two. It turns
out, for example, that if we are interested in G(x, y; x', y') for the two-dimensional
Laplacian at points (x, y) whose y-coordinates are far from y',
then the appropriate expansion is obtained by letting $\mathbf{L}_1=\partial^2/\partial y^2$, that is,
an expansion in terms of x eigenfunctions. On the other hand, if the Green's
function is to be calculated for a point (x, y) whose x-coordinate is far away
from the singular point (x', y'), then the appropriate expansion is obtained
by letting $\mathbf{L}_1=\partial^2/\partial x^2$.


22.6 Problems 


22.1 Find the GF for the Dirichlet BVP in two dimensions if D is the UHP
and ∂D is the x-axis.


22.2 Add f(r’) to H(r,r”) in Example 22.1.2 and retrace the argument 
given there to show that f(r’) =0. 


22.3 Use the method of images to find the GF for the Laplacian in the exterior
region of a "sphere" of radius a in two and three dimensions.


22.4 Derive Eq. (22.7) from Eq. (22.6).


22.5 Using Eq. (22.7) with ρ = 0, show that if $g(\theta',\varphi')=V_0$, the potential
at any point inside the sphere is $V_0$.


22.6 Find the BC that the GF must satisfy in order for the solution u to
be representable in terms of the GF when the BC on u is mixed, as in
Eq. (22.10). Assume a self-adjoint SOLPDO of the elliptic type, and consider
the two cases α(x) ≠ 0 and β(x) ≠ 0 for x ∈ ∂D. Hint: In each case,
divide the mixed BC equation by the nonzero coefficient, substitute the result
in the Green's identity, and set the coefficient of the u term in the ∂D
integral equal to zero.


22.7 Show that the diffusion operator satisfies 
Ly G(x, y;t— 1) =d(x—y)d(t— 7). 
Hint: Use

\[
\frac{\partial\theta}{\partial t}(t-\tau)=\delta(t-\tau).
\]


22.8 Show that for m = 3 the expression for $G_s(r)$ given by Eq. (22.36)
reduces to $G_s(r)=-e^{-\mu r}/(4\pi r)$.


22.9 The time-independent Schrödinger equation can be rewritten as

\[
\left(\nabla^2+k^2\right)\Psi-\frac{2\mu}{\hbar^2}V(\mathbf{r})\Psi=0,
\]

where $k^2=2\mu E/\hbar^2$ and μ is the mass of the particle.


(a) Use techniques of Sect. 21.4 to write an integral equation for Ψ.
(b) Show that the Neumann series solution of the integral equation converges
only if

\[
\int_{\mathbb{R}^3}|V(\mathbf{r})|^2\,d^3r<\frac{2\pi\hbar^4\operatorname{Im}k}{\mu^2}.
\]


(c) Assume that the potential is of Yukawa type: $V(r)=g^2e^{-\kappa r}/r$. Find a
condition between the (bound state) energy and the potential strength
g that ensures convergence of the Neumann series.


22.10 Derive Eq. (22.29). 


22.11 Derive Eq. (22.31) using the procedure outlined for parabolic equa- 
tions. 


22.12 Consider GF for the Helmholtz operator $\nabla^2+\mu^2$ in two dimensions.
(a) Show that

\[
G(\mathbf{r},\mathbf{r}')=-\frac{i}{4}H_0^{(1)}\bigl(\mu|\mathbf{r}-\mathbf{r}'|\bigr)+H(\mathbf{r},\mathbf{r}'),
\]

where H(r, r') satisfies the homogeneous Helmholtz equation.
(b) Separate the variables and use the fact that H is regular at r = r' to
show that H can be written as

\[
H(\mathbf{r},\mathbf{r}')=\sum_{n=0}^{\infty}J_n(\mu r)\bigl[a_n(\mathbf{r}')\cos n\theta+b_n(\mathbf{r}')\sin n\theta\bigr].
\]

(c) Now assume a circular boundary of radius a and the BC G(a, r') = 0,
in which a is a vector from the origin to the circular boundary. Using
this BC, show that

\[
a_0(\mathbf{r}')=\frac{i}{8\pi J_0(\mu a)}\int_0^{2\pi}H_0^{(1)}\Bigl(\mu\sqrt{a^2+r'^2-2ar'\cos(\theta-\theta')}\Bigr)\,d\theta,
\]
\[
a_n(\mathbf{r}')=\frac{i}{4\pi J_n(\mu a)}\int_0^{2\pi}H_0^{(1)}\Bigl(\mu\sqrt{a^2+r'^2-2ar'\cos(\theta-\theta')}\Bigr)\cos n\theta\,d\theta,
\]
\[
b_n(\mathbf{r}')=\frac{i}{4\pi J_n(\mu a)}\int_0^{2\pi}H_0^{(1)}\Bigl(\mu\sqrt{a^2+r'^2-2ar'\cos(\theta-\theta')}\Bigr)\sin n\theta\,d\theta.
\]

These equations completely determine H(r, r') and therefore G(r, r').


22.13 Use the Fourier transform technique to find the singular part of the 
GF for the diffusion equation in one and three dimensions. Compare your 
results with that obtained in Sect. 22.4.3. 


22.14 Show directly that both $G^{(\mathrm{ret})}$ and $G^{(\mathrm{adv})}$ satisfy $\nabla^2G=\delta(\mathbf{r})\delta(t)$ in
three dimensions.


22.15 Consider a rectangular box with sides a, b, and c located in the first 
octant with one corner at the origin. Let D denote the inside of this box. 


(a) Show that zero cannot be an eigenvalue of the Laplacian operator with
the Dirichlet BCs on ∂D.
(b) Find the GF for this Dirichlet BVP.


22.16 Find the GF for the two-dimensional Helmholtz equation
$(\nabla^2+k^2)u=0$ on the rectangle 0 ≤ x ≤ a, 0 ≤ y ≤ b.


22.17 For the operator $a\,d^2/dx^2+b$, where a > 0 and b < 0, find the singular
part of the one-dimensional GF.


22.18 Calculate the GF of the two-dimensional Laplacian operator appropriate
for Neumann BCs on the rectangle 0 ≤ x ≤ a, 0 ≤ y ≤ b.


22.19 For the Helmholtz operator $\nabla^2-k^2$ in the half-space z ≥ 0, find the
three-dimensional Dirichlet GF.


22.20 For the Helmholtz operator $\nabla^2-k^2$ in the half-space z ≤ 0, find the
three-dimensional Neumann GF.


22.21 Using the integral form of the Schrödinger equation in three dimensions,
show that an attractive delta-function potential $V(\mathbf{r})=-V_0\,\delta(\mathbf{r}-\mathbf{a})$
does not have a bound state (E < 0). Contrast this with the result of Example 21.4.1.


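22.22 By taking the Fourier transform of both sides of the integral form of
the Schrödinger equation, show that for bound-state problems (E < 0), the
equation in "momentum space" can be written as

\[
\tilde{\Psi}(\mathbf{p})=-\frac{2\mu}{(2\pi)^{3/2}\hbar^2}\left(\frac{1}{\kappa^2+p^2}\right)\int\tilde{V}(\mathbf{p}-\mathbf{q})\,\tilde{\Psi}(\mathbf{q})\,d^3q,
\]

where $\kappa^2=-2\mu E/\hbar^2$.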


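22.23 Write the bound-state Schrödinger integral equation for a non-local
potential, noting that $G(\mathbf{r},\mathbf{r}')=e^{-\kappa|\mathbf{r}-\mathbf{r}'|}/|\mathbf{r}-\mathbf{r}'|$, where $\kappa^2=-2\mu E/\hbar^2$
and μ is the mass of the bound particle. The homogeneous solution is zero,
as is always the case with bound states.


(a) Assuming that the potential is of the form $V(\mathbf{r},\mathbf{r}')=-g^2U(\mathbf{r})U(\mathbf{r}')$,
show that a solution to the Schrödinger equation exists iff

\[
\frac{\mu g^2}{2\pi\hbar^2}\int_{\mathbb{R}^3}d^3r\int_{\mathbb{R}^3}d^3r'\,\frac{e^{-\kappa|\mathbf{r}-\mathbf{r}'|}}{|\mathbf{r}-\mathbf{r}'|}\,U(\mathbf{r})U(\mathbf{r}')=1. \tag{22.62}
\]

(b) Taking $U(\mathbf{r})=e^{-\alpha r}/r$, show that the condition in (22.62) becomes

\[
\frac{4\pi\mu g^2}{\alpha\hbar^2}\left[\frac{1}{(\alpha+\kappa)^2}\right]=1.
\]

(c) Since κ > 0, prove that the equation in (b) has a unique solution only
if $g^2>\hbar^2\alpha^3/(4\pi\mu)$, in which case the bound-state energy is

\[
E=-\frac{\hbar^2}{2\mu}\left[\left(\frac{4\pi\mu g^2}{\alpha\hbar^2}\right)^{1/2}-\alpha\right]^2.
\]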


22.24 Repeat calculations in Sects. 22.4.1 and 22.4.2 for m = 2. 


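22.25 In this problem, the dimension m is three.


(a) Derive the following identities:

\[
\nabla^2\left[\frac{f(r)}{r}\right]=\frac{1}{r}\frac{\partial^2f}{\partial r^2}+f\,\nabla^2\left(\frac{1}{r}\right),
\]
\[
\frac{\partial^2\epsilon(t)}{\partial t^2}=2\delta'(t),\qquad
\nabla^2\delta(t\pm r)=\delta''(t\pm r)\pm\frac{2}{r}\,\delta'(t\pm r),
\]

where $\epsilon(t)\equiv\theta(t)-\theta(-t)$.

(b) Use the results of (a) to show that the GF [Eq. (22.47)] derived
from the principal value of the ω integration for the wave equation
in three dimensions satisfies only the homogeneous PDE. Hint: Use
$\nabla^2(1/r)=-4\pi\delta(\mathbf{r})$.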


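22.26 Calculate the retarded GF for the wave operator in two dimensions
and show that it is equal to

\[
G_s^{(\mathrm{ret})}(\mathbf{r},t)=\frac{\theta(t-r)}{2\pi\sqrt{t^2-r^2}}.
\]

Now use this result to obtain the GF for any even number of dimensions:

\[
G_s^{(\mathrm{ret})}(\mathbf{r},t)=\frac{1}{2\pi}\left(-\frac{1}{2\pi r}\frac{\partial}{\partial r}\right)^{n-1}\left[\frac{\theta(t-r)}{\sqrt{t^2-r^2}}\right]\quad\text{for } n=m/2.
\]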


22.27 (a) Find the singular part of the retarded GF and the advanced GF for
the wave equation in three dimensions using Eqs. (22.48) and (22.49). Hint:
$J_{1/2}(kr)=\sqrt{2/(\pi kr)}\,\sin kr$.

(b) Use (a) and Eq. (22.51) to show that

\[
\lim_{\epsilon\to 0}\left\{\frac{1}{r^2+(-it+\epsilon)^2}-\frac{1}{r^2+(it+\epsilon)^2}\right\}=-\frac{i\pi}{r}\bigl[\delta(t+r)-\delta(t-r)\bigr].
\]

22.28 Show that the eigenfunction expansion of the GF for the Dirichlet
BVP for the Laplacian operator in two dimensions for which the region of
interest is the interior of a circle of radius a is

\[
G(\mathbf{r},\mathbf{r}')=-\frac{2}{\pi}\sum_{n=0}^{\infty}\sum_{m=1}^{\infty}\frac{\epsilon_n\,J_n\!\left(\frac{\rho}{a}x_{nm}\right)J_n\!\left(\frac{\rho'}{a}x_{nm}\right)\cos n(\varphi-\varphi')}{J_{n+1}^2(x_{nm})\,x_{nm}^2},
\]

where $\epsilon_0=\tfrac{1}{2}$ and $\epsilon_n=1$ for n ≥ 1, and use has been made of Problem 15.39.


22.29 Go back to Example 22.5.4, and 


(a) complete the calculations therein; 

(b) find the GF for the Laplacian with Dirichlet BCs on two concentric
spheres of radii a and b, with a < b.

(c) Consider the case where a → 0 and b → ∞ and compare the result
with the singular part of the GF for the Laplacian.


22.30 Solve the Dirichlet BVP for the operator $\nabla^2-k^2$ in the region 0 ≤
x ≤ a, 0 ≤ y ≤ b, −∞ < z < ∞. Hint: Separate the operator into $\mathbf{L}_1$ and $\mathbf{L}_2$.


22.31 Solve the problem of Example 22.5.1 using the separation of operator 
technique and show that the two results are equivalent. 


22.32 Use the operator separation technique to calculate the Dirichlet GF
for the two-dimensional operator $\nabla^2-k^2$ on the rectangle 0 ≤ x ≤ a, 0 ≤
y ≤ b. Also obtain an eigenfunction expansion for this GF.


22.33 Use the operator separation technique to find the three-dimensional
Dirichlet GF for the Laplacian in a circular cylinder of radius a and height h.


22.34 Calculate the singular part of the GF for the three-dimensional free
Schrödinger operator


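22.35 Use the operator separation technique to show that
(a) the GF for the Helmholtz operator $\nabla^2+k^2$ in three dimensions is

\[
G(\mathbf{r},\mathbf{r}')=-ik\sum_{l=0}^{\infty}\sum_{m=-l}^{l}j_l(kr_<)\,h_l(kr_>)\,Y_{lm}(\theta,\varphi)\,Y_{lm}^{*}(\theta',\varphi'),
\]

where $r_<$ ($r_>$) is the smaller (larger) of r and r' and $j_l$ and $h_l$ are the
spherical Bessel and Hankel functions, respectively. No explicit BCs
are assumed except that there is regularity at r = 0 and that G(r, r') →
0 for |r| → ∞.

(b) Obtain the identity

\[
\frac{e^{ik|\mathbf{r}-\mathbf{r}'|}}{4\pi|\mathbf{r}-\mathbf{r}'|}=ik\sum_{l=0}^{\infty}\sum_{m=-l}^{l}j_l(kr_<)\,h_l(kr_>)\,Y_{lm}(\theta,\varphi)\,Y_{lm}^{*}(\theta',\varphi').
\]

(c) Derive the plane wave expansion [see Eq. (19.46)]

\[
e^{i\mathbf{k}\cdot\mathbf{r}}=4\pi\sum_{l=0}^{\infty}\sum_{m=-l}^{l}i^{l}j_l(kr)\,Y_{lm}^{*}(\theta',\varphi')\,Y_{lm}(\theta,\varphi),
\]

where θ' and φ' are assumed to be the angular coordinates of k. Hint:
Let |r'| → ∞, and use

\[
|\mathbf{r}-\mathbf{r}'|=\left(r'^2+r^2-2\mathbf{r}\cdot\mathbf{r}'\right)^{1/2}\to r'-\frac{\mathbf{r}\cdot\mathbf{r}'}{r'}
\]

and the asymptotic formula $h_l^{(1)}(z)\to(1/z)\,e^{i[z-(l+1)(\pi/2)]}$, valid for
large z.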


Part VII 
Groups and Their Representations 


23 Group Theory


The tale of mathematics and physics has been one of love and hate, of har- 
mony and discord, and of friendship and animosity. From their simultane- 
ous inception in the shape of calculus in the seventeenth century, through 
an intense and interactive development in the eighteenth and most of the 
nineteenth century, to an estrangement in the latter part of the nineteenth 
and the beginning of the twentieth century, mathematics and physics have 
experienced the best of times and the worst of times. Sometimes, as in the 
case of calculus, nature dictates a mathematical dialect in which the narra- 
tive of physics is to be spoken. Other times, man, building upon that dialect, 
develops a sophisticated language in which—as in the case of Lagrangian 
and Hamiltonian interpretation of dynamics—the narrative of physics is set 
in the most beautiful poetry. But the happiest courtship, and the most exhila- 
rating relationship, takes place when a discovery in physics leads to a devel- 
opment in mathematics that in turn feeds back into a better understanding of 
physics, leading to new ideas or a new interpretation of existing ideas. Such 
a state of affairs began in the 1930s with the advent of quantum mechanics, 
and, after a lull of about 30 years, revived in the late 1960s. We are fortu- 
nate to be witnesses to one of the most productive collaborations between 
the physics and mathematics communities in the history of both. 

It is not an exaggeration to say that the single most important catalyst
that has facilitated this collaboration is the idea of symmetry, the study of
which is the main topic of the theory of groups, the subject of this chapter.
Although group theory, in one form or another, was known to mathemati- 
cians as early as the beginning of the nineteenth century, it found its way 
into physics only after the invention of quantum theory, and in particular, 
Dirac’s interpretation of it in the language of transformation theory. Eugene 
Wigner, in his seminal paper¹ of 1939 in which he applied group theoretical
ideas to Lorentz transformations, paved the way for the marriage of group 
theory and quantum mechanics. Today, in every application of quantum the- 
ory, be it to atoms, molecules, solids, or elementary particles such as quarks 
and leptons, group-theoretical techniques are indispensable. 


¹E.P. Wigner, On the Unitary Representations of the Inhomogeneous Lorentz Group, Ann.
of Math. 40 (1939) 149-204.
23.1 Groups 


The prototype of a group is a transformation group, the set of invertible map- 
pings of a set onto itself. Let us elaborate on this. First, we take mappings 
because they are the most general operations performed between sets. From 
a physical standpoint, mappings are essential in understanding the symme- 
tries and other basic properties of a theory. For instance, rotations and trans- 
lations are mappings of space. Second, the mappings ought to be on a single 
set, because we want to be able to compose any given two mappings. We 
cannot compose f : A → B and g : A → B, because, by necessity, the domain
of the second must be a subset of the image of the first. With three sets
and $A\xrightarrow{\;g\;}B$, $B\xrightarrow{\;f\;}C$, even if the composition f ∘ g is defined, g ∘ f will not
be. Third, we want to be able to undo the mapping. Physically, this means 
that we should be able to retrace our path to our original position in the set. 
This can happen only if all mappings of interest have an inverse. Finally, we 
note that composing a mapping with its inverse yields identity. Therefore, 
the identity map must also be included in the set of mappings. 

We shall come back to transformation groups frequently. In fact, almost 
all groups considered in this book are transformation groups. However, as 
in our study of vector spaces in Chap. 2, it is convenient to give a general 
description of (abstract) groups. 


Definition 23.1.1 A group is a set G together with an associative binary
operation G × G → G called multiplication (and denoted generically by
⋆) having the following properties:


1. There exists a unique element² e ∈ G called the identity such that
   e ⋆ g = g ⋆ e = g.
2. For every element g ∈ G, there exists an element $g^{-1}$, called the inverse
   of g, such that $g\star g^{-1}=g^{-1}\star g=e$.


To emphasize the binary operation of a group, we designate it as (G, ⋆).
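For a finite set, the requirements of Definition 23.1.1 can be checked by brute force. The sketch below is only an illustration: the function name `is_group` and the sample sets (the pair {1, −1} under multiplication, and residues modulo small integers) are assumptions chosen here, not constructions taken from the text. Closure is tested explicitly because the definition requires the operation to map G × G into G.

```python
from itertools import product

def is_group(elements, op):
    """Check closure, associativity, identity, and inverses by brute force."""
    elems = list(elements)
    # Closure: the operation must map G x G into G.
    if any(op(a, b) not in elems for a, b in product(elems, repeat=2)):
        return False
    # Associativity of the binary operation.
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elems, repeat=3)):
        return False
    # A two-sided identity element e (at most one can exist).
    identities = [e for e in elems
                  if all(op(e, g) == g == op(g, e) for g in elems)]
    if len(identities) != 1:
        return False
    e = identities[0]
    # A two-sided inverse for every element.
    return all(any(op(g, h) == e == op(h, g) for h in elems) for g in elems)

if __name__ == "__main__":
    print(is_group([1, -1], lambda a, b: a * b))             # multiplication
    print(is_group(range(5), lambda a, b: (a + b) % 5))      # addition mod 5
    print(is_group(range(4), lambda a, b: (a * b) % 4))      # 0 has no inverse
```

The check costs on the order of |G|³ operations because of the associativity test, so it is practical only for small groups.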


Historical Notes 

Évariste Galois (1811-1832) was definitely not the stereotypically dull mathematician,
quietly creating theorems and teaching students. He was a political firebrand whose life 
ended in a mysterious duel when he was only 21 years old. An ardent republican, he 
was in the unfortunate position of having Cauchy, an ardent royalist, as the only French 
mathematician capable of understanding the significance of his work. His professional 
accomplishments (fewer than 100 pages, much of which was published posthumously) 
received the attention they deserved many years later. It is truly sad to realize that for 
decades, work from the man credited with the foundation of group theory was lost to
the world of mathematics. Galois’s early years were relatively happy. His father, a liberal 
thinker known for his wit, was director of a boarding school and later mayor of Bourg-la- 
Reine. Galois’s mother took charge of his early education. A stubborn, eccentric woman, 
she mixed classical culture with a fairly stern religious upbringing. 

²To distinguish between identities of different groups, we sometimes write $e_G$ for the
identity of the group G.

The young Galois entered the Collège Louis-le-Grand in 1823, but found the harsh discipline
imposed by church and political authorities difficult to bear. His interest in mathematics
was sparked in class by Vernier, but Galois quickly tired of the elementary character
of the material, preferring instead to read the more advanced original works on his
own. After a flawed attempt to solve the general fifth-order equation, Galois submitted a
paper to the Académie des Sciences in which he described the definitive solution with the 
aid of group theory, of which the young Galois can be considered the creator. However, 
this strong initial foray into the frontiers of mathematics was accompanied by tragedy and 
setback. A few weeks after the paper’s submission, his father committed suicide, which 
Galois felt was largely to be blamed on those who politically persecuted his father. A 
month later the young mathematician failed the entrance examination to the École Polytechnique,
largely due to his refusal to answer in the form demanded by the examiner.
Galois did gain entrance to a less prestigious school for the preparation of secondary- 
school teachers. While there he read some of Abel’s results (published after Abel’s death) 
and found that they contained some of the results he had submitted to the Academy in- 
cluding the proof of the impossibility of solving quintics. Cauchy, assigned as the judge 
for Galois’s paper, suggested that he revise it in light of this new information. Galois in- 
stead wrote an entirely new manuscript and submitted it in competition for the grand prix 
in mathematics. Tragically, the manuscript was lost on the death of Fourier, who had been 
assigned to examine it, leaving Galois out of the competition. These events, fueled by a 
later, unfair dismissal of another of his papers by Poisson, seem to have driven Galois to- 
ward political radicalism and rebellion during the renewed turmoil then plaguing France. 
He was arrested several times for his agitations, although he continued work on mathe- 
matics while in custody. On May 30, 1832, he was wounded in a duel with an unknown 
adversary, the duel perhaps caused by an unhappy love affair. His funeral three days later 
sparked riots that raged through Paris in the days that followed. 

The delay in recognition of the true scope of Galois’s scant but amazing work stemmed 
partly from the originality of his ideas and the lack of competent local reviewers. Cauchy 
left France after seeing only the early parts of Galois’s work, and much of the rest re- 
mained unnoticed until Liouville prepared the later manuscripts for publication a decade 
after Galois’s death. Their true value wasn’t appreciated for another two decades. The 
young mathematician himself added to the difficulty by deliberately making his writing 
so terse that the “established scientists” for whom he had so much disdain could not un- 
derstand it. Those fortunate enough to appreciate Galois’s work found fertile ground in 
mathematical research, in such fundamental fields as group theory and modern algebra, 
for decades to come. 


If the underlying set G has a finite number of elements, the group is called 
finite, and its number of elements, denoted by |G|, is called the order of G. 
We can also have an infinite group whose cardinality can be countable or 
continuous. 

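Given an element $a \in G$, we write

$$
a^k \equiv \underbrace{a \star a \star \cdots \star a}_{k\ \text{times}}, \qquad
a^{-m} \equiv \underbrace{a^{-1} \star a^{-1} \star \cdots \star a^{-1}}_{m\ \text{times}},
$$

and note that

$$
a^i \star a^j = a^{i+j} \quad \text{for all } i, j \in \mathbb{Z}.
$$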


Example 23.1.2 The following are examples of familiar sets that have 
group properties. 


(a) The set $\mathbb{Z}$ of integers under the binary operation of addition forms a
group whose identity element is 0 and the inverse of $n$ is $-n$. This
group is countably infinite.

(b) The set $\{-1, +1\}$, under the binary operation of multiplication, forms
a group whose identity element is 1 and the inverse of each element is
itself. This group is finite.

(c) The set $\{-1, +1, -i, +i\}$, under the binary operation of multiplication,
forms a finite group whose identity element is 1.

(d) The set $\mathbb{R}$, under the binary operation of addition, forms a group
whose identity element is 0 and the inverse of $r$ is $-r$. This group
is uncountably infinite.

(e) The set $\mathbb{R}^+$ ($\mathbb{Q}^+$) of positive real (rational) numbers, under the binary
operation of multiplication, forms a group whose identity element is
1 and the inverse of $r$ is $1/r$. This group is uncountably (countably)
infinite.

(f) The set $\mathbb{C}$, under the binary operation of addition, forms a group whose
identity element is 0 and the inverse of $z$ is $-z$. This group is uncountably
infinite.

(g) The uncountably infinite set $\mathbb{C} - \{0\}$ of all complex numbers except
0, under the binary operation of multiplication, forms a group whose
identity element is 1 and the inverse of $z$ is $1/z$.

(h) The uncountably infinite set $V$ of vectors in a vector space, under the
binary operation of addition, forms a group whose identity element is
the zero vector and the inverse of $|a\rangle$ is $-|a\rangle$.

(i) The set of invertible $n \times n$ matrices, under the binary operation of
multiplication, forms a group whose identity element is the $n \times n$ unit
matrix and the inverse of $A$ is $A^{-1}$. This group is uncountably infinite.


The reader is urged to verify that each set given above is indeed a group. 
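For the finite examples, the verification can be done by brute force. The following is a minimal sketch, not part of the original text; the helper name `is_group` and the choice of Python's built-in complex numbers to represent $\pm i$ are assumptions made for illustration.

```python
from itertools import product

def is_group(elements, op):
    """Brute-force check of closure, associativity, identity and inverses."""
    elements = list(elements)
    # Closure: every product must land back in the set.
    if any(op(a, b) not in elements for a, b in product(elements, repeat=2)):
        return False
    # Associativity: (a*b)*c == a*(b*c) for all triples.
    if any(op(op(a, b), c) != op(a, op(b, c))
           for a, b, c in product(elements, repeat=3)):
        return False
    # Identity: some e with e*a == a*e == a for all a.
    identities = [e for e in elements
                  if all(op(e, a) == a and op(a, e) == a for a in elements)]
    if not identities:
        return False
    e = identities[0]
    # Inverses: every a has some b with a*b == b*a == e.
    return all(any(op(a, b) == e and op(b, a) == e for b in elements)
               for a in elements)

mul = lambda a, b: a * b
fourth_roots = [1, -1, 1j, -1j]          # the set {-1, +1, -i, +i} of item (c)
print(is_group(fourth_roots, mul))       # item (c)
print(is_group([1, -1], mul))            # item (b)
print(is_group([1, -1, 2], mul))         # not closed under multiplication
```

The infinite examples cannot be checked this way; for them the axioms have to be verified by hand.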


In general, the elements of a group do not commute. Those groups whose 
elements do commute are so important that we give them a special name: 


Definition 23.1.3 A group $(G, \star)$ is called abelian or commutative if
$a \star b = b \star a$ for all $a, b \in G$. It is common to denote the binary operation of
an abelian group by $+$.


All groups of Example 23.1.2 are abelian except the last, which is nonabelian for $n \ge 2$.


Example 23.1.4 Let A be a vector potential that gives rise to a magnetic 
field B. The set of transformations of A that give rise to the same B is an 
abelian group. In fact, such transformations simply add the gradient of a 
function to A. The reader can check the details. 


The reader may also verify that the set of invertible mappings $f : S \to S$,
i.e., the set of transformations of $S$, is indeed a (nonabelian) group. If $S$ has
$n$ elements, this group is denoted by $S_n$ and is called the symmetric group
of $S$. $S_n$ is a nonabelian (unless $n \le 2$) finite group that has $n!$ elements. An
element $g$ of $S_n$ is usually denoted by two rows, the top row being $S$ itself
(usually taken to be $1, 2, \dots, n$) and the bottom row its image under $g$. For
example, $g = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix}$ is an element of $S_4$ such that $g(1) = 2$, $g(2) = 3$,
$g(3) = 4$, and $g(4) = 1$.

Consider two groups, the set of vectors in a plane ((x, y), +) and the set 
of complex numbers (C, +), both under addition. Although these are two 
different groups, the difference is superficial. We have seen similar differ- 
ences in disguise in the context of vector spaces and the notion of isomor- 
phism. The same notion applies to group theory: 




Definition 23.1.5 Let $(G, \star)$ and $(H, \circ)$ be groups. A map $f : G \to H$ is
called a homomorphism if

$$
f(a \star b) = f(a) \circ f(b) \quad \forall a, b \in G.
$$

An isomorphism is a homomorphism that is also a bijection. Two groups
are isomorphic, denoted by $G \cong H$, if there is an isomorphism $f : G \to H$.
An isomorphism of a group onto itself is called an automorphism.


An immediate consequence of this definition is that $f(e_G) = e_H$ and
$f(g^{-1}) = [f(g)]^{-1}$ (see Problem 23.9).


Example 23.1.6 Let $G$ be any group and $\{1\}$ the multiplicative group consisting
of the single number 1. It is straightforward to show that $f : G \to \{1\}$,
given by (the only function available!) $f(g) = 1$ for all $g \in G$, is a
homomorphism. This homomorphism is called the trivial (or sometimes,
symmetric) homomorphism.


The establishment of an isomorphism $f : \mathbb{R}^2 \to \mathbb{C}$ between $((x, y), +)$ and
$(\mathbb{C}, +)$ is trivial: Just write $f(x, y) = x + iy$. A less trivial isomorphism is
the exponential map, $\exp : (\mathbb{R}, +) \to (\mathbb{R}^+, \cdot)$. The reader may verify that
this is a homomorphism (in particular, it maps addition to multiplication)
and that it is one-to-one and onto $\mathbb{R}^+$.
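Both isomorphisms can be spot-checked numerically. The sketch below is illustrative only: the sample points are arbitrary, and a finite check of this kind suggests, but does not prove, the homomorphism property.

```python
import math

def f(x, y):
    """The map f(x, y) = x + iy from the plane to C."""
    return complex(x, y)

# f respects addition: f(u + v) = f(u) + f(v).
u, v = (1.5, -2.0), (0.25, 4.0)
lhs = f(u[0] + v[0], u[1] + v[1])
rhs = f(*u) + f(*v)
print(lhs == rhs)

# exp maps addition in (R, +) to multiplication in (R^+, .).
a, b = 0.7, -1.3
print(math.isclose(math.exp(a + b), math.exp(a) * math.exp(b)))

# The natural logarithm inverts exp, which is why exp is a bijection onto R^+.
print(math.isclose(math.log(math.exp(a)), a))
```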

We have noted that the set of invertible maps of a set forms a group. 
A very important special case of this is when the set is a vector space V and 
the maps are all linear. 


Box 23.1.7 The general linear group of a vector space $V$, denoted
by $GL(V)$, is the set of all invertible endomorphisms of $V$. In particular,
when $V = \mathbb{C}^n$, we usually write $GL(n, \mathbb{C})$ instead of $GL(\mathbb{C}^n)$, with
similar notation for $\mathbb{R}$.


It is sometimes convenient to display a finite group $G = \{g_i\}_{i=1}^{|G|}$ as a
$|G| \times |G|$ table, called the group multiplication table, in which the
intersection of the $i$th row and $j$th column is occupied by $g_i \star g_j$. Because of its
trivial multiplication, the identity is usually omitted from the table.


23.2 Subgroups 


It is customary to write $ab$ instead of $a \star b$. We shall adhere to this convention,
but restore the $\star$ as necessary to avoid any possible confusion.


Definition 23.2.1 A subset $S$ of a group $G$ is a subgroup of $G$ if it is a
group in its own right under the binary operation of $G$, i.e., if it contains the
inverse of all its elements as well as the product of any pair of its elements.

It follows from this definition that $e \in S$. It is also easy to show that the
intersection of two subgroups is a subgroup (Problem 23.2).


Example 23.2.2 (Examples of subgroups) 


1. For any $G$, the subset $\{e\}$, consisting of the identity alone, is a subgroup
of $G$ called the trivial subgroup of $G$.

2. $(\mathbb{Z}, +)$ is a subgroup of $(\mathbb{R}, +)$.

3. The set of even integers (but not odd integers) is a subgroup of $(\mathbb{Z}, +)$.
In fact, the set of all multiples of a positive integer $m$, denoted by $\mathbb{Z}m$,
is a subgroup of $\mathbb{Z}$. It turns out that all nontrivial subgroups of $\mathbb{Z}$ are of this form.

4. The subset of $GL(n, \mathbb{C})$ consisting of transformations that have unit
determinant is a subgroup of $GL(n, \mathbb{C})$ because the identity transformation
has unit determinant, the inverse of a transformation with unit
determinant also has unit determinant, and the product of two transformations
with unit determinants has unit determinant.


Box 23.2.3 The subgroup of $GL(n, \mathbb{C})$ consisting of elements having
unit determinant is denoted by $SL(n, \mathbb{C})$ and is called the special linear
group.


5. The set of unitary transformations of $\mathbb{C}^n$, denoted by $U(n)$, is a subgroup
of $GL(n, \mathbb{C})$ because the identity transformation is unitary, the
inverse of a unitary transformation is also unitary, and the product of
two unitary transformations is unitary.


Box 23.2.4 The set of unitary transformations $U(n)$ is a subgroup of
$GL(n, \mathbb{C})$ and is called the unitary group. Similarly, the set of orthogonal
transformations of $\mathbb{R}^n$ is a subgroup of $GL(n, \mathbb{R})$. It is denoted
by $O(n)$ and called the orthogonal group.


Each of these groups has a special subgroup whose elements have unit
determinants. These are denoted by $SU(n)$ and $SO(n)$, and called special
unitary group and special orthogonal group, respectively. The
latter is also called the group of rigid rotations of $\mathbb{R}^n$.

6. Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$, and define an inner product on $\mathbb{R}^n$ by

$$
\mathbf{x} \cdot \mathbf{y} = -x_1 y_1 - \cdots - x_p y_p + x_{p+1} y_{p+1} + \cdots + x_n y_n.
$$

Denote the subset of $GL(n, \mathbb{R})$ that leaves this inner product invariant
by$^{3}$ $O(p, n - p)$. Then $O(p, n - p)$ is a subgroup of $GL(n, \mathbb{R})$. The
set of linear transformations among $O(p, n - p)$ that have determinant
1 is denoted by $SO(p, n - p)$. The special case of $p = 0$ gives us the
orthogonal and special orthogonal groups.$^{4}$ When $n = 4$ and $p = 3$, we
get the inner product of the special theory of relativity, and $O(3, 1)$,
the set of Lorentz transformations, is called the Lorentz group. If one
adds translations of $\mathbb{R}^4$ to $O(3, 1)$, one obtains the Poincaré group,
$P(3, 1)$.


3 The reader is warned that what we have denoted by $O(p, n - p)$ is sometimes denoted
by other authors by $O(n - p, p)$ or $O(n, p)$ or $O(p, n)$.

7. Let $\mathbf{x}, \mathbf{y} \in \mathbb{R}^{2n}$, and $\mathsf{J}$ the $2n \times 2n$ matrix $\begin{pmatrix} 0 & \mathbf{1} \\ -\mathbf{1} & 0 \end{pmatrix}$, where $\mathbf{1}$ is the $n \times n$
unit matrix. The subset of $GL(2n, \mathbb{R})$ that leaves $\mathbf{x}^t \mathsf{J} \mathbf{y}$, called an
antisymmetric bilinear form, invariant is a subgroup of $GL(2n, \mathbb{R})$ called
the symplectic group and denoted by $Sp(2n, \mathbb{R})$. As we shall see in
Chap. 28, the symplectic group is fundamental in the formal treatment
of Hamiltonian mechanics.

8. Let $S$ be a subgroup of $G$ and $g \in G$. Then it is readily shown that the
set

$$
g^{-1} S g \equiv \{ g^{-1} s g \mid s \in S \}
$$

is also a subgroup of $G$, called the subgroup conjugate to $S$ under $g$,
or the subgroup $g$-conjugate to $S$.


When discussing vector spaces, we noted that given any subset of a vector 
space, one could construct a subspace out of it by taking all possible linear 
combinations (natural operations of the vector space) of the vectors in the 
subset. We called the subspace thus obtained the span of the subset. The 
same procedure is applicable in group theory as well. If S is a subset of a 
group G, we can generate a subgroup out of S by collecting all possible 
products and inverses (natural operations of the group) of the elements of S. 
The reader may verify that the result is indeed a subgroup of G. 


Definition 23.2.5 Let $S$ be a subset of a group $G$. The subgroup generated
by $S$, denoted by $\langle S \rangle$, is the union of $S$ and all inverses and products of the
elements of $S$.


In the special case for which $S = \{a\}$, a single element, we use $\langle a \rangle$ instead
of $\langle \{a\} \rangle$ and call it the cyclic subgroup generated by $a$. It is simply the
collection of all integer powers of $a$.
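For a finite group, the subgroup generated by a subset can be computed by repeatedly multiplying until nothing new appears. The sketch below is an illustration under stated assumptions: permutations are stored as tuples of images of $1, \dots, n$ (the bottom row of the two-row notation), and the function names `compose`, `inverse`, and `generate` are hypothetical, not from the text.

```python
def compose(p, q):
    """Return p o q, i.e. i -> p(q(i)); a permutation is the tuple of images of 1..n."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

def inverse(p):
    """Return the permutation that undoes p."""
    inv = [0] * len(p)
    for i, image in enumerate(p, start=1):
        inv[image - 1] = i
    return tuple(inv)

def generate(generators):
    """Close a set of permutations under inverses and products."""
    subgroup = set(generators) | {inverse(g) for g in generators}
    frontier = set(subgroup)
    while frontier:
        # Multiply every newly found element with everything known so far.
        new = {compose(a, b) for a in frontier for b in subgroup}
        new |= {compose(b, a) for a in frontier for b in subgroup}
        frontier = new - subgroup
        subgroup |= frontier
    return subgroup

g = (2, 3, 4, 1)                          # g(1)=2, g(2)=3, g(3)=4, g(4)=1
cyclic = generate([g])
print(len(cyclic))                        # the cyclic subgroup <g> has 4 elements
print(len(generate([g, (2, 1, 3, 4)])))   # adding a transposition gives all 24 of S_4
```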


Definition 23.2.6 Let $G$ be a group and $a, b \in G$. The commutator of $a$
and $b$, denoted by $[a, b]$, is

$$
[a, b] = a b a^{-1} b^{-1}.
$$

The subgroup $\bigl\langle \bigcup_{a, b \in G} [a, b] \bigr\rangle$ generated by all commutators of $G$ is
called the commutator subgroup of $G$. The reader may verify that a group
is abelian if and only if its commutator subgroup is the trivial subgroup, i.e.,
consists of only the identity element.
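As an illustration, the commutator subgroup of a small permutation group can be computed directly. This is a sketch under the assumption that permutations are stored as tuples of images; the helper names are hypothetical. It uses the fact that in a finite group, closure under products alone already yields a subgroup.

```python
from itertools import permutations

def compose(p, q):
    """Return p o q, i.e. i -> p(q(i))."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p, start=1):
        inv[image - 1] = i
    return tuple(inv)

def commutator(a, b):
    """[a, b] = a b a^{-1} b^{-1}."""
    return compose(compose(a, b), compose(inverse(a), inverse(b)))

def closure(elements):
    """Smallest set containing `elements` that is closed under products."""
    group = set(elements)
    while True:
        bigger = group | {compose(a, b) for a in group for b in group}
        if bigger == group:
            return group
        group = bigger

S3 = list(permutations((1, 2, 3)))
derived = closure({commutator(a, b) for a in S3 for b in S3})
print(sorted(derived))                     # identity and the two 3-cycles

Z3 = closure({(2, 3, 1)})                  # a cyclic, hence abelian, subgroup
print({commutator(a, b) for a in Z3 for b in Z3})   # only the identity
```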


4 It is customary to write $O(n)$ and $SO(n)$ for $O(0, n)$ and $SO(0, n)$.

Definition 23.2.7 Let $x \in G$. The set of elements of $G$ that commute with $x$,
denoted by $C_G(x)$, is called the centralizer of $x$ in $G$. The set $Z(G)$ of
elements of a group $G$ that commute with all elements of $G$ is called the
center of $G$.


Theorem 23.2.8 $C_G(x)$ is a subgroup of $G$ and $Z(G)$ is an abelian subgroup
of $G$. Furthermore, $G$ is abelian if and only if $Z(G) = G$.


Proof Proof is immediate from the definitions. 


Definition 23.2.9 Let $G$ and $H$ be groups and let $f : G \to H$ be a homomorphism.
The kernel of $f$ is

$$
\ker f \equiv \{ x \in G \mid f(x) = e_H \}.
$$


The reader may check that ker f is a subgroup of G, and f(G) is a sub- 
group of H. These are the analogues of the same concepts encountered in 
vector spaces. In fact, if we treat a vector space as an additive group, with 
the zero vector as identity, then the above definition coincides with that of 
linear mappings and vector spaces. 

Carrying the analogy further, we recall that given two subspaces U and 
W of a vector space V, we denote by U + W all vectors of V that can be 
written as the sum of a vector in U and a vector in W. There is a similar 
concept in group theory that is sometimes very useful. 


Definition 23.2.10 Let $S$ and $T$ be subsets of a group $(G, \star)$. Then one
defines the product of these subsets as

$$
S \star T \equiv \{ s \star t \mid s \in S \text{ and } t \in T \}.
$$

In particular, if $T$ consists of a single element $t$, then

$$
S \star t = \{ s \star t \mid s \in S \}.
$$

As usual, we shall drop the $\star$ and write $ST$ and $St$. If $S$ is a subgroup, then
$St$ is called a right coset$^{5}$ of $S$ in $G$. Similarly, $tS$ is called a left coset of $S$
in $G$. In either case, $t$ is said to represent the coset.


Example 23.2.11 Let $G = \mathbb{R}^3$ treated as an additive abelian group, and let
$S$ be a plane through the origin. Then $t + S$ is $S$ if $t \in S$ (see Problem 23.5);
otherwise, it is a plane parallel to $S$. In fact, $t + S$ is simply the translation
of all points of $S$ by $t$.


Theorem 23.2.12 Any two right (left) cosets of a subgroup are either dis- 
joint or identical. 


5 Some authors switch our right and left in their definition.




Proof Let $S$ be a subgroup of $G$ and suppose that $x \in Sa \cap Sb$. Then
$x = s_1 a = s_2 b$ with $s_1, s_2 \in S$. Hence, $ab^{-1} = s_1^{-1} s_2 \in S$. By Problem 23.6,
$Sa = Sb$. The left cosets can be treated in the same way.


A more “elegant” proof starts by showing that an equivalence relation
can be defined on $G$ by

$$
a \bowtie b \iff ab^{-1} \in S
$$

and then proving that the equivalence classes of this relation are cosets of $S$.

One interpretation of Theorem 23.2.12 is that $a$ and $b$ belong to the same
right coset of $S$ if and only if $ab^{-1} \in S$. A second interpretation is that a
coset can be represented by any one of its elements (why?).

All cosets (right or left) of a subgroup $S$ have the same cardinality as $S$
itself. This can readily be established by considering the map $\phi : S \to Sa$
($\phi : S \to aS$) with $\phi(s) = sa$ ($\phi(s) = as$) and showing that $\phi$ is bijective.
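These statements about cosets can be illustrated on a small example. The sketch below assumes $G = S_3$ with permutations stored as tuples of images and takes $S = \{e, (1\,2)\}$; both choices, and the helper names, are made here for illustration only.

```python
from itertools import permutations

def compose(p, q):
    """Return p o q, i.e. i -> p(q(i))."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

G = list(permutations((1, 2, 3)))          # S_3
S = [(1, 2, 3), (2, 1, 3)]                 # the subgroup {e, (1 2)}

def right_coset(S, a):
    return frozenset(compose(s, a) for s in S)

def left_coset(a, S):
    return frozenset(compose(a, s) for s in S)

right = {right_coset(S, a) for a in G}
left = {left_coset(a, S) for a in G}

# Three right cosets, each with as many elements as S.
print(len(right), {len(c) for c in right})
# Distinct cosets are disjoint, so their sizes add up to |G|.
print(sum(len(c) for c in right) == len(G))
# For this particular S the right and left cosets do not coincide.
print(right == left)
```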

There are many instances both in physics and mathematics in which a 
collection of points of a given set represent a single quantity. For example, 
it is not simply the set of ratios of integers that comprise the set of rational 
numbers, but the set of certain collections of such ratios: The rational number
$\frac{1}{2}$ represents $\frac{1}{2}$, $\frac{2}{4}$, $\frac{3}{6}$, etc. Similarly, a given magnetic field represents an
infinitude of vector potentials each differing by a gradient from the others, 
and a physical state in quantum mechanics is an infinite number of wave 
functions differing from one another by a phase. 

With the set of cosets constructed above, it is natural to ask whether
they could be given an algebraic structure.$^{6}$ The most natural such structure
would clearly be that of a group: Given $aS$ and $bS$, define their product
as $abS$. Would this operation turn the set of (left) cosets into a group? The
following argument shows that it will, under an important restriction.

It is clear that the identity of such a group would be $S$ itself. It is equally
clear that we should have $(b^{-1}S)(bS) = S$, so that $(b^{-1}Sb)S = S$. It follows
from Problem 23.5 that we must have $b^{-1}Sb \subset S$ for all $b \in G$. Now replace
$b$ with $b^{-1}$ and note that $bSb^{-1} \subset S$ as well. Let $s$ be an arbitrary element
of $S$. Then $bsb^{-1} = s'$ for some $s' \in S$, and $s = b^{-1}s'b \in b^{-1}Sb$. It follows
that $S \subset b^{-1}Sb$ for all $b \in G$, and, with the reverse inclusion derived above,
that $S = b^{-1}Sb$. This motivates the following definition.


Definition 23.2.13 A subgroup $N$ of a group $G$ is called normal if $N = g^{-1}Ng$
(equivalently, if $Ng = gN$) for all $g \in G$.


The preceding argument shows that the set of cosets (no specification 
is necessary since the right and left cosets coincide) of a normal subgroup 
forms a group: 


6 The set of cosets of a subgroup is the analog of factor space of a subspace of a vector
space (Sect. 2.1.2) and factor algebra of a subalgebra of an algebra (Sect. 3.2.1). We have
seen that, while a factor space of any subspace can be turned into a vector space, that is
not the case with an algebra: the subalgebra must be an ideal of the algebra. There is a
corresponding restriction for the subgroup.

Theorem 23.2.14 If $N$ is a normal subgroup of $G$, then the collection of
all cosets of $N$, denoted by $G/N$, is a group, called the quotient group or
factor group of $G$ by $N$.


We note that the only subgroup conjugate to a normal subgroup N is N 
itself (see Example 23.2.2), and that all subgroups of an abelian group are 
automatically normal. 


Example 23.2.15 Let $G = \mathbb{R}^3$ and let $S$ be a plane through the origin as in
Example 23.2.11. Since $G$ is abelian, $S$ is automatically normal, and $G/S$
is the set of planes parallel to $S$. Let $\hat{\mathbf{e}}_n$ be a unit normal to $S$. Then it is readily
seen that

$$
G/S = \{ r\hat{\mathbf{e}}_n + S \mid r \in \mathbb{R} \}.
$$

We have picked the perpendicular distance between a plane and $S$ (with
sign included) to represent that plane. The reader may check that the quotient
group $G/S$ is isomorphic to $\mathbb{R}$. Identifying $S$ with $\mathbb{R}^2$, we can write
$\mathbb{R}^3/\mathbb{R}^2 \cong \mathbb{R}$. The cancellation of exponents is quite accidental here!

Let $G = \mathbb{Z}$ and $S = \mathbb{Z}m$, the set of multiples of the positive integer $m$.
Since $\mathbb{Z}$ is abelian, $\mathbb{Z}m$ is normal, and $\mathbb{Z}/\mathbb{Z}m$ is indeed a group, a typical
element of which looks like $k + m\mathbb{Z}$. By adding (or subtracting) multiples
of $m$ to $k$, and using $mj + m\mathbb{Z} = m\mathbb{Z}$ (see Problem 23.5), we can assume
that $0 \le k \le m - 1$. It follows that $\mathbb{Z}/\mathbb{Z}m$ is a finite group. Furthermore,

$$
(k_1 + m\mathbb{Z}) + (k_2 + m\mathbb{Z}) = k_1 + k_2 + m\mathbb{Z} = k + m\mathbb{Z},
$$

where $k$ is the remainder after enough multiples of $m$ have been subtracted
from $k_1 + k_2$. One writes $k_1 + k_2 \equiv k \pmod{m}$. The coset $k + m\mathbb{Z}$ is sometimes
denoted by $\bar{k}$ and the quotient group $\mathbb{Z}/\mathbb{Z}m$ by $\mathbb{Z}_m$:

$$
\mathbb{Z}_m = \{ \bar{0}, \bar{1}, \bar{2}, \dots, \overline{m-1} \}.
$$

$\mathbb{Z}_m$ is a prototype of the finite cyclic groups. It can be shown that every
cyclic group of order $m$ is isomorphic to $\mathbb{Z}_m$, a generator of which is $\bar{1}$ (recall
that the binary operation is addition for $\mathbb{Z}_m$).
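The arithmetic of $\mathbb{Z}/\mathbb{Z}m$ is ordinary modular arithmetic on coset representatives. A minimal sketch follows, with $m = 5$ chosen arbitrarily and the function names `coset` and `add` introduced here only for illustration.

```python
m = 5

def coset(k, m):
    """Representative of the coset k + mZ, chosen with 0 <= representative <= m - 1."""
    return k % m

def add(k1, k2, m):
    """(k1 + mZ) + (k2 + mZ) = (k1 + k2) + mZ."""
    return coset(k1 + k2, m)

# Every integer falls into one of exactly m cosets.
Zm = sorted({coset(k, m) for k in range(-20, 21)})
print(Zm)

# The sum does not depend on which representatives are chosen.
print(add(3, 4, m), add(3 + 2 * m, 4 - 7 * m, m))

# The coset of 1 generates the group: repeated addition reaches every coset.
k, visited = 0, []
for _ in range(m):
    visited.append(k)
    k = add(k, 1, m)
print(sorted(visited) == Zm)
```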


Theorem 23.2.16 (First isomorphism theorem) Let $G$ and $H$ be groups
and $f : G \to H$ a homomorphism. Then $\ker f$ is a normal subgroup of $G$,
and $G/\ker f$ is isomorphic to $f(G)$.


Proof We have already seen that $\ker f$ is a subgroup of $G$. To show that it
is normal, let $g \in G$ and $x \in \ker f$. Then

$$
f(gxg^{-1}) = f(g) f(x) f(g^{-1}) = f(g)\, e_H\, f(g^{-1}) = f(g) f(g^{-1})
= f(gg^{-1}) = f(e_G) = e_H.
$$

It follows that $gxg^{-1} \in \ker f$. Therefore, $\ker f$ is normal. We leave it
to the reader to show that $\phi : G/\ker f \to f(G)$ given by $\phi(g[\ker f]) =
\phi([\ker f]g) = f(g)$ is an isomorphism.$^{7}$
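A concrete instance may help. The sketch below uses an example that is not in the text: the parity map from $S_3$ to the multiplicative group $\{+1, -1\}$, with permutations stored as tuples of images. It checks the homomorphism property, the normality of the kernel, and that each coset of the kernel is sent to a single element of the image.

```python
from itertools import permutations

def compose(p, q):
    """Return p o q, i.e. i -> p(q(i))."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

def sign(p):
    """Parity of a permutation: +1 if even, -1 if odd (counted via inversions)."""
    n = len(p)
    inversions = sum(p[i] > p[j] for i in range(n) for j in range(i + 1, n))
    return -1 if inversions % 2 else 1

G = list(permutations((1, 2, 3)))

# sign is a homomorphism: sign(a o b) = sign(a) * sign(b).
print(all(sign(compose(a, b)) == sign(a) * sign(b) for a in G for b in G))

kernel = frozenset(g for g in G if sign(g) == 1)

# Normality of the kernel, tested in the equivalent form gK = Kg.
print(all({compose(g, x) for x in kernel} == {compose(x, g) for x in kernel}
          for g in G))

# The induced map gK -> sign(g) takes one value per coset, and distinct
# cosets get distinct values, so G/ker f has as many elements as f(G).
cosets = {frozenset(compose(g, x) for x in kernel) for g in G}
images = {c: {sign(g) for g in c} for c in cosets}
print(len(cosets), all(len(v) == 1 for v in images.values()))
```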


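Example 23.2.17 The special linear group of $V$ is a normal subgroup of
the general linear group of $V$. To see this, note that $\det$, which maps $GL(V)$ into the
multiplicative group of nonzero scalars ($\mathbb{R} - \{0\}$ for a real $V$, $\mathbb{C} - \{0\}$ for a complex $V$), is a
homomorphism whose kernel is $SL(V)$.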


Definition 23.2.18 Let $x \in G$. A conjugate of $x$ is an element $y$ of $G$ that
can be written as $y = gxg^{-1}$ with $g \in G$. The set of all elements of $G$ conjugate
to one another is called a conjugacy class. The $i$th conjugacy class is
denoted by $K_i$.


One can check that “x is conjugate to y” is an equivalence relation whose 
classes are the conjugacy classes. In particular, two different conjugacy 
classes are disjoint. One can also show that each element of the center of 
a group constitutes a class by itself. In particular, the identity in any group is 
in a class by itself, and each element of an abelian group forms a (different) 
class. 

Although a normal subgroup N contains the conjugate of each of its ele- 
ments, N is not a class. The class containing any given element of N will be 
only a proper subset of N (unless N is trivial). The characteristic feature of a 
normal subgroup is that it contains the conjugacy classes of all its elements. 
This is not shared by other subgroups, which, in general, contain only the 
trivial class of the identity element. 
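The relation between conjugacy classes and normal subgroups can be seen in $S_3$. The sketch below is illustrative only; it assumes permutations stored as tuples of images, and the helper names are hypothetical.

```python
from itertools import permutations

def compose(p, q):
    """Return p o q, i.e. i -> p(q(i))."""
    return tuple(p[q[i] - 1] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p, start=1):
        inv[image - 1] = i
    return tuple(inv)

def conjugacy_classes(G):
    """Partition G into classes {g x g^{-1} : g in G}."""
    classes, seen = [], set()
    for x in G:
        if x in seen:
            continue
        cls = {compose(compose(g, x), inverse(g)) for g in G}
        classes.append(cls)
        seen |= cls
    return classes

S3 = list(permutations((1, 2, 3)))
classes = conjugacy_classes(S3)
print(sorted(len(c) for c in classes))     # identity, two 3-cycles, three transpositions

def is_union_of_classes(subset):
    """True if every class is either inside the subset or disjoint from it."""
    return all(c <= subset or not (c & subset) for c in classes)

A3 = {(1, 2, 3), (2, 3, 1), (3, 1, 2)}     # normal subgroup of S_3
H = {(1, 2, 3), (2, 1, 3)}                 # subgroup that is not normal
print(is_union_of_classes(A3), is_union_of_classes(H))
```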


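Example 23.2.19 Consider the group $SO(3)$ of rotations in three dimensions.
Let us denote a rotation by $R_{\hat{\mathbf{e}}}(\theta)$, where $\hat{\mathbf{e}}$ is the direction of the axis
of rotation and $\theta$ is the angle of rotation. A typical member of the conjugacy
class of $R_{\hat{\mathbf{e}}}(\theta)$ is $R R_{\hat{\mathbf{e}}}(\theta) R^{-1}$, where $R$ is some rotation. Let $\hat{\mathbf{e}}' = R\hat{\mathbf{e}}$ be the
vector obtained by applying the rotation $R$ on $\hat{\mathbf{e}}$, and note that

$$
R R_{\hat{\mathbf{e}}}(\theta) R^{-1} \hat{\mathbf{e}}' = R R_{\hat{\mathbf{e}}}(\theta) R^{-1} R \hat{\mathbf{e}} = R R_{\hat{\mathbf{e}}}(\theta) \hat{\mathbf{e}} = R \hat{\mathbf{e}} = \hat{\mathbf{e}}',
$$

where we used the fact that $R_{\hat{\mathbf{e}}}(\theta)\hat{\mathbf{e}} = \hat{\mathbf{e}}$ because a rotation leaves its axis
unchanged. This last statement, applied to the equation above, also shows that
$R R_{\hat{\mathbf{e}}}(\theta) R^{-1}$ is a rotation about $\hat{\mathbf{e}}'$. Problem 23.18 establishes that the angle
of rotation associated with $R R_{\hat{\mathbf{e}}}(\theta) R^{-1}$ is $\theta$. We can summarize this as
$R R_{\hat{\mathbf{e}}}(\theta) R^{-1} = R_{\hat{\mathbf{e}}'}(\theta)$.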


The result of this example is summarized as follows: 


Box 23.2.20 All rotations having the same angle belong to the same 
conjugacy class of the group of rotations in three dimensions. 


7 Compare this theorem with the set-theoretic result obtained in Chap. 1 where the map
$X/\!\bowtie\ \to f(X)$ was shown to be bijective if $\bowtie$ is the equivalence relation induced by $f$.

23.2.1 Direct Products


The resolution of a vector space into a direct sum of subspaces was a useful 
tool in revealing its structure. The same idea can also be helpful in studying 
groups. Recall that the only vector common to the subspaces of a direct sum 
is the zero vector. Moreover, any vector of the whole space can be written as 
the sum of vectors taken from the subspaces of the direct sum. Considering 
a vector space as an (abelian) group, with zero as the identity and summation
as the group operation, leads to the notion of direct product. 


Definition 23.2.21 A group $G$ is said to be the direct product of two of its
subgroups $H_1$ and $H_2$, and we write $G = H_1 \times H_2$, if


1. all elements of $H_1$ commute with all elements of $H_2$;
2. the group identity is the only element common to both $H_1$ and $H_2$;
3. every $g \in G$ can be written as $g = h_1 h_2$ with $h_1 \in H_1$ and $h_2 \in H_2$.


It follows from this definition that $h_1$ and $h_2$ are unique, and $H_1$ and
$H_2$ are normal. This kind of direct product is sometimes called internal,
because the “factors” $H_1$ and $H_2$ are chosen from inside the group $G$ itself.
The external direct product results when we take two unrelated groups and 
make a group out of them: 


Proposition 23.2.22 Let $G$ and $H$ be groups. The Cartesian product $G \times H$
can be given a group structure by

$$
(g, h) \star (g', h') \equiv (gg', hh').
$$

With this multiplication, $G \times H$ is called the external direct product of $G$
and $H$. Furthermore, $G \cong G \times \{e_H\}$, $H \cong \{e_G\} \times H$, $G \times H \cong H \times G$,
and to within these isomorphisms, $G \times H$ is the internal direct product of
$G \times \{e_H\}$ and $\{e_G\} \times H$.


The proof is left for the reader. 
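Although the proof is left to the reader, the construction is easy to try out on small groups. The following sketch builds the external direct product of $\mathbb{Z}_2$ and $\mathbb{Z}_3$ (an arbitrary choice made here) and checks the three conditions of the internal direct product for the copies $G \times \{e_H\}$ and $\{e_G\} \times H$. The function name `direct_product` is hypothetical.

```python
from itertools import product

def direct_product(G, op_G, H, op_H):
    """Elements and componentwise operation of the external direct product."""
    elements = list(product(G, H))
    def op(x, y):
        return (op_G(x[0], y[0]), op_H(x[1], y[1]))
    return elements, op

Z2, Z3 = range(2), range(3)
add2 = lambda a, b: (a + b) % 2
add3 = lambda a, b: (a + b) % 3
elements, op = direct_product(Z2, add2, Z3, add3)
e = (0, 0)                                  # the pair of identities

G_copy = [(g, 0) for g in Z2]               # G x {e_H}
H_copy = [(0, h) for h in Z3]               # {e_G} x H

# 1. Elements of the two copies commute with each other.
print(all(op(a, b) == op(b, a) for a in G_copy for b in H_copy))
# 2. The identity is the only common element.
print(set(G_copy) & set(H_copy) == {e})
# 3. Every element is a product of one element from each copy.
print(all(any(op(a, b) == x for a in G_copy for b in H_copy) for x in elements))
```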


Historical Notes 

Niels Henrik Abel (1802-1829) was the second of seven children, son of a Lutheran 
minister with a small parish of Norwegian coastal islands. In school he received only av- 
erage marks at first, but then his mathematics teacher was replaced by a man only seven 
years older than Abel. Abel’s alcoholic father died in 1820, leaving almost no financial 
support for his young prodigy, who became responsible for supporting his mother and 
family. His teacher, Holmboe, recognizing his talent for mathematics, raised money from 
his colleagues to enable Abel to attend Christiania (modern Oslo) University. He entered 
the university in 1821, 10 years after the university was founded, and soon proved him- 
self worthy of his teacher’s accolades. His second paper, for example, contained the first 
example of a solution to an integral equation. 

Abel then received a two-year government travel grant and journeyed to Berlin, where 
he met the prominent mathematician Crelle, who soon launched what was to become 
the leading German mathematical journal of the nineteenth century, commonly called 
Crelle’s Journal. From the start, Abel contributed important papers to Crelle’s Journal, 
including a classic paper on power series, the scope of which clearly reflects his desire 
for stringency. His most important work, also published in that journal, was a lengthy 
treatment of elliptic functions in which Abel incorporated their inverse functions to show
that they are a natural generalization of the trigonometric functions. In later research in
this area, Abel found himself in stiff competition with another young mathematician,
K.G.J. Jacobi. Abel published some papers on functional equations and integrals in 1823.
In one of them he gives the first solution of an integral equation. In 1824 he proved the
impossibility of solving algebraically the general equation of the fifth degree and published
it at his own expense, hoping to obtain recognition for his work.

Despite his proven intellectual success, Abel never achieved material success, not even 
a permanent academic position. In December of 1828, while traveling by sled to visit 
his fiancée for Christmas, Abel became seriously ill and died a couple of months later.
Ironically, his death from tuberculosis occurred two days before Crelle wrote with the 
happy news of an appointment for Abel at a scientific institute in Berlin. In Abel’s eulogy 
in his journal, Crelle wrote: 

“He distinguished himself equally by the purity and nobility of his character and by a rare 
modesty which made his person cherished to the same degree as was his genius.” 


23.3 Group Action 


The transformation groups introduced at the beginning of this chapter can 
be described in the language of abstract groups. 


Definition 23.3.1 Let $G$ be a group and $M$ a set. The left action of $G$ on
$M$ is a map $\Phi : G \times M \to M$ such that


1. $\Phi(e, m) = m$ for all $m \in M$;
2. $\Phi(g_1 g_2, m) = \Phi(g_1, \Phi(g_2, m))$.


One usually denotes $\Phi(g, m)$ by $g \cdot m$ or more simply by $gm$. The right
action is defined similarly. A subset $N \subset M$ is called left (right) invariant
if $g \cdot m \in N$ ($m \cdot g \in N$) for all $g \in G$, whenever $m \in N$.


Example 23.3.2 If we define $f_g : M \to M$ by $f_g(m) \equiv \Phi(g, m) = g \cdot m$,
then $f_g$ is recognized as a transformation of $M$. The collection of such
transformations is a subgroup of the set of all transformations of $M$. Indeed, the
identity transformation is simply $f_e$, the inverse of $f_g$ is $f_{g^{-1}}$, and the (associative)
law of composition is $f_{g_1} \circ f_{g_2} = f_{g_1 g_2}$. There is a general theorem
in group theory (Cayley's theorem) stating that any group is isomorphic to a subgroup of the
group of transformations of an appropriate set.


Definition 23.3.3 Let $G$ act on $M$ and let $m \in M$. The orbit of $m$, denoted
by $Gm$, is

$$
Gm = \{ x \in M \mid x = gm \text{ for some } g \in G \}.
$$

The action is called transitive if $Gm = M$. The stabilizer of $m$ is $G_m =
\{ g \in G \mid gm = m \}$. The group action is called free if $G_m = \{e\}$ for all $m \in M$;
it is called effective if $gm = m$ for all $m \in M$ implies that $g = e$.


The reader may verify that the orbit $Gm$ is the smallest invariant subset
of $M$ containing $m$, and that

Box 23.3.4 The stabilizer of $m$ is a subgroup of $G$, which is sometimes
called the little group of $G$ at $m$.


Remark 23.3.1 Think of the action of $G$ on $M$ as passing from one point
of $M$ to another. An element $g$ of $G$ “transits” a region of $M$ to take $m \in M$
to $gm \in M$. The action is, therefore, transitive if $G$ can transit (pass across)
all of $M$, i.e., $G$ can connect any two points of $M$.

If you think of $G_m$ as those elements of $G$ that are confined to (stuck, or
imprisoned at) $m$, then a “free” action of $G$ does not allow any point of $M$
to imprison any subset of $G$.

Any $g \in G$ such that $gm = m$ for all $m \in M$ has no “effect” on $M$. Therefore,
this $g$ is “ineffective” in its action on $M$. For $G$ to act “effectively”, it
should not have any “ineffective” member.

A transitive action is characterized by the fact that given any two points
$m_1, m_2 \in M$, one can find a $g \in G$ such that $m_2 = g m_1$. In general, there
may be several $g$'s connecting $m_1$ to $m_2$. If there are $g, g' \in G$ such that
$m_2 = g m_1 = g' m_1$, then

$$
g m_1 = g' m_1 \implies m_1 = g^{-1} g' m_1.
$$

If we want $g$ to be unique for all $m_1$ and $m_2$, then the group action must be
free.

By definition, the orbits of a group G in M are disjoint and their union 
is the entire M. Another way of stating the same thing is to say that G 
partitions M into orbits. It should be obvious that the action of G on any 
orbit is transitive. 


From the remarks above, we conclude that 


Box 23.3.5 Two points of an orbit are connected by a unique element 
of G iff G acts freely on the orbit. 
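Orbits and stabilizers are easy to compute for a finite action. The sketch below uses an example chosen here for illustration and not taken from the text: the four rotations of the plane by multiples of 90 degrees, acting on a $3 \times 3$ grid of integer points. The function names are hypothetical.

```python
from itertools import product

def rot(k):
    """Rotation by k * 90 degrees as an integer 2x2 matrix."""
    c, s = [(1, 0), (0, 1), (-1, 0), (0, -1)][k % 4]
    return ((c, -s), (s, c))

G = [rot(k) for k in range(4)]
M = list(product((-1, 0, 1), repeat=2))     # a 3x3 grid of points

def act(g, m):
    """Left action g . m: apply the matrix g to the point m."""
    (a, b), (c, d) = g
    x, y = m
    return (a * x + b * y, c * x + d * y)

def orbit(m):
    return frozenset(act(g, m) for g in G)

def stabilizer(m):
    return [g for g in G if act(g, m) == m]

orbits = {orbit(m) for m in M}
print(sorted(len(o) for o in orbits))            # one fixed point, two orbits of four
print(sum(len(o) for o in orbits) == len(M))     # the orbits partition M
print(len(stabilizer((0, 0))), len(stabilizer((1, 0))))

# On an orbit where all stabilizers are trivial, any two points are
# connected by exactly one group element.
big = orbit((1, 0))
print(all(sum(act(g, m1) == m2 for g in G) == 1 for m1 in big for m2 in big))
```

The origin has the whole group as its stabilizer, so the action on the full grid is not free, while its restriction to either four-point orbit is.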


Example 23.3.6 Let $M = \mathbb{R}^2$ and $G = SO(2)$, the planar rotation group.
The action is rotation of a point in the plane about the origin by an angle $\theta$.
The orbits are circles centered at the origin. The action is effective but not 
transitive. The stabilizer of every point in the plane is {e}, except the origin, 
for which the whole group is the stabilizer. Since the stabilizer at the origin 
is not {e}, the group action is not free. 

Let $M = S^1$, the unit circle, and $G = SO(2)$, the rotation group in two
dimensions. The action is displacement of a point on the circle. There is 
only one orbit, the entire circle. The action is effective and transitive. The 
stabilizer of every point on the circle is {e}; therefore, the action is free as 
well. 




Let $M = G$, a group, and let a (proper) subgroup $H$ act on $G$ by left
multiplication. The orbits are right cosets $Hg$ of the subgroup. The action is
effective but not transitive. The stabilizer of every point in the group is $\{e\}$;
hence the action is free.

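Let $M = \mathbb{R} \cup \{\infty\}$, the set of real numbers including “the point at infinity”.
Define an action of $SL(2, \mathbb{R})$ on $M$ by

$$
g \cdot x \equiv \begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot x = \frac{ax + c}{bx + d}.
$$

This is a group action with a law of multiplication identical to the matrix
multiplication. The action is transitive, but neither effective nor free. Indeed,
the reader is urged to show that

$$
g \cdot x = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot x = x \ \ \forall x \in M
\quad \text{iff} \quad
g = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \ \text{or} \ g = \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix},
$$

making the group action not effective. Furthermore, for every $x \in M$

$$
G_x = \left\{ \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL(2, \mathbb{R}) \;\middle|\; c = bx^2 + (d - a)x \right\},
$$

making the group action not free.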

Let $M$ be a set and $H$ the group of transformations of $M$. Suppose that
there is a homomorphism $f : G \to H$ from a group $G$ into $H$. Then there is
a natural action of $G$ on $M$ given by $g \cdot m \equiv [f(g)](m)$. The homomorphism
$f$ is sometimes called a realization of $G$.


23.4 The Symmetric Group $S_n$


Because of its primary importance as the prototypical finite group, and because
of its significance in quantum statistics, the symmetric (or permutation)
group is briefly discussed in this section. It is also used extensively in
the theory of representation of the general linear group and its subgroups.
A generic permutation $\pi$ of $n$ numbers is shown as

$$
\pi = \begin{pmatrix} 1 & 2 & \cdots & i & \cdots & n \\ \pi(1) & \pi(2) & \cdots & \pi(i) & \cdots & \pi(n) \end{pmatrix}. \tag{23.1}
$$

Because the mapping is bijective, no two elements can have the same image,
and $\pi(1), \pi(2), \dots, \pi(n)$ exhaust all the elements in the set $\{i\}_{i=1}^{n}$.


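We can display the product $\pi_2 \circ \pi_1$ of two permutations using $\pi_2 \circ \pi_1(i) \equiv
\pi_2(\pi_1(i))$. For instance, if

$$
\pi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 4 & 1 & 2 \end{pmatrix}
\quad \text{and} \quad
\pi_2 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 4 & 3 & 1 \end{pmatrix}, \tag{23.2}
$$

then the product $\pi_2 \circ \pi_1$ takes 1 to 3, etc., because $\pi_2 \circ \pi_1(1) \equiv \pi_2(\pi_1(1)) =
\pi_2(3) = 3$, etc. We display $\pi_2 \circ \pi_1$ as

$$
\pi_2 \circ \pi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 2 & 4 \end{pmatrix}.
$$

Table 23.1 Group multiplication table for $S_3$

$$
\begin{array}{c|ccccc}
e & \pi_2 & \pi_3 & \pi_4 & \pi_5 & \pi_6 \\ \hline
\pi_2 & e & \pi_5 & \pi_6 & \pi_3 & \pi_4 \\
\pi_3 & \pi_6 & e & \pi_5 & \pi_4 & \pi_2 \\
\pi_4 & \pi_5 & \pi_6 & e & \pi_2 & \pi_3 \\
\pi_5 & \pi_4 & \pi_2 & \pi_3 & \pi_6 & e \\
\pi_6 & \pi_3 & \pi_4 & \pi_2 & e & \pi_5
\end{array}
$$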


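Example 23.4.1 Let us construct the multiplication table for $S_3$. Denote the
elements of $S_3$ as follows:

\[
\pi_1 = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 3 \end{pmatrix}, \quad
\pi_2 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix}, \quad
\pi_3 = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 2 & 1 \end{pmatrix},
\]
\[
\pi_4 = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}, \quad
\pi_5 = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}, \quad
\pi_6 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 3 & 1 \end{pmatrix}.
\]

We give only one sample evaluation of the entries and leave the
straightforward (but instructive) calculation of the other entries to the
reader. Consider $\pi_4 \circ \pi_5$, and note that $\pi_5(1) = 3$ and
$\pi_4(3) = 2$; so $\pi_4 \circ \pi_5(1) = 2$. Similarly,
$\pi_4 \circ \pi_5(2) = 1$ and $\pi_4 \circ \pi_5(3) = 3$. Thus

\[
\pi_4 \circ \pi_5 = \begin{pmatrix} 1 & 2 & 3 \\ 2 & 1 & 3 \end{pmatrix} = \pi_2.
\]

The entire multiplication table is given in Table 23.1.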


Note that both the rows and columns of the group multiplication table
include all elements of the group, and no element is repeated in a row or a
column. This is because left-multiplication of elements of a group by a
single fixed element of the group simply permutes the group elements. Stated
differently, the left multiplication map $L_g : G \to G$, given by
$L_g(x) = gx$, is bijective, as the reader may verify.
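The table and the rearrangement property can be checked mechanically. The following is a minimal sketch, assuming the bottom rows of $\pi_1, \ldots, \pi_6$ from Example 23.4.1; the labels `p1` to `p6` and the helper names `compose` and `is_latin_square` are chosen here for illustration and are not part of the text.

```python
# Elements of S3 as bottom rows of the two-row notation of Example 23.4.1.
# The string labels "p1".."p6" are names chosen here for illustration.
S3 = {
    "p1": (1, 2, 3),  # the identity e
    "p2": (2, 1, 3),
    "p3": (3, 2, 1),
    "p4": (1, 3, 2),
    "p5": (3, 1, 2),
    "p6": (2, 3, 1),
}
NAME = {row: name for name, row in S3.items()}


def compose(a, b):
    """Return a o b, the map i -> a(b(i)); the right factor acts first."""
    return tuple(a[b[i] - 1] for i in range(len(b)))


# table[r][c] names the product (row element) o (column element).
table = {r: {c: NAME[compose(S3[r], S3[c])] for c in S3} for r in S3}


def is_latin_square(tbl):
    """True if every row and every column lists each element exactly once."""
    names = set(tbl)
    rows_ok = all(set(tbl[r].values()) == names for r in tbl)
    cols_ok = all({tbl[r][c] for r in tbl} == names for c in tbl)
    return rows_ok and cols_ok


for r in S3:
    print(r, " ".join(table[r][c] for c in S3))
print("rows and columns are rearrangements:", is_latin_square(table))
```

The entry in row $\pi_4$ and column $\pi_5$ reproduces the sample evaluation $\pi_4 \circ \pi_5 = \pi_2$ of the example.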

Because we are dealing with finite numbers, repeated application of a
permutation to an integer in the set $\{i\}_{i=1}^{n}$ eventually produces the
initial integer. This leads to the following definition.


Definition 23.4.2 Let $\pi \in S_n$, $i \in \{1, 2, \ldots, n\}$, and let $r$
be the smallest positive integer such that $\pi^r(i) = i$. Then the set of $r$
distinct elements $\{\pi^k(i)\}_{k=0}^{r-1}$ is called a cycle of $\pi$ of
length $r$ or an $r$-cycle generated by $i$.


Start with 1 and apply $\pi$ to it repeatedly until you obtain 1 again. The
collection of elements so obtained forms a cycle in which 1 is contained. Then
we select a second number that is not in this cycle and apply $\pi$ to it
repeatedly until the original number is obtained again. Continuing in this way,
we produce a set of disjoint cycles that exhausts all elements of
$\{1, 2, \ldots, n\}$.


Proposition 23.4.3 Any permutation can be broken up into disjoint cycles. 


It is customary to write elements of each cycle in some specific order
within parentheses starting with the first element, say $i$, on the left, then
$\pi(i)$ immediately to its right, followed by $\pi^2(i)$, and so on. For
example, the permutations $\pi_1$ and $\pi_2$ of Eq. (23.2) and their product
have the cycle structures $\pi_1 = (13)(24)$, $\pi_2 = (124)(3)$, and
$\pi_2 \circ \pi_1 = (132)(4)$, respectively.
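The procedure just described (start at a symbol, apply the permutation until the symbol recurs, then move to an unused symbol) translates directly into a short routine. This is only a sketch: a permutation is stored as its bottom row, and the function names `cycles` and `compose` are hypothetical. The two rows used are the ones consistent with the cycle structures $(13)(24)$ and $(124)(3)$ quoted above.

```python
def cycles(perm):
    """Disjoint cycles of a permutation given by its bottom row.

    perm[i - 1] is the image of i. Each cycle starts at its smallest symbol
    and lists i, pi(i), pi^2(i), ... as in the text.
    """
    seen, result = set(), []
    for start in range(1, len(perm) + 1):
        cycle, i = [], start
        while i not in seen:
            seen.add(i)
            cycle.append(i)
            i = perm[i - 1]
        if cycle:
            result.append(tuple(cycle))
    return result


def compose(a, b):
    """Return a o b, the map i -> a(b(i)); the right factor acts first."""
    return tuple(a[b[i] - 1] for i in range(len(b)))


pi1 = (3, 4, 1, 2)  # bottom row of pi_1 in Eq. (23.2)
pi2 = (2, 4, 3, 1)  # bottom row of pi_2 in Eq. (23.2)
for label, p in [("pi1", pi1), ("pi2", pi2), ("pi2 o pi1", compose(pi2, pi1))]:
    print(label, cycles(p))
```

One-cycles such as $(3)$ are kept in the output so that every symbol appears exactly once.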


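Example 23.4.4 Let $\pi_1, \pi_2 \in S_8$ be given by

\[
\pi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\
3 & 5 & 7 & 1 & 2 & 8 & 4 & 6 \end{pmatrix}, \qquad
\pi_2 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\
2 & 5 & 6 & 8 & 1 & 7 & 4 & 3 \end{pmatrix}.
\]

The reader may verify that

\[
\pi_2 \circ \pi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\
6 & 1 & 4 & 2 & 5 & 3 & 8 & 7 \end{pmatrix}
\]

and that

\[
\pi_1 = (1374)(25)(68), \qquad \pi_2 = (125)(36748), \qquad
\pi_2 \circ \pi_1 = (16342)(5)(78).
\]

In general, permutations do not commute. The product in reverse order is

\[
\pi_1 \circ \pi_2 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\
5 & 2 & 8 & 6 & 3 & 4 & 1 & 7 \end{pmatrix} = (15387)(2)(46),
\]

which differs from $\pi_2 \circ \pi_1$. However, note that it has the same cycle
structure as $\pi_2 \circ \pi_1$, in that cycles of equal length appear in both.
This is a general property of all permutations.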


Definition 23.4.5 If $\pi \in S_n$ has a cycle of length $r$ and all other
cycles of $\pi$ have only one element, then $\pi$ is called a cyclic
permutation of length $r$.


It follows that $\pi_2 \in S_4$ as defined earlier is a cyclic permutation of
length 3. Similarly,

\[
\pi = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 6 & 2 & 1 & 3 & 5 & 4 \end{pmatrix}
\]

is a cyclic permutation of length 4 (verify this).


Definition 23.4.6 A cyclic permutation of length 2 is called a transposition.

A transposition $(ij)$ simply switches $i$ and $j$.


Example 23.4.7 Products of (not necessarily disjoint) cycles may be
associated with a permutation whose action on $i$ is obtained by starting with
the first cycle (at the extreme right), locating the first occurrence of $i$,
and keeping track of what each cycle does to it or its image under the
preceding cycle. For example, let $\pi_1 \in S_6$ be given as a product of
cycles by $\pi_1 = (143)(24)(456)$. To find the permutation, we start with 1 and
follow the action of the cycles on it, starting from the right. The first and
second cycles leave 1 alone, and the last cycle takes it to 4. Thus,
$\pi_1(1) = 4$. For 2 we note that the first cycle leaves it alone, the second
cycle takes it to 4, and the last cycle takes 4 to 3. Thus, $\pi_1(2) = 3$.
Similarly, $\pi_1(3) = 1$, $\pi_1(4) = 5$, $\pi_1(5) = 6$, and $\pi_1(6) = 2$.
Therefore,

\[
\pi_1 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 4 & 3 & 1 & 5 & 6 & 2 \end{pmatrix}.
\]

We note that $\pi_1$ is a cyclic permutation of length 6.

It is left to the reader to show that the permutation $\pi_2 \in S_5$ given by
the product $\pi_2 = (13)(15)(12)(14)$ is cyclic: $\pi_2 = (14253)$.
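The bookkeeping of this example (follow each symbol through the cycles from right to left) can be sketched as follows. The function names `apply_cycle` and `product_of_cycles` are illustrative choices, and cycles are written as tuples in the same left-to-right order as in the text.

```python
def apply_cycle(cycle, i):
    """Image of the symbol i under a single cycle such as (1, 4, 3)."""
    if i not in cycle:
        return i
    return cycle[(cycle.index(i) + 1) % len(cycle)]


def product_of_cycles(cycle_list, n):
    """Bottom row of the permutation in S_n given by a product of cycles.

    The cycles are listed left to right as written in the text, so the
    rightmost cycle acts first.
    """
    row = []
    for i in range(1, n + 1):
        image = i
        for cycle in reversed(cycle_list):
            image = apply_cycle(cycle, image)
        row.append(image)
    return tuple(row)


pi1 = product_of_cycles([(1, 4, 3), (2, 4), (4, 5, 6)], 6)
pi2 = product_of_cycles([(1, 3), (1, 5), (1, 2), (1, 4)], 5)
print(pi1)
print(pi2)
```

The first result is the bottom row found in the example; the second is the bottom row of the 5-cycle $(14253)$.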


The square of any transposition is the identity. Therefore, we can include 
it in any product of permutations without changing anything. 


Proposition 23.4.8 An $r$-cycle $(i_1, i_2, \ldots, i_r)$ can be decomposed
into the product of $r - 1$ transpositions:

\[
(i_1, i_2, \ldots, i_r) = (i_1 i_r)(i_1 i_{r-1}) \cdots (i_1 i_3)(i_1 i_2).
\]


Proof The proof involves keeping track of what happens to each symbol 
when acted upon by the RHS and the LHS and showing that the two give 
the same result. This is left as an exercise for the reader. 


Although the decomposition of Proposition 23.4.8 is not unique, it can be
shown that the parity of the decomposition (whether the number of factors
is even or odd) is unique. For instance, it is easy to verify that

\[
(1234) = (14)(13)(12)
= (14)\underbrace{(34)(34)}_{1}
\overbrace{(23)\underbrace{(12)(12)}_{1}(23)}^{1}(13)(12).
\]

That is, (1234) is written as a product of 3 or 9 transpositions, both of which
are odd.
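Both decompositions of (1234) can be multiplied out by machine. The sketch below assumes the right-to-left convention for products of cycles used in Example 23.4.7; `product_of_cycles` and `transpositions_of_cycle` are hypothetical helper names.

```python
def product_of_cycles(cycle_list, n):
    """Bottom row of a product of cycles in S_n; the rightmost acts first."""
    row = []
    for i in range(1, n + 1):
        image = i
        for cycle in reversed(cycle_list):
            if image in cycle:
                image = cycle[(cycle.index(image) + 1) % len(cycle)]
        row.append(image)
    return tuple(row)


def transpositions_of_cycle(cycle):
    """Proposition 23.4.8: (i1 i2 ... ir) = (i1 ir)(i1 i_{r-1}) ... (i1 i2)."""
    first = cycle[0]
    return [(first, x) for x in reversed(cycle[1:])]


target = product_of_cycles([(1, 2, 3, 4)], 4)
three = transpositions_of_cycle((1, 2, 3, 4))
nine = [(1, 4), (3, 4), (3, 4), (2, 3), (1, 2), (1, 2), (2, 3), (1, 3), (1, 2)]

print(three, product_of_cycles(three, 4) == target)
print(len(nine), product_of_cycles(nine, 4) == target)
```

The two factor counts, 3 and 9, differ, but both are odd, as the uniqueness of parity requires.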

We have already seen that any permutation can be written as a product of 
cycles. In addition, Proposition 23.4.8 says that these cycles can be further 
broken down into products of transpositions. This implies the following (see 
[Rotm 84, p. 38]): 


Proposition 23.4.9 Any permutation can be decomposed as a product of
transpositions. The parity of the decomposition is unique.


Definition 23.4.10 A permutation is even (odd) if it can be expressed as a
product of an even (odd) number of transpositions.


The parity of a permutation can be determined from its cycle structure 
and Proposition 23.4.8. 

The reader may verify that the mapping from $S_n$ to the multiplicative
group of $\{+1, -1\}$ that assigns $+1$ to even permutations and $-1$ to odd
permutations is a group homomorphism. It follows from the first isomorphism
theorem (Theorem 23.2.16) that


Box 23.4.11 The set of even permutations, denoted by $A_n$, forms a
normal subgroup of $S_n$.

This homomorphism is usually denoted by $\epsilon$. We therefore define

\[
\epsilon(\pi) \equiv \epsilon_\pi =
\begin{cases} +1 & \text{if } \pi \text{ is even}, \\
-1 & \text{if } \pi \text{ is odd}. \end{cases} \tag{23.3}
\]


Sometimes $\delta(\pi)$ or $\delta_\pi$, as well as $\mathrm{sgn}(\pi)$, is
also used. The symbol $\epsilon_{i_1 i_2 \ldots i_n}$, used in the definition of
determinants, is closely related to $\epsilon(\pi)$. In fact,

\[
\epsilon_{\pi(1)\pi(2)\ldots\pi(n)} = \epsilon_\pi.
\]
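As an illustration of the homomorphism property, the sign can be computed from the cycle structure, since by Proposition 23.4.8 each $r$-cycle contributes $r - 1$ transpositions. The sketch below checks $\epsilon(\pi \circ \sigma) = \epsilon(\pi)\epsilon(\sigma)$ exhaustively, but only for $S_4$; the names `sign` and `compose` are assumptions made for this example.

```python
from itertools import permutations


def sign(perm):
    """+1 for an even permutation, -1 for an odd one.

    perm is the bottom row; each cycle of length r contributes (-1)**(r - 1).
    """
    seen, s = set(), 1
    for start in range(1, len(perm) + 1):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = perm[i - 1]
            length += 1
        if length:
            s *= (-1) ** (length - 1)
    return s


def compose(a, b):
    """Return a o b, the map i -> a(b(i))."""
    return tuple(a[b[i] - 1] for i in range(len(b)))


S4 = list(permutations(range(1, 5)))
is_homomorphism = all(
    sign(compose(a, b)) == sign(a) * sign(b) for a in S4 for b in S4
)
even = [p for p in S4 if sign(p) == 1]  # the elements of A_4
print(is_homomorphism, len(even), len(S4))
```

The even permutations make up half of the group, consistent with $A_n$ being the kernel of a homomorphism onto $\{+1, -1\}$.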


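Suppose $\pi, \sigma \in S_n$, and note that⁸
$\sigma(i) \stackrel{\sigma^{-1}}{\longmapsto} i \stackrel{\pi}{\longmapsto}
\pi(i) \stackrel{\sigma}{\longmapsto} \sigma \circ \pi(i)$, i.e., the composite
$\sigma \circ \pi \circ \sigma^{-1}$ of the three permutations takes
$\sigma(i)$ to $\sigma \circ \pi(i)$. This composite can be thought of as the
permutation obtained by applying $\sigma$ to the two rows of
$\pi = \begin{pmatrix} 1 & 2 & \cdots & n \\ \pi(1) & \pi(2) & \cdots & \pi(n) \end{pmatrix}$:

\[
\sigma \circ \pi \circ \sigma^{-1} =
\begin{pmatrix} \sigma(1) & \sigma(2) & \cdots & \sigma(n) \\
\sigma \circ \pi(1) & \sigma \circ \pi(2) & \cdots & \sigma \circ \pi(n) \end{pmatrix}.
\]

In particular, the cycles of $\sigma \circ \pi \circ \sigma^{-1}$ are obtained
by applying $\sigma$ to the symbols in the cycles of $\pi$. Since $\sigma$ is
bijective, the cycles so obtained will remain disjoint. It follows that
$\sigma \circ \pi \circ \sigma^{-1}$, a conjugate of $\pi$, has the same
cycle structure as $\pi$ itself. In fact, we have the following: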


Theorem 23.4.12 Two permutations are conjugate if and only if they have 
the same cycle structure. 
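The theorem can be checked by brute force in a small case. The following sketch groups the 24 elements of $S_4$ by cycle structure and compares each group with the conjugacy class computed directly from $\sigma \circ \pi \circ \sigma^{-1}$; it is a finite check for $n = 4$ only, and all function names are illustrative.

```python
from itertools import permutations


def compose(a, b):
    """Return a o b, the map i -> a(b(i))."""
    return tuple(a[b[i] - 1] for i in range(len(b)))


def inverse(p):
    """Bottom row of the inverse permutation."""
    inv = [0] * len(p)
    for i, image in enumerate(p, start=1):
        inv[image - 1] = i
    return tuple(inv)


def cycle_type(p):
    """Sorted tuple of the cycle lengths of p, 1-cycles included."""
    seen, lengths = set(), []
    for start in range(1, len(p) + 1):
        length, i = 0, start
        while i not in seen:
            seen.add(i)
            i = p[i - 1]
            length += 1
        if length:
            lengths.append(length)
    return tuple(sorted(lengths))


S4 = list(permutations(range(1, 5)))

# Group the elements of S4 by cycle structure.
by_type = {}
for p in S4:
    by_type.setdefault(cycle_type(p), set()).add(p)


def conjugacy_class(p):
    """All sigma o p o sigma^{-1} with sigma in S4."""
    return {compose(compose(s, p), inverse(s)) for s in S4}


classes_match = all(conjugacy_class(p) == by_type[cycle_type(p)] for p in S4)
print(classes_match, {t: len(c) for t, c in by_type.items()})
```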


⁸Recall from Chap. 1 that $x \stackrel{f}{\longmapsto} y$ means $y = f(x)$.

To find the distinct conjugacy classes of $S_n$, one has to construct
distinct cycle structures of $S_n$. This in turn is equivalent to partitioning
the numbers from 1 to $n$ into sets of various lengths. Let $\nu_k$ be the
number of $k$-cycles in a permutation. The cycle structure of this permutation
is denoted by $(1^{\nu_1}, 2^{\nu_2}, \ldots, n^{\nu_n})$. Since the total number
of symbols is $n$, we must have $\sum_{k=1}^{n} k\nu_k = n$. Defining
$\lambda_j \equiv \sum_{k=j}^{n} \nu_k$, we have

\[
\lambda_1 + \lambda_2 + \cdots + \lambda_n = n, \qquad
\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0. \tag{23.4}
\]

The splitting of $n$ into nonnegative integers
$(\lambda_1, \lambda_2, \ldots, \lambda_n)$ as in Eq. (23.4) is called a
partition of $n$. There is a 1-1 correspondence between partitions of $n$ and
the cycle structures of $S_n$. We saw how $\nu_k$'s gave rise to $\lambda$'s.
Conversely, given a partition of $n$, we can construct a cycle structure by
$\nu_k = \lambda_k - \lambda_{k+1}$. For example, the partition (32000) of $S_5$
corresponds to $\nu_1 = 3 - 2 = 1$, $\nu_2 = 2 - 0 = 2$, i.e., one 1-cycle and
two 2-cycles. One usually omits the zeros and writes (32) instead of (32000).
When some of the $\lambda$'s are repeated, the number of occurrences is
indicated by a power of the corresponding $\lambda$; the partition is then
written as $(\mu_1^{n_1}, \mu_2^{n_2}, \ldots, \mu_r^{n_r})$, where it is
understood that $\lambda_1$ through $\lambda_{n_1}$ have the common value
$\mu_1$, etc. For example, $(3^2 1)$ corresponds to a partition of 7 with
$\lambda_1 = 3$, $\lambda_2 = 3$, and $\lambda_3 = 1$. The corresponding cycle
structure is $\nu_1 = 0$, $\nu_2 = 2$, and $\nu_3 = 1$, i.e., two 2-cycles and
one 3-cycle. The partitions of length 0 are usually ignored. Since
$\sum_i \lambda_i = n$, no confusion will arise as to which symmetric group the
partition belongs to. Thus (32000) and (332000) are written as (32) and
$(3^2 2)$, and it is clear that (32) belongs to $S_5$ and $(3^2 2)$ to $S_8$.
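The two directions of the correspondence, $\lambda_j = \sum_{k \ge j} \nu_k$ and $\nu_k = \lambda_k - \lambda_{k+1}$, can be written as a pair of mutually inverse functions. This is a small sketch with hypothetical function names; both tuples are padded with zeros to length $n$, and $\lambda_{n+1}$ is taken to be 0.

```python
def partition_from_cycle_structure(nu):
    """nu[k - 1] is the number of k-cycles; returns (lambda_1, ..., lambda_n)
    with lambda_j = nu_j + nu_{j+1} + ... + nu_n."""
    return tuple(sum(nu[j:]) for j in range(len(nu)))


def cycle_structure_from_partition(lam):
    """Inverse map: nu_k = lambda_k - lambda_{k+1}, with lambda_{n+1} = 0."""
    padded = tuple(lam) + (0,)
    return tuple(padded[k] - padded[k + 1] for k in range(len(lam)))


def number_of_symbols(nu):
    """Sum over k of k * nu_k, which must equal n."""
    return sum(k * count for k, count in enumerate(nu, start=1))


nu_a = cycle_structure_from_partition((3, 2, 0, 0, 0))        # (32) in S_5
nu_b = cycle_structure_from_partition((3, 3, 1, 0, 0, 0, 0))  # (3^2 1) in S_7
print(nu_a, number_of_symbols(nu_a))
print(nu_b, number_of_symbols(nu_b))
print(partition_from_cycle_structure(nu_a))
```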


Example 23.4.13 Let us find the different cycle structures of $S_4$. This
corresponds to different partitions of 4. We can take $\lambda_1 = 4$ and the
rest of the $\lambda$'s zero. This gives the partition (4). Next, we let
$\lambda_1 = 3$; then $\lambda_2$ must be 1, giving the partition (31). With
$\lambda_1 = 2$, $\lambda_2$ can be either 2 or 1. In the latter case,
$\lambda_3$ must be 1 as well, and we obtain two partitions, $(2^2)$ and
$(21^2)$. Finally, if $\lambda_1 = 1$, all other nonzero $\lambda$'s must be 1
as well (remember that $\lambda_k \ge \lambda_{k+1}$). Therefore, the last
partition is of the form $(1^4)$. We see that there are 5 different partitions
of 4. It follows that there are 5 different conjugacy classes in $S_4$.
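The enumeration in this example follows a simple recursion: choose $\lambda_1$, then partition what is left using parts no larger than $\lambda_1$. A minimal sketch of that recursion is given below; the generator name `partitions` and its `largest` parameter are choices made here for illustration.

```python
def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples of positive integers
    (zeros omitted, as in the text), largest first part first."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


partitions_of_4 = list(partitions(4))
print(partitions_of_4)
print("number of conjugacy classes of S_4:", len(partitions_of_4))
```

The tuples come out in the same order as in the example: (4), (31), $(2^2)$, $(21^2)$, $(1^4)$.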


23.5 Problems 


23.1 Let $S$ be a subset of a group $G$. Show that $S$ is a subgroup if and
only if $ab^{-1} \in S$ whenever $a, b \in S$.


23.2 Show that the intersection of two subgroups is a subgroup. 


23.3 Let $X$ be a subset of a group $G$. A word on $X$ is an element $w$ of
$G$ of the form

\[
w = x_1^{e_1} x_2^{e_2} \cdots x_n^{e_n},
\]

where $x_i \in X$ and $e_i = \pm 1$. Show that the set of all words on $X$ is a
subgroup of $G$.


23.4 Let $[a, b]$ denote the commutator of $a$ and $b$. Show that

(a) $[a, b]^{-1} = [b, a]$,

(b) $[a, a] = e$ for all $a \in G$, and

(c) $ab = [a, b]ba$. It is interesting to compare these relations with the
familiar commutators of operators.


23.5 Show that if $S$ is a subgroup, then $S^2 \equiv SS = S$, and $tS = S$ if
and only if $t \in S$. More generally, $TS = S$ if and only if $T \subset S$.


23.6 Show that if $S$ is a subgroup, then $Sa = Sb$ if and only if
$ba^{-1} \in S$ and $ab^{-1} \in S$ ($aS = bS$ if and only if $a^{-1}b \in S$
and $b^{-1}a \in S$).


23.7 Let $S$ be a subgroup of $G$. Show that $a \bowtie b$ defined by
$ab^{-1} \in S$ is an equivalence relation.


23.8 Show that $C_G(x)$ is a subgroup of $G$. Let $H$ be a subgroup of $G$ and
suppose $x \in H$. Show that $C_H(x)$ is a subgroup of $C_G(x)$.


23.9 (a) Show that the only element $a$ in a group with the property $a^2 = a$
is the identity. (b) Now use $e_G \star e_G = e_G$ to show that any homomorphism
maps identity to identity. (c) Show that if $f : G \to H$ is a homomorphism,
then $f(g^{-1}) = [f(g)]^{-1}$.


23.10 Establish a bijection between the set of right cosets and the set of left
cosets of a subgroup. Hint: Define a map that takes $St$ to $t^{-1}S$.


23.11 Let $G$ be a finite group and $S$ one of its subgroups. Convince yourself
that the union of all right cosets of $S$ is $G$. Now use the fact that distinct
right cosets are disjoint and that they have the same cardinality to prove that
the order of $S$ divides the order of $G$. In fact, $|G| = |S||G/S|$, where
$|G/S|$ is the number of cosets of $S$ (also called the index of $S$ in $G$).
This is Lagrange's theorem.


23.12 Let $f : G \to H$ be a homomorphism. Show that
$\phi : G/\ker f \to f(G)$ given by
$\phi(g[\ker f]) = \phi([\ker f]g) = f(g)$ is an isomorphism.


23.13 Let $G'$ denote the commutator subgroup of a group $G$. Show that $G'$
is a normal subgroup of $G$ and that $G/G'$ is abelian.


23.14 Let $M = \mathbb{R} \cup \{\infty\}$, and define an action of
$SL(2, \mathbb{R})$ on $M$ by

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix} \cdot x = \frac{ax + c}{bx + d}.
\]

(a) Show that this is indeed a group action with a law of multiplication
identical to the matrix multiplication.

(b) Show that the action is transitive.

(c) Show that besides the identity, there is precisely one other element $g$ of
the group such that $g \cdot x = x$ for all $x \in M$.

(d) Show that for every $x \in M$,

\[
G_x = \left\{ \begin{pmatrix} a & b \\ bx^2 + (d - a)x & d \end{pmatrix} \right\}.
\]


23.15 Show that two conjugacy classes are either disjoint or identical. 


23.16 Show that if all conjugacy classes of a group have only one element, 
the group must be abelian. 


23.17 Consider a map $\phi$ from the conjugacy class of $G$ containing
$x \in G$ to the set of (left) cosets $G/C_G(x)$ given by
$\phi(axa^{-1}) = aC_G(x)$. Show that $\phi$ is a bijection. In particular, show
that $|C_G(x)| = |G|/|K_x^G|$, where $K_x^G$ is the class in $G$ containing $x$
and $|K_x^G|$ its order (see Problem 23.11). Use this result and Problems 23.8
and 23.11 to show that $|H|/|K_x^H|$ divides $|G|/|K_x^G|$.


23.18 Show that $R R_{\hat{\mathbf{e}}}(\theta) R^{-1}$ corresponds to a
rotation of angle $\theta$. Hint: Consider the effect of rotation on the vectors
in the plane perpendicular to $\hat{\mathbf{e}}$, and note that the rotated
plane is perpendicular to $\hat{\mathbf{e}}' = R\hat{\mathbf{e}}$.


23.19 Let $G$ act on $M$ and let $m_0 \in M$. Show that $Gm_0$ is the smallest
invariant subset of $M$ containing $m_0$.


23.20 Suppose $G$ is the direct product of $H_1$ and $H_2$ and $g = h_1 h_2$.
Show that the factors $h_1$ and $h_2$ are unique and that $H_1$ and $H_2$ are
normal.


23.21 Show that $(g, h), (g', h') \in G \times H$ are conjugate if and only if
$g$ is conjugate to $g'$ and $h$ is conjugate to $h'$. Therefore, conjugacy
classes of the direct product are obtained by pairing one conjugacy class from
each factor.


23.22 Find the products $\pi_1 \circ \pi_2$ and $\pi_2 \circ \pi_1$ of the two
permutations


seaft 2 Oa SC) 4s fl 2S 
Pls 4 6. 454. 2 B= \3. i 3 


23.23 Find the inverses of the permutations 


wu(i 2345678 
rae 6 oo TS ea Se)? 
mu({i 2345678 
AN 5 & @ 1 7-4. 3 


and show directly that $(\pi_1 \circ \pi_2)^{-1} = \pi_2^{-1} \circ \pi_1^{-1}$.


23.24 Find the inverse of each of the following permutations: 2); = 


(33.47) 72= (4353) 73 = (65.4320) amd ma = (55572): 
23.25 Express each of the following products in terms of disjoint cycles.
Assume that all permutations are in $S_7$.


(a) (123)(347)(456)(145). (b) (34)(562) (273). 
(c) (1345)(134) (13). 


23.26 Express the following permutations as products of disjoint cycles, 
and determine which are cyclic. 


5 6 i oo 4 
a) (®) Chee 


4 

5 

4 5 

4 2) 
12345678 


23.27 Express the permutation r= (G 4146875 
sitions. Is the permutation even or odd? 


2 3 
3 4 
2°33 
3 5 
) as a product of transpo- 


23.28 Express the following permutations as products of transpositions, and 
determine whether they are even or odd. 


lee aoc ae ae, De 2B Ae Be IG: 38 
Org hs): On eee) 
1 Oy 3 Ay 5% (Do Be 66054 
OM Ca eee aa Oe ee) 


23.29 Show that the product of two even or two odd permutations is always 
even, and the product of an even and an odd permutation is always odd. 


23.30 Show that $\pi$ and $\pi^{-1}$ have the same parity (both even or both
odd).


23.31 Find the number of distinct conjugacy classes of $S_5$ and $S_6$.


24 Representation of Groups


Group action is extremely important in quantum mechanics. Suppose the 
Hamiltonian of a quantum system is invariant under a symmetry transforma- 
tion of its independent parameters such as position, momentum, and time. 
This invariance will show up as certain properties of the solutions of the 
Schrödinger equation.

Moreover, the very act of labeling quantum-mechanical states often in- 
volves groups and their actions. For example, labeling atomic states by 
eigenvalues of angular momentum assumes invariance of the Hamiltonian 
under the action of the rotation group (see Chap. 29) on the Hilbert space of 
the quantum-mechanical system under consideration. 


24.1 Definitions and Examples 


In the language of group theory, we have the following situation. Put all
the parameters $x_1, \ldots, x_p$ of the Hamiltonian $\mathbf{H}$ together to
form a space, say $\mathbb{R}^p$, and write
$\mathbf{H} = \mathbf{H}(x_1, \ldots, x_p) \equiv \mathbf{H}(\mathbf{x})$. A
group of symmetry of $\mathbf{H}$ is a group $G$ whose action on $\mathbb{R}^p$
leaves $\mathbf{H}$ unchanged,¹ i.e.,
$\mathbf{H}(\mathbf{x} \cdot g) = \mathbf{H}(\mathbf{x})$. For example, a
one-dimensional harmonic oscillator, with

\[
\mathbf{H} = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + \frac{1}{2} m\omega^2 x^2,
\]

has, among other things, parity $P$ (defined by $Px = -x$) as a symmetry. Thus,
the group $G = \{e, P\}$ is a group of symmetry of $\mathbf{H}$.

The Hamiltonian $\mathbf{H}$ of a quantum-mechanical system is an operator in a
Hilbert space, such as $\mathcal{L}^2(\mathbb{R}^3)$, the space of
square-integrable functions. The important question is: What is the proper way
of transporting the action of $G$ from $\mathbb{R}^p$ to
$\mathcal{L}^2(\mathbb{R}^3)$? This is a relevant question because the solutions
of the Schrödinger equation are, in general, functions of the parameters of the
Hamiltonian, and as such will be affected by the symmetry operation on the
Hamiltonian. The answer is provided in the following definition.²


¹It will become clear shortly that the appropriate direction for the action is
from the right.


²We have already encountered the notion of representation in the context of
algebras. Groups are much more widely used in physics than algebras, and group
representations have a wider application in physics than their algebraic
counterparts. Since some readers may have skipped the section on the
representation of algebras, we'll reintroduce the ideas here at the risk of
being redundant.


Definition 24.1.1 Let $G$ be a group and $\mathcal{H}$ a Hilbert space. A
representation of $G$ on $\mathcal{H}$ is a homomorphism
$T : G \to GL(\mathcal{H})$. The representation is faithful if the homomorphism
is 1-1. We often denote $T(g)$ by $\mathbf{T}_g$. $\mathcal{H}$ is called the
carrier space of $T$. The trivial homomorphism $T : G \to \{\mathbf{1}\}$ is
also called the identity representation. The dimension of $\mathcal{H}$ is
called the dimension of the representation $T$.


We do not want to distinguish between representations that differ only by
isomorphic vector spaces, because otherwise we can generate an infinite set
of representations that are trivially related to one another. A vector space
isomorphism $f : \mathcal{H} \to \mathcal{H}'$ induces a group isomorphism
$\phi : GL(\mathcal{H}) \to GL(\mathcal{H}')$ defined by

\[
\phi(\mathbf{T}) = f \circ \mathbf{T} \circ f^{-1}
\quad\text{for } \mathbf{T} \in GL(\mathcal{H}).
\]

This motivates the following definition.


Definition 24.1.2 Two representations $T : G \to GL(\mathcal{H})$ and
$T' : G \to GL(\mathcal{H}')$ are called equivalent if there exists an
isomorphism $f : \mathcal{H} \to \mathcal{H}'$ such that
$\mathbf{T}'_g = f \circ \mathbf{T}_g \circ f^{-1}$ for all $g \in G$.


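Box 24.1.3 Any representation $T : G \to GL(\mathcal{H})$ defines an action of
the group $G$ on the Hilbert space $\mathcal{H}$ by
$\Phi(g, |a\rangle) = \mathbf{T}_g|a\rangle$.


As we saw in Chaps. 4 and 5, the transformation of an operator $\mathbf{A}$
under $\mathbf{T}_g$ would have to be defined by
$\mathbf{T}_g \mathbf{A} (\mathbf{T}_g)^{-1}$. For a Hamiltonian with a group
of symmetry $G$, this leads to the identity

\[
\mathbf{T}_g[\mathbf{H}(\mathbf{x})](\mathbf{T}_g)^{-1} = \mathbf{H}(\mathbf{x} \cdot g).
\]

Similarly, the action of the group on a vector (function) in
$\mathcal{L}^2(\mathbb{R}^3)$ is defined by

\[
(\mathbf{T}_g \psi)(\mathbf{x}) \equiv \psi(\mathbf{x} \cdot g), \tag{24.1}
\]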


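where the parentheses around $\mathbf{T}_g\psi$ designate it as a new function.
One can show that if $G$ acts on the independent variables of a function on the
right as in Eq. (24.1), then the vector space of such functions is the carrier
space of a representation of $G$. In fact,

\[
(\mathbf{T}_{g_1 g_2}\psi)(\mathbf{x}) = \psi(\mathbf{x} \cdot (g_1 g_2))
= \psi((\mathbf{x} \cdot g_1) \cdot g_2)
= (\mathbf{T}_{g_2}\psi)(\mathbf{x} \cdot g_1)
\equiv \varphi(\mathbf{x} \cdot g_1),
\]

where we have defined the new function $\varphi$ by the last equality. Now note
that

\[
\varphi(\mathbf{x} \cdot g_1) = (\mathbf{T}_{g_1}\varphi)(\mathbf{x})
= (\mathbf{T}_{g_1}(\mathbf{T}_{g_2}\psi))(\mathbf{x})
= (\mathbf{T}_{g_1}\mathbf{T}_{g_2}\psi)(\mathbf{x}).
\]

It follows from the last two equations that

\[
\mathbf{T}_{g_1 g_2}\psi = \mathbf{T}_{g_1}\mathbf{T}_{g_2}\psi.
\]

Since this holds for arbitrary $\psi$, we must have
$\mathbf{T}_{g_1 g_2} = \mathbf{T}_{g_1}\mathbf{T}_{g_2}$, i.e., that $T$ is a
representation. When the action of a group is "naturally" from the left, such
as the action of a matrix on a column vector, we replace $\mathbf{x} \cdot g$
with $g^{-1} \cdot \mathbf{x}$. The reader can check that
$T : G \to GL(\mathcal{H})$, given by
$\mathbf{T}_g\psi(\mathbf{x}) = \psi(g^{-1} \cdot \mathbf{x})$, is indeed a
representation.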
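The argument can be spot-checked numerically for a concrete right action. The sketch below assumes the action of $S_3$ on triples by shuffling components, $\mathbf{x} \cdot \pi = (x_{\pi(1)}, x_{\pi(2)}, x_{\pi(3)})$, with the group product taken to be $\pi_1 \circ \pi_2$; the test function `psi`, the point `x0`, and all helper names are arbitrary choices made here, and agreement at one point for one function is an illustration, not a proof.

```python
from itertools import permutations


def act_right(x, perm):
    """x . pi = (x_{pi(1)}, ..., x_{pi(n)}), with perm the bottom row of pi."""
    return tuple(x[perm[i] - 1] for i in range(len(perm)))


def compose(a, b):
    """Return a o b, the map i -> a(b(i))."""
    return tuple(a[b[i] - 1] for i in range(len(b)))


def T(perm):
    """Operator T_pi on functions: (T_pi psi)(x) = psi(x . pi)."""
    return lambda psi: (lambda x: psi(act_right(x, perm)))


def psi(x):
    """An arbitrary test function with no permutation symmetry."""
    return x[0] + 2 * x[1] ** 2 + 3 * x[2] ** 3


S3 = list(permutations((1, 2, 3)))
x0 = (2, 3, 5)

# (x . g1) . g2 = x . (g1 o g2): the shuffling is an action from the right.
right_action_ok = all(
    act_right(act_right(x0, g1), g2) == act_right(x0, compose(g1, g2))
    for g1 in S3 for g2 in S3
)

# T_{g1 g2} psi = T_{g1} (T_{g2} psi), evaluated at the point x0.
representation_ok = all(
    T(compose(g1, g2))(psi)(x0) == T(g1)(T(g2)(psi))(x0)
    for g1 in S3 for g2 in S3
)
print(right_action_ok, representation_ok)
```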


Example 24.1.4 Let the Hamiltonian of the time-independent Schrödinger
equation $\mathbf{H}|\psi\rangle = E|\psi\rangle$ be invariant under the action
of a group $G$. This means that

\[
\mathbf{T}_g \mathbf{H} \mathbf{T}_g^{-1} = \mathbf{H}
\quad\Rightarrow\quad [\mathbf{H}, \mathbf{T}_g] = 0,
\]

i.e., that $\mathbf{H}$ and $\mathbf{T}_g$ are simultaneously diagonalizable
(Theorem 6.4.18). It follows that we can choose the energy eigenstates to be
eigenstates of $\mathbf{T}_g$ as well, and we can label the states not only by
the energy "quantum numbers" (eigenvalues of $\mathbf{H}$) but also by the
eigenvalues of $\mathbf{T}_g$. For example, if the Hamiltonian is invariant
under the action of parity $P$, then we can choose the states to be even,
corresponding to parity eigenvalue of $+1$, or odd, corresponding to parity
eigenvalue of $-1$. Similarly, if $G$ is the rotation group, then the states can
be labeled by the eigenvalues of the rotation operators, which are, as we shall
see, equivalent to the angular momentum operators discussed in Chap. 13.

In crystallography and solid-state physics, the Hamiltonian of an (infi- 
nite) lattice is invariant under translation by an integer multiple of each so- 
called primitive lattice translation, the three noncoplanar vectors that define 
a primitive cell of the crystal. The preceding argument shows that the energy 
eigenstates can be taken to be the eigenstates of the translation operator as 
well. 


It is common to choose a basis and represent all T,’s in terms of matrices. 
Then one gets a matrix representation of the group G. 


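Example 24.1.5 Consider the action of the 2D rotation group $SO(2)$ (rotation
about the $z$-axis) on $\mathbb{R}^3$:

\[
\mathbf{r}' = R_z(\theta)\mathbf{r} \quad\Rightarrow\quad
\begin{cases}
x' = x\cos\theta - y\sin\theta, \\
y' = x\sin\theta + y\cos\theta, \\
z' = z.
\end{cases}
\]

For a Hilbert space, also choose $\mathbb{R}^3$. Define the homomorphism
$T : G \to GL(\mathcal{H})$ to be the identity map, so that
$T(R_z(\theta)) \equiv \mathbf{T}_\theta = R_z(\theta)$. The operator
$\mathbf{T}_\theta$ transforms the standard basis vectors of $\mathcal{H}$ as

\[
\begin{aligned}
\mathbf{T}_\theta\hat{\mathbf{e}}_1 &= \mathbf{T}_\theta(1, 0, 0) = (\cos\theta, \sin\theta, 0)
= \cos\theta\,\hat{\mathbf{e}}_1 + \sin\theta\,\hat{\mathbf{e}}_2 + 0\,\hat{\mathbf{e}}_3, \\
\mathbf{T}_\theta\hat{\mathbf{e}}_2 &= \mathbf{T}_\theta(0, 1, 0) = (-\sin\theta, \cos\theta, 0)
= -\sin\theta\,\hat{\mathbf{e}}_1 + \cos\theta\,\hat{\mathbf{e}}_2 + 0\,\hat{\mathbf{e}}_3, \\
\mathbf{T}_\theta\hat{\mathbf{e}}_3 &= \mathbf{T}_\theta(0, 0, 1) = (0, 0, 1)
= 0\,\hat{\mathbf{e}}_1 + 0\,\hat{\mathbf{e}}_2 + \hat{\mathbf{e}}_3.
\end{aligned}
\]

It follows that the matrix representation of $SO(2)$ in the standard basis of
$\mathcal{H}$ is

\[
\mathsf{T}_\theta = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\
\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]

Note that $SO(2)$ is an infinite group; its cardinality is determined by the
"number" of $\theta$'s.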
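For these matrices the homomorphism property amounts to the statement that two successive rotations about the $z$-axis compose to a rotation by the sum of the angles. A quick floating-point sketch follows; the function names, the sample angles, and the tolerance are arbitrary choices made here.

```python
import math


def Rz(theta):
    """Matrix of a rotation by theta about the z-axis in the standard basis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def close(A, B, tol=1e-12):
    """Entrywise comparison up to a small tolerance."""
    return all(abs(a - b) < tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


theta1, theta2 = 0.3, 1.1
print(close(matmul(Rz(theta1), Rz(theta2)), Rz(theta1 + theta2)))
print(close(matmul(Rz(theta1), Rz(-theta1)), Rz(0.0)))
```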


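Example 24.1.6 Let $S_3$ act on $\mathbb{R}^3$ on the right by shuffling
components:

\[
(x_1, x_2, x_3) \cdot \pi \equiv (x_{\pi(1)}, x_{\pi(2)}, x_{\pi(3)}),
\qquad \pi \in S_3.
\]

For the carrier space, choose $\mathbb{R}^3$ as well. Let
$T : S_3 \to GL(\mathbb{R}^3)$ be given as follows: $T(\pi)$ is the matrix that
takes the column vector
$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}$ to
$\begin{pmatrix} x_{\pi(1)} \\ x_{\pi(2)} \\ x_{\pi(3)} \end{pmatrix}$.
As a specific illustration, consider
$\pi = \begin{pmatrix} 1 & 2 & 3 \\ 3 & 1 & 2 \end{pmatrix}$ and write
$\mathbf{T}_\pi$ for $T(\pi)$. Then

\[
\begin{aligned}
\mathbf{T}_\pi(\hat{\mathbf{e}}_1) &= \mathbf{T}_\pi(1, 0, 0) = (1, 0, 0) \cdot \pi = (0, 1, 0) = \hat{\mathbf{e}}_2, \\
\mathbf{T}_\pi(\hat{\mathbf{e}}_2) &= \mathbf{T}_\pi(0, 1, 0) = (0, 1, 0) \cdot \pi = (0, 0, 1) = \hat{\mathbf{e}}_3, \\
\mathbf{T}_\pi(\hat{\mathbf{e}}_3) &= \mathbf{T}_\pi(0, 0, 1) = (0, 0, 1) \cdot \pi = (1, 0, 0) = \hat{\mathbf{e}}_1,
\end{aligned}
\]

which give rise to the matrix

\[
\mathsf{T}_\pi = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.
\]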

The reader may construct the other five matrices of this representation and 
verify directly that it is indeed a (faithful) representation: Products and in- 
verses of permutations are mapped onto products and inverses of the corre- 
sponding matrices. 


24.2 Irreducible Representations 


The utility of a representation lies in our comfort with the structure of
vector spaces. The climax of such comfort is the spectral decomposition
theorems of (normal) operators on vector spaces of finite (Chap. 6) and infinite
(Chap. 17) dimensions. The operators $\mathbf{T}_g$, relevant to our present
discussion, are, in general, neither normal nor simultaneously commuting.
Therefore, the complete diagonalizability of all $\mathbf{T}_g$'s is out of the
question (unless the group happens to be abelian).

The best thing next to complete diagonalization is to see whether there
are common invariant subspaces of the vector space $\mathcal{H}$ carrying the
representation. We already know how to construct (minimal) "invariant" subsets
of $\mathcal{H}$: these are precisely the orbits of the action of the group $G$
on $\mathcal{H}$. The linearity of $\mathbf{T}_g$'s guarantees that the span of
each orbit is actually an invariant subspace, and that such subspaces are the
smallest invariant subspaces containing a given vector. Our aim is to find
those minimal invariant subspaces whose orthogonal complements are also
invariant. We encountered the same situation in Chap. 6 for a single operator.


Definition 24.2.1 A representation $T : G \to GL(\mathcal{H})$ is called
reducible if there exist subspaces $U$ and $W$ of $\mathcal{H}$ such that
$\mathcal{H} = U \oplus W$ and both $U$ and $W$ are invariant under all
$\mathbf{T}_g$'s. If no such subspaces exist, $\mathcal{H}$ is said to be
irreducible.


In most cases of physical interest, where $\mathcal{H}$ is a Hilbert space,
$W = U^\perp$. Then, in the language of Definition 6.1.4, a representation is
reducible if a proper subspace of $\mathcal{H}$ reduces all $\mathbf{T}_g$'s.


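Example 24.2.2 Let $S_3$ act on $\mathbb{R}^3$ as in Example 24.1.6. For the
carrier space $\mathcal{H}$, choose the space of functions on $\mathbb{R}^3$,
and for $T$, the homomorphism $T : G \to GL(\mathcal{H})$, given by
$\mathbf{T}_g\psi(\mathbf{x}) = \psi(\mathbf{x} \cdot g)$, for
$\psi \in \mathcal{H}$. Any $\psi$ that is symmetric in $x$, $y$, $z$, such as
$xyz$, $x + y + z$, or $x^2 + y^2 + z^2$, defines a one-dimensional invariant
subspace of $\mathcal{H}$. To obtain another invariant subspace, consider
$\psi_1(x, y, z) \equiv xy$ and let $\{\pi_i\}_{i=1}^{6}$ be as given in
Example 23.4.1. Then, denoting $\mathbf{T}_{\pi_i}$ by $\mathbf{T}_i$, the
reader may check that

\[
\begin{aligned}
[\mathbf{T}_1\psi_1](x, y, z) &= \psi_1((x, y, z) \cdot \pi_1) = \psi_1(x, y, z) = xy = \psi_1(x, y, z), \\
[\mathbf{T}_2\psi_1](x, y, z) &= \psi_1((x, y, z) \cdot \pi_2) = \psi_1(y, x, z) = yx = \psi_1(x, y, z), \\
[\mathbf{T}_3\psi_1](x, y, z) &= \psi_1((x, y, z) \cdot \pi_3) = \psi_1(z, y, x) = zy \equiv \psi_2(x, y, z), \\
[\mathbf{T}_4\psi_1](x, y, z) &= \psi_1((x, y, z) \cdot \pi_4) = \psi_1(x, z, y) = xz \equiv \psi_3(x, y, z), \\
[\mathbf{T}_5\psi_1](x, y, z) &= \psi_1((x, y, z) \cdot \pi_5) = \psi_1(z, x, y) = zx = \psi_3(x, y, z), \\
[\mathbf{T}_6\psi_1](x, y, z) &= \psi_1((x, y, z) \cdot \pi_6) = \psi_1(y, z, x) = yz = \psi_2(x, y, z).
\end{aligned}
\]

This is clearly a three-dimensional invariant subspace of $\mathcal{H}$ with
$\psi_1$, $\psi_2$, and $\psi_3$ as a convenient basis, in which the first three
permutations are represented by

\[
\mathsf{T}_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \quad
\mathsf{T}_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \quad
\mathsf{T}_3 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
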

It is instructive for the reader to verify these relations and to find the three 
remaining matrices. 
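One way to organize this verification is to note that $\mathbf{T}_\pi$ sends the monomial built from argument slots $a$ and $b$ to the one built from slots $\pi(a)$ and $\pi(b)$. The sketch below encodes $\psi_1 = xy$, $\psi_2 = yz$, $\psi_3 = xz$ as pairs of slots and builds the matrices in that basis; the encoding and the names `BASIS`, `matrix`, `compose`, and `matmul` are choices made here, not notation from the text.

```python
# Bottom rows of pi_1, ..., pi_6 from Example 23.4.1.
S3 = {1: (1, 2, 3), 2: (2, 1, 3), 3: (3, 2, 1),
      4: (1, 3, 2), 5: (3, 1, 2), 6: (2, 3, 1)}

# psi_1 = xy, psi_2 = yz, psi_3 = xz, each stored as the pair of argument
# slots it multiplies (slot 1 = x, slot 2 = y, slot 3 = z).
BASIS = [frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})]


def matrix(perm):
    """Matrix M of T_pi, defined by T_pi psi_j = sum_i M[i][j] psi_i.

    Since (T_pi psi)(x) = psi(x_pi(1), x_pi(2), x_pi(3)), the monomial that
    multiplies slots a and b is sent to the one multiplying pi(a) and pi(b).
    """
    M = [[0] * 3 for _ in range(3)]
    for j, pair in enumerate(BASIS):
        image = frozenset(perm[a - 1] for a in pair)
        M[BASIS.index(image)][j] = 1
    return M


def compose(a, b):
    """Return a o b, the map i -> a(b(i))."""
    return tuple(a[b[i] - 1] for i in range(len(b)))


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


T = {k: matrix(p) for k, p in S3.items()}
is_representation = all(
    matrix(compose(a, b)) == matmul(matrix(a), matrix(b))
    for a in S3.values() for b in S3.values()
)
for k in sorted(T):
    print(k, T[k])
print("products map to products:", is_representation)
```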


Example 24.2.3 Let $S_3$ act on $\mathbb{R}^3$ as in Example 24.1.6. For the
carrier space of representation, choose the subspace $V$ of the $\mathcal{H}$ of
Example 24.2.2 spanned by the six functions $x$, $y$, $z$, $xy$, $xz$, and $yz$.
For $T$, choose the same homomorphism as in Example 24.2.2 restricted to $V$. It
is clear that the subspaces $U$ and $W$ spanned, respectively, by the first
three and the last three functions are invariant under $S_3$, and that
$V = U \oplus W$. It follows that the representation is reducible. The matrix
form of this representation is found to be of the general form
$\begin{pmatrix} \mathsf{A} & \mathsf{0} \\ \mathsf{0} & \mathsf{B} \end{pmatrix}$,
where $\mathsf{B}$ is one of the 6 matrices of Example 24.2.2. The matrix
$\mathsf{A}$, corresponding to the three functions $x$, $y$, and $z$, can be
found similarly.


Let $\mathcal{H}$ be a carrier space, finite- or infinite-dimensional. For any
vector $|a\rangle$, the reader may check that the span of
$\{\mathbf{T}_g|a\rangle\}_{g \in G}$ is an invariant subspace of $\mathcal{H}$.
If $G$ is finite, this subspace is clearly finite-dimensional. The irreducible
subspace containing $|a\rangle$, a subspace of the span of
$\{\mathbf{T}_g|a\rangle\}_{g \in G}$, will also be finite-dimensional. Because
of the arbitrariness of $|a\rangle$, it follows that every vector of
$\mathcal{H}$ lies in an irreducible subspace, and that


Box 24.2.4 All irreducible representations of a finite group are finite- 
dimensional. 


Due to the importance and convenience of unitary operators (for exam- 
ple, the fact that they leave the inner product invariant), it is desirable to be 
able to construct a unitary representation—or a representation that is equiv- 
alent to one—of groups. The following theorem ensures that this desire can 
be realized for finite groups. 


Theorem 24.2.5 Every representation of a finite group G is equiva- 
lent to some unitary representation. 


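Proof We present the proof because of its simplicity and elegance. Let
$T$ be a representation of $G$. Consider the positive hermitian operator
$\mathbf{T} \equiv \sum_{x \in G} \mathbf{T}_x^\dagger \mathbf{T}_x$ and note
that

\[
\begin{aligned}
\mathbf{T}_g^\dagger \mathbf{T} \mathbf{T}_g
&= \sum_{x \in G} [T(g)]^\dagger [T(x)]^\dagger T(x) T(g) \\
&= \sum_{x \in G} [T(xg)]^\dagger T(xg)
= \sum_{y \in G} [T(y)]^\dagger T(y) = \mathbf{T},
\end{aligned} \tag{24.2}
\]

where we have used the fact that the sum over $x$ and $y \equiv xg$ sweep
through the entire group. Now let $\mathbf{S} = \sqrt{\mathbf{T}}$, and multiply
both sides of Eq. (24.2), with $\mathbf{S}^2$ replacing $\mathbf{T}$, by
$\mathbf{S}^{-1}$ on the left and by $\mathbf{T}_g^{-1}\mathbf{S}^{-1}$ on the
right to obtain

\[
\mathbf{S}^{-1}\mathbf{T}_g^\dagger\mathbf{S} = \mathbf{S}\mathbf{T}_g^{-1}\mathbf{S}^{-1}
\quad\Rightarrow\quad
(\mathbf{S}\mathbf{T}_g\mathbf{S}^{-1})^\dagger = (\mathbf{S}\mathbf{T}_g\mathbf{S}^{-1})^{-1}
\quad \forall g \in G.
\]

This shows that the representation $T'$ defined by
$\mathbf{T}'_g \equiv \mathbf{S}\mathbf{T}_g\mathbf{S}^{-1}$ for all $g \in G$
is unitary.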
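The key step of the proof, Eq. (24.2), can be checked exactly on a small example. The sketch below builds a representation of $S_3$ that is deliberately not unitary by conjugating the permutation matrices with a non-orthogonal integer matrix `P` (an arbitrary choice made here), and then verifies that $\mathbf{T} = \sum_x \mathbf{T}_x^\dagger \mathbf{T}_x$ is left invariant. Only the invariance is checked; the square root $\mathbf{S}$ is not computed.

```python
from itertools import permutations


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def perm_matrix(p):
    """Matrix M with M e_j = e_{p(j)}, so that M_p M_q = M_{p o q}."""
    n = len(p)
    return [[1 if p[j] == i + 1 else 0 for j in range(n)] for i in range(n)]


# A non-orthogonal change of basis (an arbitrary choice made here).
P = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]
P_inv = [[1, -1, 0], [0, 1, 0], [0, 0, 1]]
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

# A representation of S3 by real matrices that are not all orthogonal.
rep = [matmul(matmul(P, perm_matrix(p)), P_inv)
       for p in permutations((1, 2, 3))]
some_not_unitary = any(matmul(transpose(M), M) != IDENTITY for M in rep)

# T = sum over x of T_x^dagger T_x (dagger is transpose for real matrices).
T = [[sum(matmul(transpose(M), M)[i][j] for M in rep) for j in range(3)]
     for i in range(3)]

# Eq. (24.2): T_g^dagger T T_g = T for every g.
invariant = all(matmul(matmul(transpose(M), T), M) == T for M in rep)
print(T, some_not_unitary, invariant)
```

Integer matrices are used so that the comparison is exact rather than up to rounding.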


There is another convenience afforded by unitary representations: 


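Theorem 24.2.6 Let $T : G \to GL(\mathcal{H})$ be a unitary representation and
$W$ an invariant subspace of $\mathcal{H}$. Then, $W^\perp$ is also invariant.


Proof Suppose $|a\rangle \in W^\perp$. We need to show that
$\mathbf{T}_g|a\rangle \in W^\perp$ for all $g \in G$. To this end, let
$|b\rangle \in W$. Then

\[
\langle b|\mathbf{T}_g|a\rangle
= (\langle a|\mathbf{T}_g^\dagger|b\rangle)^*
= (\langle a|\mathbf{T}_g^{-1}|b\rangle)^*
= (\langle a|\mathbf{T}_{g^{-1}}|b\rangle)^* = 0,
\]

because $\mathbf{T}_{g^{-1}}|b\rangle \in W$. It follows from this equality that
$\mathbf{T}_g|a\rangle \in W^\perp$ for all $g \in G$.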


The carrier space $\mathcal{H}$ of a unitary representation is either
irreducible or has an invariant subspace $W$, in which case we have
$\mathcal{H} = W \oplus W^\perp$, where, by Theorem 24.2.6, $W^\perp$ is also
invariant. If $W$ and $W^\perp$ are not irreducible, then they too can be
written as direct sums of invariant subspaces. Continuing this process, we can
decompose $\mathcal{H}$ into irreducible invariant subspaces $W^{(k)}$ such that

\[
\mathcal{H} = W^{(1)} \oplus W^{(2)} \oplus W^{(3)} \oplus \cdots.
\]

If the carrier space is finite-dimensional, which we assume from now on and
for which we use the notation $V$, then the above direct sum is finite and we
write

\[
V = W^{(1)} \oplus W^{(2)} \oplus \cdots \oplus W^{(p)}
\equiv \bigoplus_{k=1}^{p} W^{(k)}. \tag{24.3}
\]


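One can think of $W^{(k)}$ as the carrier space of an (irreducible)
representation. The homomorphism $T^{(k)} : G \to GL(W^{(k)})$ is simply the
restriction of $T$ to the subspace $W^{(k)}$, and we write

\[
\mathbf{T}_g = \mathbf{T}_g^{(1)} \oplus \mathbf{T}_g^{(2)} \oplus \cdots \oplus \mathbf{T}_g^{(p)}
\equiv \bigoplus_{k=1}^{p} \mathbf{T}_g^{(k)}.
\]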


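If we identify all equivalent irreducible representations and collect them
together, we may rewrite the last equation as

\[
\mathbf{T}_g = m_1\mathbf{T}_g^{(1)} \oplus m_2\mathbf{T}_g^{(2)} \oplus \cdots
\oplus m_\rho\mathbf{T}_g^{(\rho)}
\equiv \bigoplus_{\alpha=1}^{\rho} m_\alpha\mathbf{T}_g^{(\alpha)}, \tag{24.4}
\]

where $\rho$ is the number of inequivalent irreducible representations and
$m_\alpha$ are positive integers giving the number of times an irreducible
representation $\mathbf{T}_g^{(\alpha)}$ and all its equivalents occur in a
given representation.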

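In terms of matrices, $\mathbf{T}_g$ will be represented in a block-diagonal
form as

\[
\mathsf{T}_g = \begin{pmatrix}
\mathsf{T}_g^{(1)} & 0 & \cdots & 0 \\
0 & \mathsf{T}_g^{(2)} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \mathsf{T}_g^{(p)}
\end{pmatrix}
\]

or

\[
\mathsf{T}_g = \begin{pmatrix}
[m_1\mathsf{T}_g^{(1)}] & 0 & \cdots & 0 \\
0 & [m_2\mathsf{T}_g^{(2)}] & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & [m_\rho\mathsf{T}_g^{(\rho)}]
\end{pmatrix},
\]

where, in the first matrix some of the $\mathsf{T}_g^{(k)}$ may be equivalent,
and in the second matrix, $[m_\alpha\mathsf{T}_g^{(\alpha)}]$ is a
block-diagonal matrix consisting of $m_\alpha$ copies of the matrix
$\mathsf{T}_g^{(\alpha)}$.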


Example 24.2.7 A one-dimensional (and therefore irreducible) representation,
defined for all groups, is the trivial (symmetric) representation
$T : G \to \mathbb{C}$ given by $T(g) = 1$ for all $g \in G$. For the
permutation group $S_n$, one can define another one-dimensional (thus
irreducible) representation $T : S_n \to \mathbb{C}$, called the antisymmetric
representation, given by $T(\pi) = +1$ if $\pi$ is even, and $T(\pi) = -1$ if
$\pi$ is odd.


Given any (matrix) representation $T$ of $G$, one can form the transpose
inverse matrices $(\mathsf{T}_g^t)^{-1}$, and complex conjugate matrices
$\mathsf{T}_g^*$. The reader may check that each set of these matrices forms a
representation of $G$.


Definition 24.2.8 The set of matrices $(\mathsf{T}_g^t)^{-1}$ and
$\mathsf{T}_g^*$ are called, respectively, the adjoint representation, denoted
by $\bar{T}$, and the complex conjugate representation, denoted by $T^*$.


24.3 Orthogonality Properties 


Homomorphisms preserve group structures. By studying a group that is
more attuned to concrete manipulations, we gain insight into the structure
of groups that are homomorphic to it. The group of invertible operators on a
vector space, especially in its matrix representation, is particularly suited
for such a study because of our familiarity with matrices and operators. The
last section reduced this study to inequivalent irreducible representations.
This section is devoted to a detailed study of such representations.


Lemma 24.3.1 (Schur's lemma) Let $T : G \to GL(V)$ and $T' : G \to GL(V')$ be
irreducible representations of $G$. If $A \in \mathcal{L}(V, V')$ is such that

$$A T_g = T'_g A \quad \forall g \in G, \qquad (24.5)$$

then either $A$ is an isomorphism (i.e., $T$ is equivalent to $T'$), or $A = 0$.

Proof Let $|a\rangle \in \ker A$. Then, since $A|a\rangle = 0$,

$$A T_g |a\rangle = T'_g A|a\rangle = 0 \;\Rightarrow\; T_g|a\rangle \in \ker A \quad \forall g \in G.$$

It follows that $\ker A$, a subspace of $V$, is invariant under $T$. Irreducibility
of $T$ implies that either $\ker A = V$, or $\ker A = 0$. The first case asserts that
$A$ is the zero linear transformation; the second case implies that $A$ is injective.

Similarly, let $|b\rangle \in A(V)$. Then $|b\rangle = A|x\rangle$ for some $|x\rangle \in V$, and,
since $A T_g|x\rangle \in A(V)$,

$$T'_g |b\rangle = T'_g A|x\rangle = A T_g|x\rangle \;\Rightarrow\; T'_g|b\rangle \in A(V) \quad \forall g \in G.$$

It follows that $A(V)$, a subspace of $V'$, is invariant under $T'$. Irreducibility
of $T'$ implies that either $A(V) = 0$, or $A(V) = V'$. The first case is consistent
with the first conclusion drawn above: $\ker A = V$. The second case asserts
that $A$ is surjective. Combining the two results, we conclude that $A$ is either
the zero operator or an isomorphism.


Lemma 24.3.1 becomes extremely useful when we concentrate on a sin- 
gle irreducible representation, i.e., when T’ = T. 


Lemma 24.3.2 Let $T : G \to GL(V)$ be an irreducible representation
of $G$. If $A \in \mathcal{L}(V)$ is such that $A T_g = T_g A$ for all $g \in G$, then
$A = \lambda\mathbf{1}$ for some $\lambda \in \mathbb{C}$.


Proof Replacing $V'$ with $V$ in Lemma 24.3.1, we conclude that $A = 0$ or $A$ is
an isomorphism of $V$. In the first case, $\lambda = 0$. In the second case, $A$ must have
a nonzero eigenvalue $\lambda$ and at least one eigenvector (see Theorem 6.2.5). It
follows that the operator $A - \lambda\mathbf{1}$ commutes with all $T_g$'s and it is not an
isomorphism (why not?). Therefore, it must be the zero operator.


We can immediately put this lemma to good use. If $G$ is abelian, all operators
$\{T_g\}_{g \in G}$ commute with one another. Focusing on one of these operators,
say $T_g$, noting that it commutes with all operators of the representation,
and using Lemma 24.3.2, we conclude that $T_g = \lambda\mathbf{1}$. It follows that when
$T_g$ acts on a vector, it gives a multiple of that vector. Therefore, it leaves any
one-dimensional subspace of the carrier space invariant. Since this is true
for all $g \in G$, an irreducible carrier space cannot have dimension greater
than one, and we have the following result.


Theorem 24.3.3 All irreducible representations of an abelian group are 
one-dimensional. 
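
As an illustration of Theorem 24.3.3, the following sketch writes down one-dimensional representations of the cyclic group $Z_n$ (an abelian group chosen here only as an example; it is not discussed in the text at this point). The function names are hypothetical. The code checks the homomorphism property numerically; it does not prove that no higher-dimensional irreducible representations exist, which is what the theorem asserts.

```python
import cmath

def cyclic_irreps(n):
    """Return n one-dimensional representations of the cyclic group Z_n.

    irreps[k][j] is the 1x1 "matrix" (a complex number) assigned by the
    k-th representation to the group element j, with addition mod n as
    the group operation.
    """
    return [[cmath.exp(2j * cmath.pi * k * j / n) for j in range(n)]
            for k in range(n)]

def is_homomorphism(rep, tol=1e-12):
    """Check T(a) T(b) = T(a + b mod n) for every pair of elements."""
    n = len(rep)
    return all(abs(rep[a] * rep[b] - rep[(a + b) % n]) < tol
               for a in range(n) for b in range(n))

irreps = cyclic_irreps(6)
all_homomorphisms = all(is_homomorphism(rep) for rep in irreps)
```

Each representation sends the identity (element 0) to the number 1, as any homomorphism must.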


This theorem is an immediate consequence of Schur’s lemma, and is in- 
dependent of the order of G. In particular, it holds for infinite groups, if 
Schur’s lemma holds for those groups. One important class of infinite groups 
for which Schur’s lemma holds is the Lie groups (to be discussed in Part IX). 
Thus, all abelian Lie groups have 1-dimensional irreducible representations. 
We shall see later that the converse of Theorem 24.3.3 is also true for finite
groups.

Historical Notes

Issai Schur (1875-1941) was one of the most brilliant mathematicians active in Germany 
during the first third of the twentieth century. He attended the Gymnasium in Libau (now 
Liepaja, Latvia) and then the University of Berlin, where he spent most of his scientific
career. From 1911 until 1916, when he returned to Berlin, he was an assistant professor
at Bonn. He became full professor at Berlin in 1919. Schur was forced to retire by the
Nazi authorities in 1935 but was able to emigrate to Palestine in 1939. He died there of a 
heart ailment several years later. Schur had been a member of the Prussian Academy of 
Sciences before the Nazi purges. He married and had a son and daughter. 

Schur’s principal field was the representation theory of groups, founded a little before 
1900 by his teacher Frobenius. Schur seems to have completed this field shortly be- 
fore World War I, but he returned to the subject after 1925, when it became important 
for physics. Further developed by his student Richard Brauer, it is in our time experi- 
encing an extraordinary growth through the opening of new questions. Schur’s disserta- 
tion (1901) became fundamental to the representation theory of the general linear group; 
in fact, English mathematicians have named certain of the functions appearing in the 
work “S-functions” in Schur’s honor. In 1905 Schur reestablished the theory of group 
characters—the keystone of representation theory. The most important tool involved is 
“Schur’s lemma.” Along with the representation of groups by integral linear substitutions, 
Schur was also the first to study representation by linear fractional substitutions, treating 
this more difficult problem almost completely in two works (1904, 1907). In 1906 Schur 
considered the fundamental problems that appear when an algebraic number field is taken 
as the domain; a number appearing in this connection is now called the Schur index. His 
works written after 1925 include a complete description of the rational and of the contin- 
uous representations of the general linear group; the foundations of this work were in his 
dissertation. 

A lively interchange with many colleagues led Schur to contribute important memoirs to 
other areas of mathematics. Some of these were published as collaborations with other 
authors, although publications with dual authorship were almost unheard of at that time. 
Here we simply indicate the areas: pure group theory, matrices, algebraic equations, num- 
ber theory, divergent series, integral equations, and function theory. 


Example 24.3.4 Suppose that the Hamiltonian $\mathbf{H}$ of a quantum mechanical
system with Hilbert space $\mathcal{H}$ has a group of symmetry $G$ with a
representation $T : G \to GL(\mathcal{H})$. Then $\mathbf{H}T_g = T_g\mathbf{H}$ for all $g \in G$. It
follows that $\mathbf{H} = \lambda\mathbf{1}$ if the representation is irreducible. Therefore, all
vectors of each invariant irreducible subspace are eigenstates of the
Hamiltonian corresponding to the same eigenvalue, i.e., they all have the same
energy. Therefore, the degeneracy of that energy state is at least as large as
the dimension of the carrier space.

It is helpful to arrive at the statement above from a different perspective.
Consider a vector $|x\rangle$ in the eigenspace $\mathcal{M}_i$ corresponding to the energy
eigenvalue $E_i$. Since $T_g$ and $\mathbf{H}$ commute, $T_g|x\rangle$ is also in $\mathcal{M}_i$. Therefore, an
eigenspace of a Hamiltonian with a group of symmetry is invariant under
all $T_g$ for any representation $T$ of that group. If $T$ is one of the irreducible
representations of $G$, say $T^{(\alpha)}$ with dimension $n_\alpha$, then $\dim \mathcal{M}_i \ge n_\alpha$.


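Consider two irreducible representations $T^{(\alpha)}$ and $T^{(\beta)}$ of a group $G$
with carrier spaces $W^{(\alpha)}$ and $W^{(\beta)}$, respectively. Let $X$ be any operator in
$\mathcal{L}(W^{(\beta)}, W^{(\alpha)})$, and define

$$A \equiv \sum_{x \in G} T_x^{(\alpha)} X T_{x^{-1}}^{(\beta)} = \sum_{x \in G} T^{(\alpha)}(x)\, X\, T^{(\beta)}(x^{-1}).$$

Then, we have

$$T_g^{(\alpha)} A = \sum_{x \in G} T^{(\alpha)}(g) T^{(\alpha)}(x)\, X\, T^{(\beta)}(x^{-1}) T^{(\beta)}(g^{-1}) T^{(\beta)}(g)
= \Big[\sum_{x \in G} T^{(\alpha)}(gx)\, X\, T^{(\beta)}\big((gx)^{-1}\big)\Big] T^{(\beta)}(g) = A T_g^{(\beta)},$$

where the sum in brackets equals $A$ because $gx$ also covers all of $G$ as $x$ does.

We are interested in the two cases where $T^{(\alpha)} = T^{(\beta)}$, and where $T^{(\alpha)}$ is
not equivalent to $T^{(\beta)}$. In the first case, Lemma 24.3.2 gives $A = \lambda\mathbf{1}$; in the
second case, Lemma 24.3.1 gives $A = 0$. Combining these two results and
labeling the constant multiplying the unit operator by $X$, we can write

$$\sum_{g \in G} T^{(\alpha)}(g)\, X\, T^{(\beta)}(g^{-1}) = \lambda_X \delta_{\alpha\beta} \mathbf{1}. \qquad (24.6)$$

The presence of the completely arbitrary operator $X$ indicates that Eq. (24.6)
is a powerful statement about, and a severe restriction on, the operators
$T^{(\alpha)}(g)$. This becomes more transparent if we select a basis, represent all
operators by matrices, and for the matrix representation of $X$ choose a
matrix whose only nonzero element is 1 and occurs at the $l$th row and $m$th
column. Then Eq. (24.6) becomes

$$\sum_{g \in G} T_{il}^{(\alpha)}(g)\, T_{mj}^{(\beta)}(g^{-1}) = \lambda_{lm} \delta_{\alpha\beta} \delta_{ij},$$

where $\lambda_{lm}$ is a constant that can be evaluated by setting $j = i$, $\alpha = \beta$, and
summing over $i$. The RHS will give $\lambda_{lm} \sum_i \delta_{ii} = \lambda_{lm} n_\alpha$, where $n_\alpha$ is the
dimension of the carrier space of $T^{(\alpha)}$. For the LHS we get

$$\sum_{g \in G} \sum_i T_{il}^{(\alpha)}(g)\, T_{mi}^{(\alpha)}(g^{-1}) = \sum_{g \in G} \big[T^{(\alpha)}(g^{-1}) T^{(\alpha)}(g)\big]_{ml}
= \sum_{g \in G} T_{ml}^{(\alpha)}(g^{-1}g) = \sum_{g \in G} T_{ml}^{(\alpha)}(e) = |G|\,\delta_{ml},$$

where we used $T^{(\alpha)}(e) = \mathbf{1}$, so that $T_{ml}^{(\alpha)}(e) = (\mathbf{1})_{ml}$, and where $|G|$ is the
order of the group. Putting everything together, we obtain

$$\sum_{g \in G} T_{il}^{(\alpha)}(g)\, T_{mj}^{(\beta)}(g^{-1}) = \frac{|G|}{n_\alpha}\, \delta_{lm} \delta_{\alpha\beta} \delta_{ij}. \qquad (24.7)$$

If the representation is unitary, then

$$\sum_{g \in G} T_{il}^{(\alpha)}(g)\, T_{jm}^{(\beta)*}(g) = \frac{|G|}{n_\alpha}\, \delta_{lm} \delta_{\alpha\beta} \delta_{ij}. \qquad (24.8)$$

Equations (24.7) and (24.8) depend on the basis chosen in which to express
matrices. To eliminate this dependence, we first introduce the important
concept of character.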
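
The orthogonality relation (24.8) and the group average behind Eq. (24.6) can be checked numerically on a small case. The sketch below assumes a particular unitary (real orthogonal) two-dimensional irreducible representation of $S_3$, realized as the symmetries of an equilateral triangle; this choice of matrices, and all function names, are illustrative assumptions and are not taken from the text.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[j][i] for j in range(2)] for i in range(2)]

c, s = -0.5, math.sqrt(3) / 2
E = [[1.0, 0.0], [0.0, 1.0]]
R = [[c, -s], [s, c]]            # rotation by 120 degrees
F = [[1.0, 0.0], [0.0, -1.0]]    # a reflection
R2 = matmul(R, R)
GROUP = [E, R, R2, F, matmul(R, F), matmul(R2, F)]   # |G| = 6, n_alpha = 2

def orthogonality_sum(i, l, j, m):
    """LHS of Eq. (24.8) with alpha = beta. The matrices are real,
    so complex conjugation is omitted."""
    return sum(T[i][l] * T[j][m] for T in GROUP)

def group_average(X):
    """A = sum_g T(g) X T(g^-1), using T(g^-1) = T(g)^t for orthogonal T(g)."""
    A = [[0.0, 0.0], [0.0, 0.0]]
    for T in GROUP:
        term = matmul(matmul(T, X), transpose(T))
        for i in range(2):
            for j in range(2):
                A[i][j] += term[i][j]
    return A

A = group_average([[1.0, 2.0], [3.0, 4.0]])
```

With $|G|/n_\alpha = 3$, every `orthogonality_sum(i, l, j, m)` should be 3 when $i = j$ and $l = m$ and 0 otherwise, and `A` should be a multiple of the unit matrix, with the multiple equal to $|G|\,\mathrm{tr}X/n_\alpha$.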




Definition 24.3.5 Let $T : G \to GL(V)$ be a representation of the group $G$.
The character of this representation is the map $\chi : G \to \mathbb{C}$ given by

$$\chi(g) \equiv \operatorname{tr} T_g = \sum_i T_{ii}(g),$$

where $T(g)$ is the matrix representation of $T_g$ in any basis of $V$. If $T$ is
irreducible, the character is called simple; otherwise, it is called compound.


The character of the identity element in any representation can be
calculated immediately. Since a homomorphism maps identity onto identity,
$T_e = \mathbf{1}$. Therefore,

$$\chi(e) = \operatorname{tr}(\mathbf{1}) = \dim V. \qquad (24.9)$$

Recall that two elements $x, y \in G$ belong to the same conjugacy class if
there exists $g \in G$ such that $x = g y g^{-1}$. This same relation holds for the
operators representing the elements: $T_x = T_g T_y T_{g^{-1}}$. Taking the trace of
both sides, and noting that $T_{g^{-1}} = T_g^{-1}$, one obtains


Box 24.3.6 All elements of a group belonging to the same conjugacy 
class have the same character. 


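Setting $i = l$ and $j = m$ in (24.7) and summing over $i$ and $j$, we obtain

$$\sum_{g \in G} \chi^{(\alpha)}(g)\, \chi^{(\beta)}(g^{-1}) = \frac{|G|}{n_\alpha}\, \delta_{\alpha\beta} \sum_{i,j} \delta_{ij}\delta_{ij} = |G|\,\delta_{\alpha\beta}, \qquad (24.10)$$

where the double sum equals $n_\alpha$. If the representation is unitary, then (24.8) gives

$$\sum_{g \in G} \chi^{(\alpha)}(g)\, \chi^{(\beta)*}(g) = |G|\,\delta_{\alpha\beta}. \qquad (24.11)$$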
geG 


This equation suggests a useful interpretation: Characters can be thought
of as vectors in a $|G|$-dimensional inner product space. According to
Eq. (24.11), the characters of inequivalent irreducible representations are
orthogonal. In particular, since there cannot be more orthogonal vectors than
the dimension of a vector space, we conclude that the number of irreducible
inequivalent representations of a group cannot be more than the cardinality
of that group. Actually, we can do better. Restricting ourselves to unitary
representations and collecting all elements belonging to the same conjugacy
class together, we write

$$\sum_{i=1}^{r} c_i\, \chi_i^{(\alpha)} \chi_i^{(\beta)*} = |G|\,\delta_{\alpha\beta} \;\Rightarrow\; \langle \chi^{(\beta)} | \chi^{(\alpha)} \rangle = |G|\,\delta_{\alpha\beta}, \qquad (24.12)$$

where $i$ labels conjugacy classes, $c_i$ is the number of elements in the $i$th
class, $r$ is the number of classes in $G$, and $|\chi^{(\alpha)}\rangle \in \mathbb{C}^r$ is an $r$-dimensional
vector with components $\{c_i^{1/2} \chi_i^{(\alpha)}\}_{i=1}^{r}$. Equation (24.12) shows that vectors
belonging to different irreducible representations are orthogonal. Since there
cannot be more orthogonal vectors than the dimension of a vector space, we
conclude that


Proposition 24.3.7 The number of inequivalent irreducible representations
of a group cannot be more than the number of conjugacy classes of the group,
i.e., $\rho \le r$.
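
The class form of the orthogonality relation, Eq. (24.12), can be illustrated with the simple characters of $S_3$. The sketch below takes the class sizes and characters from the character table of $S_3$ obtained later in this chapter (Table 24.4); the variable and function names are hypothetical. Because these characters are real, complex conjugation is omitted.

```python
CLASS_SIZES = [1, 3, 2]             # c_i for the three classes of S_3
CHARACTERS = {
    "T1": [1, 1, 1],                # symmetric representation
    "T2": [1, -1, 1],               # antisymmetric representation
    "T3": [2, 0, -1],               # two-dimensional representation
}
ORDER = sum(CLASS_SIZES)            # |G| = 6

def inner(chi_a, chi_b):
    """Weighted sum on the left of Eq. (24.12): sum_i c_i chi_a[i] chi_b[i]."""
    return sum(c * a * b for c, a, b in zip(CLASS_SIZES, chi_a, chi_b))

gram = {(a, b): inner(CHARACTERS[a], CHARACTERS[b])
        for a in CHARACTERS for b in CHARACTERS}
```

Every diagonal entry of `gram` equals $|G|$ and every off-diagonal entry vanishes, so the three vectors are mutually orthogonal in $\mathbb{C}^3$, consistent with $\rho \le r = 3$.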


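The characters of the adjoint representation are obtained from

$$\bar{\chi}(g) = \chi(g^{-1}) \;\Rightarrow\; \bar{\chi}_i = \chi_{i'},$$

where $K_{i'}$ is the class consisting of all elements inverse to those of the class
$K_i$. The equations involving characters of inverses of group elements can be
written in terms of the characters of the adjoint representation. For example,
Eq. (24.10) becomes

$$\sum_{g \in G} \chi^{(\alpha)}(g)\, \bar{\chi}^{(\beta)}(g) = |G|\,\delta_{\alpha\beta} \;\Rightarrow\; \sum_{i=1}^{r} c_i\, \chi_i^{(\alpha)} \bar{\chi}_i^{(\beta)} = |G|\,\delta_{\alpha\beta}. \qquad (24.13)$$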


Other relations can be obtained similarly. 


24.4 Analysis of Representations 


We can use the results obtained in the last section to gain insight into a given
representation. Take the trace of both sides of Eq. (24.4) and write the result
as

$$\chi(g) = m_1 \chi^{(1)}(g) + \cdots + m_\rho \chi^{(\rho)}(g) = \sum_{\alpha=1}^{\rho} m_\alpha \chi^{(\alpha)}(g); \qquad (24.14)$$

i.e., a compound character is a linear combination of simple characters with
nonnegative integer coefficients. Furthermore, the orthogonality of simple
characters gives

$$m_\alpha = \frac{1}{|G|} \sum_{g \in G} \chi(g)\, \chi^{(\alpha)*}(g), \qquad (24.15)$$

yielding the number of times the irreducible representation $T^{(\alpha)}$ occurs in
the representation $T$.

Another useful relation is obtained if we multiply Eq. (24.14) by its complex
conjugate and sum over $g$; the result is

$$\sum_{g \in G} |\chi(g)|^2 = \sum_{g \in G} \chi(g)\chi^*(g) = \sum_{\alpha,\beta} m_\alpha m_\beta \sum_{g \in G} \chi^{(\alpha)}(g)\, \chi^{(\beta)*}(g) = |G| \sum_{\alpha} m_\alpha^2, \qquad (24.16)$$

where the inner sum over $g$ equals $|G|\,\delta_{\alpha\beta}$ by Eq. (24.11).
In particular, if $T$ is irreducible, all $m_\alpha$ are zero except for one, which is
unity. We therefore obtain the criterion for irreducibility:

$$\sum_{g \in G} |\chi(g)|^2 = \sum_{i=1}^{r} c_i |\chi_i|^2 = |G| \quad \text{if } T \text{ is irreducible.} \qquad (24.17)$$

For groups of low order and representations of small dimensions,
Eq. (24.16) becomes a powerful tool for testing the irreducibility of the
representation.


Example 24.4.1 Let $G = S_3$ and consider the representation of Example 24.2.2.
The characters of the first three elements of this representation
are easily calculated:

$$\chi_1 = \operatorname{tr} T_1 = 3, \quad \chi_2 = \operatorname{tr} T_2 = 1, \quad \chi_3 = \operatorname{tr} T_3 = 1.$$

Similarly, one can obtain $\chi_4 = 1$, $\chi_5 = 0$, and $\chi_6 = 0$. Substituting this in
Eq. (24.16) yields

$$\sum_{g \in G} |\chi(g)|^2 = \sum_{i=1}^{6} |\chi_i|^2 = 3^2 + 1^2 + 1^2 + 1^2 + 0^2 + 0^2 = 12.$$

Comparing this with the RHS of (24.16) with $|G| = 6$ yields $\sum_\alpha m_\alpha^2 = 2$.
This restricts the nonzero $m_\alpha$'s to two, say $\alpha = 1$ and $\alpha = 2$. Moreover, $m_1$ and
$m_2$ can be only 1. Thus, the representation of Example 24.2.2 is reducible,
and there are precisely two inequivalent irreducible representations in it,
each occurring once.

We can actually find the invariant subspaces corresponding to the two
irreducible representations revealed above. The first is easy to guess. Just
taking the sum of the three functions $\psi_1$, $\psi_2$, and $\psi_3$ gives a one-dimensional
invariant subspace; so, let $\phi_1 = \psi_1 + \psi_2 + \psi_3$, and note that the space $W^{(1)}$
spanned by $\phi_1$ is invariant. The second is harder to discover. However, if we
assume that $\psi_1$, $\psi_2$, and $\psi_3$ are orthonormal, then using the Gram-Schmidt
process, we can find the other two functions orthogonal to $\phi_1$ (but not
orthogonal to each other!). These are

$$\phi_2 = -\psi_1 + 2\psi_2 - \psi_3, \qquad \phi_3 = -\psi_1 - \psi_2 + 2\psi_3.$$

The reader is urged to convince himself/herself that the subspace $W^{(2)}$
spanned by $\phi_2$ and $\phi_3$ is the complement of $W^{(1)}$ [i.e., $V = W^{(1)} \oplus W^{(2)}$]
and that it is invariant under all $T_g$'s.
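
The arithmetic of this example can be reproduced with a short script. The sketch below assumes that the representation of Example 24.2.2 permutes three basis functions, so that the character of each element is its number of fixed points; it also assumes the simple characters of $S_3$ listed in Table 24.4. The labels of the irreducible representations and all names in the code are illustrative (the example's "say $\alpha = 1$ and $\alpha = 2$" is only a labeling).

```python
from fractions import Fraction
from itertools import permutations

GROUP = list(permutations(range(3)))   # the six elements of S_3

def fixed_points(p):
    """Trace of the 3x3 permutation matrix of p."""
    return sum(1 for i, image in enumerate(p) if i == image)

def sign(p):
    """+1 for even permutations, -1 for odd ones (parity of inversions)."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p))
                     if p[i] > p[j])
    return -1 if inversions % 2 else 1

# Simple characters of S_3; the two-dimensional one is read off from the
# cycle type, identified here by the number of fixed points (3, 1 or 0).
SIMPLE_CHARACTERS = {
    "symmetric": lambda p: 1,
    "antisymmetric": sign,
    "two-dimensional": lambda p: {3: 2, 1: 0, 0: -1}[fixed_points(p)],
}

# LHS of Eq. (24.16) for the three-dimensional permutation representation.
norm = sum(fixed_points(p) ** 2 for p in GROUP)

# Eq. (24.15); all characters are real, so no conjugation is needed.
multiplicities = {
    name: Fraction(sum(fixed_points(p) * chi(p) for p in GROUP), len(GROUP))
    for name, chi in SIMPLE_CHARACTERS.items()
}
```

Here `norm` reproduces the value 12 found above, and the multiplicities show which two irreducible representations occur once each.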


A very useful representation can be constructed as follows. Let
$G = \{g_j\}_{j=1}^{m}$, and recall that left multiplication of elements of $G$ by a fixed
element $g_i$ is a permutation of $(g_1, g_2, \dots, g_m)$. Denote this permutation by $\pi_i$.
Now define a representation $R : G \to GL(\mathbb{R}^m)$, called the regular
representation, by

$$R_{g_i}(x_1, x_2, \dots, x_m) = (x_{\pi_i(1)}, x_{\pi_i(2)}, \dots, x_{\pi_i(m)}).$$

That this is indeed a representation is left as a problem for the reader. One
can obtain a matrix representation of $R$ by choosing the standard basis
$\{e_j\}_{j=1}^{m}$ of $\mathbb{R}^m$ and noting that $R_{g_i} e_j = e_{\pi_i^{-1}(j)}$. From such a matrix
representation it follows that all characters $\chi^R$ of the regular representation are
zero except for the identity, whose character is $\chi^R(e) = m$ [see Eq. (24.9)].
Now use Eq. (24.14) for $g = e$ and for the regular representation to obtain
$m = \sum_{\alpha=1}^{\rho} m_\alpha n_\alpha$, where $n_\alpha$ is the dimension of the $\alpha$th irreducible
representation. We can find $m_\alpha$ by using Eq. (24.15) and noting that only $g = e$
contributes to the sum:

$$m_\alpha = \frac{1}{|G|} \sum_{g \in G} \chi^R(g)\, \chi^{(\alpha)*}(g) = \frac{1}{|G|}\, \chi^R(e)\, \chi^{(\alpha)*}(e) = n_\alpha,$$

where we used $\chi^R(e) = |G|$ and $\chi^{(\alpha)*}(e) = n_\alpha$. In words,


Box 24.4.2 The number of times an irreducible representation occurs
in the regular representation is equal to the dimension of that
irreducible representation.


We therefore obtain the important relations

$$\chi_i^R = |G|\,\delta_{i1} = \sum_{\alpha=1}^{\rho} n_\alpha \chi_i^{(\alpha)}, \quad \text{and} \quad |G| = \sum_{\alpha=1}^{\rho} n_\alpha^2, \qquad (24.18)$$

where we have assumed that the first conjugacy class is that of the identity.
For finite groups of small order, the second equation can be very useful in
obtaining the dimensions of irreducible representations.
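
For small orders, the second relation in (24.18) can be explored by brute force. The sketch below (with a hypothetical function name) lists every way of writing $|G|$ as a sum of squares of positive integers. It uses only that one relation; further constraints, such as the trivial representation always being present, are not applied.

```python
from math import isqrt

def dimension_sets(order, max_dim=None):
    """Yield non-increasing tuples (n_1, n_2, ...) of positive integers
    whose squares add up to `order`."""
    if order == 0:
        yield ()
        return
    top = isqrt(order) if max_dim is None else min(max_dim, isqrt(order))
    for n in range(top, 0, -1):
        # Fix the largest remaining dimension n, then split what is left.
        for rest in dimension_sets(order - n * n, n):
            yield (n,) + rest

candidates = {order: list(dimension_sets(order)) for order in (2, 3, 4, 6)}
```

For order 6 the only candidates are `(2, 1, 1)` and six copies of 1.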


Example 24.4.3 A group of order 2 or 3 has only one-dimensional inequivalent
irreducible representations, because the only way that Eq. (24.18) can
be satisfied for $|G| = 2$ or 3 is for all $n_\alpha$'s to be 1. A group of order 4
can have either 4 one-dimensional or one 2-dimensional inequivalent
irreducible representations. The symmetric group $S_3$, being of order 6, can have
6 one-dimensional, or 2 one-dimensional and one 2-dimensional inequivalent
irreducible representations. We shall see later that if all inequivalent
irreducible representations of a group are one-dimensional, then the group
must be abelian. Thus, the first possibility for $S_3$ must be excluded.


24.5 Group Algebra


Think of group elements as (linearly independent) vectors. In fact, given
any set, one can generate a vector space by taking linear combinations of
the elements of the set assumed to form a basis. In the case of groups one
gets a bonus: The product already defined on the basis (group elements) can
be extended by linearity to all elements of the vector space to turn it into an
algebra called the group algebra. For $G = \{g_j\}_{j=1}^{m}$, a typical element of the
group algebra is $a = \sum_{i=1}^{m} a_i g_i$. One can add two vectors as usual. But the
product of two vectors is also defined:

$$ab = \Big(\sum_{i=1}^{m} a_i g_i\Big)\Big(\sum_{j=1}^{m} b_j g_j\Big) = \sum_{i=1}^{m} \sum_{j=1}^{m} a_i b_j\, g_i g_j \equiv \sum_{k=1}^{m} c_k g_k,$$

where each product $g_i g_j$ is some group element $g_k$, and $c_k$ is a sum involving
$a_i$ and $b_j$. The best way to learn this is to see an example.


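Example 24.5.1 Let $G = S_3$ and consider $a = 2\pi_1 - 3\pi_3 + \pi_5$ and
$b = \pi_2 - 2\pi_4 + 3\pi_6$. Then, using Table 23.1, we obtain

$$\begin{aligned}
ab &= (2\pi_1 - 3\pi_3 + \pi_5)(\pi_2 - 2\pi_4 + 3\pi_6) \\
&= 2\pi_1\pi_2 - 4\pi_1\pi_4 + 6\pi_1\pi_6 - 3\pi_3\pi_2 + 6\pi_3\pi_4 - 9\pi_3\pi_6 + \pi_5\pi_2 - 2\pi_5\pi_4 + 3\pi_5\pi_6 \\
&= 2\pi_2 - 4\pi_4 + 6\pi_6 - 3\pi_6 + 6\pi_5 - 9\pi_2 + \pi_4 - 2\pi_3 + 3\pi_1 \\
&= 3\pi_1 - 7\pi_2 - 2\pi_3 - 3\pi_4 + 6\pi_5 + 3\pi_6.
\end{aligned}$$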


24.5.1 Group Algebra and Representations 


Group algebra is very useful for the construction and analysis of
representations of groups. In fact, we have already used a similar approach in the
construction of the regular representation. Instead of $\mathbb{R}^m$ used before, use the
$m$-dimensional vector space $\mathcal{A}$, the group algebra. Then left-multiplication
by a group element $g$ can be identified with $R_g$, the operators of the regular
representation, and the invariant subspaces of $\mathcal{A}$ become the left ideals of $\mathcal{A}$,
and we can write

$$\mathcal{A} = \mathcal{L}_1 \oplus \mathcal{L}_2 \oplus \cdots \oplus \mathcal{L}_r.$$

Moreover, since the identity element of the group is the identity element of
the algebra as well, we have

$$e = e_1 + \cdots + e_r, \qquad e_i^2 = e_i, \qquad e_i e_j = 0 \ \text{ for } i \ne j. \qquad (24.19)$$

It is clear that if $a^2 = \alpha a$, then $a/\alpha$ will be idempotent. So, we can
essentially ignore the constant $\alpha$, which is why $a$ is called essentially
idempotent. Now consider the element of the group algebra

$$P = \sum_{x \in G} x \qquad (24.20)$$

and note that $gP = \sum_{x \in G} gx = P$. It follows that

$$P^2 = \sum_{g \in G} g \sum_{x \in G} x = \sum_{g \in G} \sum_{x \in G} gx = \sum_{g \in G} P = |G|\,P.$$

So, $P$ is essentially idempotent. Furthermore, the reader may verify that the
ideal generated by $P$ is one-dimensional.
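
The group-algebra product and the relation $P^2 = |G|\,P$ can be checked with a small sketch. Here elements of $S_3$ are stored as tuples (a permutation `p` sends `i` to `p[i]`), and an element of the group algebra is a dictionary from group elements to coefficients. This encoding, the composition convention, and the function names are assumptions made for illustration; they are not tied to the labeling $\pi_1, \dots, \pi_6$ of Table 23.1.

```python
from collections import defaultdict
from itertools import permutations

def compose(p, q):
    """Group product p*q, taken here as 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))

def algebra_product(a, b):
    """Extend the group product by linearity to {group element: coefficient}."""
    result = defaultdict(int)
    for g, a_g in a.items():
        for h, b_h in b.items():
            result[compose(g, h)] += a_g * b_h
    return {g: coeff for g, coeff in result.items() if coeff != 0}

GROUP = list(permutations(range(3)))
IDENTITY = {GROUP[0]: 1}             # (0, 1, 2) is the identity permutation

P = {g: 1 for g in GROUP}            # Eq. (24.20)
P_squared = algebra_product(P, P)    # expected to equal |G| P
```

Every group element appears in `P_squared` with coefficient $|G| = 6$, which is the statement that $P$ is essentially idempotent.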

Let us now apply the notion of the group algebra to derive further
relations among characters. Denote the elements of the $i$th class $K_i$ of $G$ by
$\{x_l^{(i)}\}_{l=1}^{c_i}$, and construct the element of the group algebra
$\kappa_i = \sum_{l=1}^{c_i} x_l^{(i)}$. If in the product of two such quantities

$$\kappa_i \kappa_j = \sum_{l=1}^{c_i} \sum_{m=1}^{c_j} x_l^{(i)} x_m^{(j)} \qquad (24.21)$$

the element $x_l^{(i)} x_m^{(j)} \equiv y \in G$ is in a certain conjugacy class, then the rest of
that class can be obtained by taking all conjugates of $y$, i.e., elements of $G$
that can be written as

$$g y g^{-1} = g\, x_l^{(i)} x_m^{(j)} g^{-1} = \big(g\, x_l^{(i)} g^{-1}\big)\big(g\, x_m^{(j)} g^{-1}\big),$$

where the first factor is in $K_i$ and the second is in $K_j$.
It follows that if one member of a class appears in the double sum of
Eq. (24.21), all members will appear there. The reader may check that if
$y$ occurs $k$ times in the double sum, then all members of the class of $y$ occur
$k$ times as well. Collecting all such members together, we can write

$$\kappa_i \kappa_j = \sum_{l=1}^{r} c_{ijl}\, \kappa_l, \qquad (24.22)$$

where $c_{ijl}$ are nonnegative integers.
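
The coefficients $c_{ijl}$ of Eq. (24.22) can be computed directly for a small group. The following sketch does this for $S_3$ by counting how often one representative of each class appears among the products of two classes. The encoding of permutations as tuples, the ordering of the classes, and the function names are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations

def compose(p, q):
    """Group product p*q, taken here as 'apply q first, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugacy_classes(group):
    """Partition the group into classes {g x g^-1 : g in G}."""
    classes, seen = [], set()
    for x in group:
        if x in seen:
            continue
        cls = {compose(compose(g, x), inverse(g)) for g in group}
        classes.append(sorted(cls))
        seen |= cls
    return classes

def class_constants(classes):
    """Return c[i][j][l] such that kappa_i kappa_j = sum_l c[i][j][l] kappa_l."""
    r = len(classes)
    c = [[[0] * r for _ in range(r)] for _ in range(r)]
    for i, K_i in enumerate(classes):
        for j, K_j in enumerate(classes):
            counts = Counter(compose(x, y) for x in K_i for y in K_j)
            for l, K_l in enumerate(classes):
                # All members of K_l occur equally often; inspect one of them.
                c[i][j][l] = counts[K_l[0]]
    return c

GROUP = list(permutations(range(3)))
classes = conjugacy_classes(GROUP)   # identity, transpositions, 3-cycles
c = class_constants(classes)
```

With the classes ordered as identity, transpositions, 3-cycles, the product of the transposition class with itself gives `c[1][1] == [3, 0, 3]`, which also shows that some $c_{ijl}$ can vanish.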
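Now consider the $\alpha$th irreducible representation, and add all operators
corresponding to a given class:

$$T_i^{(\alpha)} \equiv \sum_{g \in K_i} T_g^{(\alpha)} \;\Rightarrow\; T_i^{(\alpha)} T_j^{(\alpha)} = \sum_{l=1}^{r} c_{ijl}\, T_l^{(\alpha)}, \qquad (24.23)$$

where the second equation follows from the same sort of argument used
above to establish Eq. (24.22). One can show that $T_i^{(\alpha)}$ commutes with all
$T_g^{(\alpha)}$. Therefore, by Schur's lemma, $T_i^{(\alpha)} = \lambda_i^{(\alpha)} \mathbf{1}$, and the second equation
in (24.23) becomes

$$\lambda_i^{(\alpha)} \lambda_j^{(\alpha)} = \sum_{l=1}^{r} c_{ijl}\, \lambda_l^{(\alpha)}. \qquad (24.24)$$

Taking the characters of both sides of $T_i^{(\alpha)} = \lambda_i^{(\alpha)} \mathbf{1}$ and using the first
equation in (24.23), noting that all elements of a class have the same character,
we get

$$c_i \chi_i^{(\alpha)} = n_\alpha \lambda_i^{(\alpha)} \;\Rightarrow\; \lambda_i^{(\alpha)} = \frac{c_i \chi_i^{(\alpha)}}{n_\alpha}.$$

Substituting this in Eq. (24.24), we obtain

$$c_i c_j\, \chi_i^{(\alpha)} \chi_j^{(\alpha)} = n_\alpha \sum_{l=1}^{r} c_{ijl}\, c_l\, \chi_l^{(\alpha)}. \qquad (24.25)$$

This is another equation that is useful for computing characters. Note that
this equation connects the purely group properties ($c_i$'s and $c_{ijl}$'s) with the
properties of the representation ($\chi_i^{(\alpha)}$'s and $n_\alpha$). Summing Eq. (24.25) over
$\alpha$ and using the first equation in (24.18), we get

$$c_i c_j \sum_{\alpha=1}^{\rho} \chi_i^{(\alpha)} \chi_j^{(\alpha)} = \sum_{l=1}^{r} c_{ijl}\, c_l \sum_{\alpha=1}^{\rho} n_\alpha \chi_l^{(\alpha)} = c_{ij1} |G|,$$

where the sum over $\alpha$ on the right equals $|G|\,\delta_{l1}$ by (24.18), and we used
$c_1 = 1$ (there is only one element in the class of the identity).
Problem 24.12 shows that $c_{ij1} = c_i \delta_{i'j}$, where $K_{i'}$ is the class consisting of
inverses of elements of $K_i$. It then follows that

$$\sum_{\alpha=1}^{\rho} \chi_i^{(\alpha)} \chi_j^{(\alpha)} = \frac{|G|}{c_i}\, \delta_{i'j}. \qquad (24.26)$$

For a unitary representation, $\chi_{i'}^{(\alpha)} = \chi_i^{(\alpha)*}$, so Eq. (24.26) becomes

$$\sum_{\alpha=1}^{\rho} \chi_i^{(\alpha)} \chi_j^{(\alpha)*} = \frac{|G|}{c_i}\, \delta_{ij} \;\Rightarrow\; \langle \chi_j | \chi_i \rangle = \frac{|G|}{c_i}\, \delta_{ij}, \qquad (24.27)$$

where $|\chi_i\rangle \in \mathbb{C}^\rho$ is a $\rho$-dimensional vector with components
$\{\chi_i^{(\alpha)}\}_{\alpha=1}^{\rho}$. This equation can also be written in terms of group elements
rather than classes. Since $\chi_i^{(\alpha)} = \chi^{(\alpha)}(x)$ for any $x \in K_i$, we have

$$\sum_{\alpha=1}^{\rho} \chi^{(\alpha)}(x)\, \chi^{(\alpha)*}(y) = \frac{|G|}{|K_x^G|}\, \delta(K_x^G, K_y^G), \qquad (24.28)$$

where $K_x^G$ is the conjugacy class of $G$ containing $x$, $|K_x^G|$ is the number of
its elements, and

$$\delta(K_x^G, K_y^G) = \begin{cases} 1 & \text{if } K_x^G = K_y^G, \\ 0 & \text{otherwise.} \end{cases}$$

Equation (24.27) shows that the $r$ vectors $|\chi_i\rangle$ are mutually orthogonal;
therefore, $r \le \rho$. Combining this with Proposition 24.3.7, we obtain the
following: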


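Table 24.1 A typical character table

$$\begin{array}{c|cccccc}
 & {}^{c_1}K_1 & {}^{c_2}K_2 & \cdots & {}^{c_i}K_i & \cdots & {}^{c_r}K_r \\ \hline
T^{(1)} & \chi_1^{(1)} & \chi_2^{(1)} & \cdots & \chi_i^{(1)} & \cdots & \chi_r^{(1)} \\
T^{(2)} & \chi_1^{(2)} & \chi_2^{(2)} & \cdots & \chi_i^{(2)} & \cdots & \chi_r^{(2)} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
T^{(\alpha)} & \chi_1^{(\alpha)} & \chi_2^{(\alpha)} & \cdots & \chi_i^{(\alpha)} & \cdots & \chi_r^{(\alpha)} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
T^{(r)} & \chi_1^{(r)} & \chi_2^{(r)} & \cdots & \chi_i^{(r)} & \cdots & \chi_r^{(r)}
\end{array}$$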

Theorem 24.5.2 The number of inequivalent irreducible representa- 
tions of a finite group is equal to the number of conjugacy classes in 
the group. 


It is convenient to summarize our result in a square table with rows
labeled by the irreducible representations and columns labeled by the
conjugacy classes of $G$. Then on the $\alpha$th row and $i$th column we list $\chi_i^{(\alpha)}$, and
we get Table 24.1, called the character table of $G$. Note that $c_i$, the order
of $K_i$, is written as a left superscript. Character tables have the property that
any two of their rows are orthogonal in the sense of Eq. (24.12), and any
two of their columns are orthogonal in the sense of Eq. (24.27).
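
Column orthogonality, Eq. (24.27), can be checked on a concrete table. The sketch below uses the character table of $S_3$ that is completed later in this chapter (Table 24.4); the variable and function names are hypothetical, and complex conjugation is omitted because the entries are real.

```python
CLASS_SIZES = [1, 3, 2]          # c_i, written as left superscripts in the table
TABLE = [
    [1, 1, 1],                   # T^(1)
    [1, -1, 1],                  # T^(2)
    [2, 0, -1],                  # T^(3)
]
ORDER = sum(CLASS_SIZES)         # |G| = 6

def column_product(i, j):
    """LHS of Eq. (24.27): sum over irreducible representations of chi_i chi_j."""
    return sum(row[i] * row[j] for row in TABLE)

columns = [[column_product(i, j) for j in range(3)] for i in range(3)]
```

The result is diagonal, with entries $|G|/c_i = 6, 2, 3$. The table is square, in agreement with Theorem 24.5.2.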

If all inequivalent irreducible representations of a group G have dimen- 
sion one, then there will be |G| of them [by Eq. (24.18)]. Hence, there will be 
|G| conjugacy classes; i.e., each class consists of a single element. By Prob- 
lem 23.16, the group must be abelian. Combining this with Theorem 24.3.3, 
we have the following theorem. 


Theorem 24.5.3 A finite group is abelian if and only if all its inequivalent 
irreducible representations are one-dimensional. 


24.6 Relationship of Characters to Those of a Subgroup 


Let $H$ be a subgroup of $G$. Denote by $K_h^H$ and $K_g^G$ the $H$-class containing
$h \in H$ and the $G$-class containing $g$, respectively. Let $d_j$ and $c_i$ be the
number of elements in the $j$th $H$-class and $i$th $G$-class, respectively. Any
representation of $G$ defines a representation of $H$ by restriction. An
irreducible representation of $G$ may be reducible as a representation of $H$. This
is because although the subspace $W^{(\alpha)}$ of the carrier space that is irreducible
under $G$ is the smallest such subspace containing a given vector, it is possible
to generate a smaller subspace by applying a subset of the operators $T_g$
corresponding to those $g$'s that belong to $H$. It follows that

$$T^{(\alpha)}(h) = \bigoplus_{\sigma} m_{\alpha\sigma}\, t^{(\sigma)}(h), \quad h \in H, \qquad (24.29)$$

where $m_{\alpha\sigma}$ are nonnegative integers as in Eq. (24.14) and $t^{(\sigma)}$ are irreducible
representations of $H$. If $\chi^{(\alpha)}$ and $\xi^{(\sigma)}$ denote the characters of irreducible
representations of $G$ and $H$, respectively, then the equivalent equation for
the characters is

$$\chi^{(\alpha)}(h) = \sum_{\sigma} m_{\alpha\sigma}\, \xi^{(\sigma)}(h), \quad h \in H. \qquad (24.30)$$

Multiply both sides by $\xi^{(\kappa)*}(h)$, sum over $h \in H$, and take the complex
conjugate at the end. Then by the orthogonality relation (24.11), applied to
$H$, we obtain

$$m_{\alpha\kappa} = \frac{1}{|H|} \sum_{h \in H} \chi^{(\alpha)*}(h)\, \xi^{(\kappa)}(h). \qquad (24.31)$$

Now multiply both sides of Eq. (24.31) by $\chi^{(\alpha)}(g)$, sum over $\alpha$, and use
Eq. (24.28) to obtain

$$\sum_{\alpha} m_{\alpha\kappa}\, \chi^{(\alpha)}(g) = \frac{|G|}{|H|\,|K_g^G|} \sum_{h \in H} \delta(K_g^G, K_h^G)\, \xi^{(\kappa)}(h). \qquad (24.32)$$

The sum on the right can be transformed into a sum over conjugacy classes
of $H$. Then Eq. (24.32) becomes

$$\sum_{\alpha} m_{\alpha\kappa}\, \chi_i^{(\alpha)} = \frac{|G|}{|H|\, c_i} \sum_{j} d_j\, \xi_j^{(\kappa)}, \qquad (24.33)$$

where the sum on the LHS is over irreducible representations of $G$, and on
the RHS it is over those $H$-classes $j$ that lie in the $i$th $G$-class. Note that the
coefficients $|G| d_j/(|H| c_i)$ are integers by Problem 23.17.

Equations (24.32) and (24.33) are useful for obtaining characters of $G$
when those of a subgroup $H$ are known. The general procedure is to note
that the RHS of these equations are completely determined by the structure
of the group $G$ and the characters of $H$. Varying $i$, the RHS of (24.33)
determines the $r$ components of a (compound) character $|\psi\rangle$, which, by the
LHS, can be written as a linear combination of characters of $G$:

$$|\psi\rangle = \sum_{\alpha=1}^{r} m_\alpha |\chi^{(\alpha)}\rangle, \qquad (24.34)$$

where we have suppressed the irrelevant subscript $\kappa$. If we know some of
the $|\chi^{(\alpha)}\rangle$'s, we may be able to determine the rest by taking successive inner
products to find the integers $m_\alpha$, and subtracting each irreducible factor of
the sum from the LHS. We illustrate this procedure for $S_n$ in the following
example.


Example 24.6.1 Let $K_1 = (1^2)$ and $K_2 = (2)$ for $S_2$ (see Sect. 23.4 for
notation). Example 24.2.7 showed that we can construct two irreducible
representations for any $S_n$, the symmetric and the antisymmetric representations.
The reader may verify that these two representations are inequivalent. Since
the number of inequivalent irreducible representations is equal to the number
of classes in a group, we have all the information needed to construct
the character table for $S_2$. Table 24.2 shows this character table.

Table 24.2 Character table for $S_2$

$$\begin{array}{c|cc}
 & {}^{1}K_1 & {}^{1}K_2 \\ \hline
T^{(1)} & 1 & 1 \\
T^{(2)} & 1 & -1
\end{array}$$

We want to use the $S_2$ character table to construct the character table for $S_3$.
With our knowledge of the symmetric and the antisymmetric representations, we
can partially fill in the $S_3$ character table. Let $K_1 = (1^3)$, $K_2 = (2, 1)$, and
$K_3 = (3)$ and note that $c_1 = 1$, $c_2 = 3$, and $c_3 = 2$. Then we obtain Table 24.3.

Table 24.3 Partially filled character table for $S_3$

$$\begin{array}{c|ccc}
 & {}^{1}K_1 & {}^{3}K_2 & {}^{2}K_3 \\ \hline
T^{(1)} & 1 & 1 & 1 \\
T^{(2)} & 1 & -1 & 1 \\
T^{(3)} & ? & ? & ?
\end{array}$$

To complete the table, we start with $\kappa = 1$, and write the RHS of
Eq. (24.33) as

$$\psi_i = \frac{|G|}{|H|\, c_i} \sum_j d_j\, \xi_j^{(1)} = \frac{6}{2 c_i} \sum_j \xi_j^{(1)} = \frac{3}{c_i} \sum_j \xi_j^{(1)},$$

because $d_j = 1$ for the two classes of $S_2$. The sum on the RHS is over
$S_2$-classes that are inside the $i$th $S_3$-class. For $i = 1$, only the first $S_2$-class
contributes. Noting that $\xi_j^{(1)}$ are the entries of Table 24.2, we get

$$\psi_1 = \frac{3}{c_1} \cdot 1 = 3.$$

Similarly,

$$\psi_2 = \frac{3}{c_2} \cdot 1 = 1 \quad \text{and} \quad \psi_3 = \frac{3}{c_3} \cdot 0 = 0.$$

The second equation follows from the fact that there are no classes of $S_2$
inside the third class of $S_3$. Equation (24.34) now gives

$$|\psi\rangle = \begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix} = \sum_{\alpha=1}^{r} m_\alpha |\chi^{(\alpha)}\rangle.$$

We can find the number of times $|\chi^{(1)}\rangle$ occurs in this compound character
by taking the inner product:

$$\langle \chi^{(1)} | \psi \rangle = \sum_{\alpha} m_\alpha \langle \chi^{(1)} | \chi^{(\alpha)} \rangle = m_1 |G| = 6 m_1.$$

But

$$\langle \chi^{(1)} | \psi \rangle = \sum_{i=1}^{3} c_i\, \chi_i^{(1)*} \psi_i = 1 \cdot 1 \cdot 3 + 3 \cdot 1 \cdot 1 + 2 \cdot 1 \cdot 0 = 6.$$

These two equations show that $m_1 = 1$. So,

$$\begin{pmatrix} 3 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} + m_2 |\chi^{(2)}\rangle + m_3 |\chi^{(3)}\rangle.$$

Subtracting the column vectors, we get a new character:

$$|\psi'\rangle = \begin{pmatrix} 2 \\ 0 \\ -1 \end{pmatrix} = m_2 |\chi^{(2)}\rangle + m_3 |\chi^{(3)}\rangle.$$

Taking the inner product with $|\chi^{(2)}\rangle$ yields $m_2 = 0$. It follows that $|\psi'\rangle$ is a
simple character. In fact,

$$\sum_i c_i |\psi'_i|^2 = 1 \cdot 2^2 + 3 \cdot 0^2 + 2 \cdot (-1)^2 = 6,$$

and the criterion of irreducibility, Eq. (24.17), is satisfied.


We can now finish up Table 24.3 to obtain Table 24.4, which is the
complete character table for $S_3$.

Table 24.4 Complete character table for $S_3$

$$\begin{array}{c|ccc}
 & {}^{1}K_1 & {}^{3}K_2 & {}^{2}K_3 \\ \hline
T^{(1)} & 1 & 1 & 1 \\
T^{(2)} & 1 & -1 & 1 \\
T^{(3)} & 2 & 0 & -1
\end{array}$$
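
The steps of Example 24.6.1 can be sketched in a few lines. The code below evaluates the RHS of Eq. (24.33) for $\kappa = 1$, finds the multiplicities of the two known simple characters, and subtracts them. How the $S_2$ classes are recorded (as triples) and all names are assumptions made for illustration.

```python
from fractions import Fraction

G_ORDER, H_ORDER = 6, 2
G_CLASS_SIZES = [1, 3, 2]                        # c_i for K_1, K_2, K_3 of S_3
G_KNOWN = {"T1": [1, 1, 1], "T2": [1, -1, 1]}    # rows already in Table 24.3

# Each S_2 class as (d_j, index i of the S_3 class containing it, xi_j^(1)).
H_CLASSES = [(1, 0, 1), (1, 1, 1)]

def compound_character():
    """Components psi_i from the RHS of Eq. (24.33) with kappa = 1."""
    psi = [Fraction(0)] * len(G_CLASS_SIZES)
    for d_j, i, xi_j in H_CLASSES:
        psi[i] += Fraction(G_ORDER * d_j, H_ORDER * G_CLASS_SIZES[i]) * xi_j
    return psi

def multiplicity(chi, psi):
    """(1/|G|) sum_i c_i chi_i psi_i; the characters here are real."""
    return sum(c * x * y for c, x, y in zip(G_CLASS_SIZES, chi, psi)) / G_ORDER

psi = compound_character()
m = {name: multiplicity(chi, psi) for name, chi in G_KNOWN.items()}
remainder = [p - sum(m[name] * G_KNOWN[name][i] for name in G_KNOWN)
             for i, p in enumerate(psi)]
norm = sum(c * x * x for c, x in zip(G_CLASS_SIZES, remainder))
```

The remainder is the third row of Table 24.4, and `norm` equals $|G|$, so the criterion (24.17) is met.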


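24.7 Irreducible Basis Functions


We have studied the operators $T_g$ and their characters representing group
elements in rather extensive detail. Let us now turn our attention to the carrier
space itself. In particular, we want to concentrate on the basis functions of
the irreducible representations. We choose "functions," rather than vectors,
because of their use in quantum mechanics as discussed at the beginning of
this chapter.

Let $\{|\psi_i^{(\alpha)}\rangle\}_{i=1}^{n_\alpha}$ be a set of basis functions for $W^{(\alpha)}$, the $\alpha$th invariant
irreducible subspace. Invariance of $W^{(\alpha)}$ implies that

$$T_g |\psi_i^{(\alpha)}\rangle = \sum_{j=1}^{n_\alpha} T_{ji}^{(\alpha)}(g)\, |\psi_j^{(\alpha)}\rangle,$$

where $T_{ji}^{(\alpha)}(g)$ are elements of the matrix $T^{(\alpha)}(g)$ representing $g \in G$.


Definition 24.7.1 A function (or vector) $|\phi_i^{(\alpha)}\rangle$ is said to belong to the
$i$th row of the $\alpha$th irreducible representation (or to transform according
to the $i$th row of the $\alpha$th irreducible representation) if there exists a
basis $\{|\psi_j^{(\alpha)}\rangle\}_{j=1}^{n_\alpha}$ of the $\alpha$th irreducible representation of $G$ with matrices
$(T_{ij}^{(\alpha)}(g))$ and $n_\alpha - 1$ other functions $\{|\phi_j^{(\alpha)}\rangle\}$ such that

$$T_g |\phi_i^{(\alpha)}\rangle = \sum_{j=1}^{n_\alpha} T_{ji}^{(\alpha)}(g)\, |\phi_j^{(\alpha)}\rangle. \qquad (24.35)$$


Functions that belong to rows of irreducible representations have some
remarkable properties. Let $|\psi_i^{(\alpha)}\rangle$ and $|\phi_j^{(\beta)}\rangle$ transform according to the $i$th
and $j$th rows of the $\alpha$th and $\beta$th irreducible representations, respectively.
Choose an inner product for the carrier space such that all representations
are unitary. Then we have

$$\langle \psi_i^{(\alpha)} | \phi_j^{(\beta)} \rangle = \langle T_g \psi_i^{(\alpha)} | T_g \phi_j^{(\beta)} \rangle
= \sum_{l=1}^{n_\alpha} \sum_{m=1}^{n_\beta} T_{li}^{(\alpha)*}(g)\, T_{mj}^{(\beta)}(g)\, \langle \psi_l^{(\alpha)} | \phi_m^{(\beta)} \rangle.$$

Summing this equation over $g$ yields $|G| \langle \psi_i^{(\alpha)} | \phi_j^{(\beta)} \rangle$ for the LHS, while

$$\text{RHS} = \sum_{l=1}^{n_\alpha} \sum_{m=1}^{n_\beta} \Big[\sum_{g \in G} T_{li}^{(\alpha)*}(g)\, T_{mj}^{(\beta)}(g)\Big] \langle \psi_l^{(\alpha)} | \phi_m^{(\beta)} \rangle
= \frac{|G|}{n_\alpha}\, \delta_{\alpha\beta} \delta_{ij} \sum_{l=1}^{n_\alpha} \langle \psi_l^{(\alpha)} | \phi_l^{(\beta)} \rangle,$$

where we have made use of Eq. (24.8), by which the sum in brackets equals
$(|G|/n_\alpha)\, \delta_{\alpha\beta} \delta_{lm} \delta_{ij}$. Therefore,

$$\langle \psi_i^{(\alpha)} | \phi_j^{(\beta)} \rangle = \frac{1}{n_\alpha}\, \delta_{\alpha\beta} \delta_{ij} \sum_{l=1}^{n_\alpha} \langle \psi_l^{(\alpha)} | \phi_l^{(\beta)} \rangle. \qquad (24.36)$$

This shows that functions belonging to different irreducible representations
are orthogonal. We should expect this, because in our construction
of invariant irreducible subspaces, we kept dividing the whole space into
orthogonal complements. What is surprising is that functions transforming
according to different rows of an irreducible representation are orthogonal.
We had no control over this property! It is a consequence of Eq. (24.35).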
Another surprise is the independence of the inner product from i: If we let 
i= j anda = B on both sides of (24.36), we obtain 


No 


wy |, (24.37) 


 j=1 


a (o4 1 
Wile?) = 


1 


which indicates that the inner product on the LHS is independent of i. 
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Example 24.7.2 The quantum-mechanical perturbation theory starts with a 
known Hamiltonian Ho with eigenvalues E; and the corresponding eigen- 
states |E;). Subsequently, a (small) perturbing “potential” V is added to 
the Hamiltonian, and the eigenvalues and eigenstates of the new Hamilto- 
nian H = Ho + V are sought. One can draw important conclusions about 
the eigenvalues and eigenstates of the total Hamiltonian by symmetry argu- 
ments. 

Suppose the symmetry group of $\mathbf{H}_0$ is $G$, and that of $\mathbf{H}$
is $H$, which has to be a subgroup of $G$. In most cases, the eigenspaces of
$\mathbf{H}_0$ are irreducible carrier spaces of $G$, i.e., their basis
vectors transform according to the rows of irreducible representations of $G$.
If $H$ is a proper subgroup of $G$, then the eigenspaces of $\mathbf{H}_0$
will split according to Eq. (24.29). We say that some of the degeneracy is
lifted because of the perturbation $\mathbf{V}$. The nature of the split,
i.e., the number and the dimensionality of the vector spaces into which a
given eigenspace splits, can be obtained from the characters of $G$ and $H$
and Eq. (24.30). The original eigenspaces are represented on an energy diagram
with a line corresponding to each eigenspace. The split of the eigenspace
into $k$ new subspaces is then indicated by the branching of the old line into
$k$ new lines.

To the lowest approximation (first-order perturbation theory), the magnitude
of the split, i.e., the difference between the eigenvalues of $\mathbf{H}_0$
and those of $\mathbf{H}$, is given by [see Eq. (21.57)] the expectation value
$\langle\phi_i^{(\alpha)}|\mathbf{V}|\phi_j^{(\alpha)}\rangle$, where
$|\phi_i^{(\alpha)}\rangle$ belongs to the $i$th row of the $\alpha$th
irreducible representation, and $|\phi_j^{(\alpha)}\rangle$ to its $j$th row
($i \neq j$). Only if this expectation value is nonzero will a split occur.
This, in turn, depends on the symmetry of $\mathbf{V}$: If $\mathbf{V}$ is at
least as symmetric as $\mathbf{H}_0$ (corresponding to $G = H$), then
$\langle\phi_i^{(\alpha)}|\mathbf{V}|\phi_j^{(\alpha)}\rangle = 0$, and no
splitting occurs (Problem 24.17). If, on the other hand, $\mathbf{V}$ is less
symmetric than $\mathbf{H}_0$ (corresponding to $H \subset G$), then
$\mathbf{V}|\phi_j^{(\alpha)}\rangle$ will not belong to the $j$th row of the
$\alpha$th irreducible representation, and in general,
$\langle\phi_i^{(\alpha)}|\mathbf{V}|\phi_j^{(\alpha)}\rangle \neq 0$.


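We have decomposed the carrier space $V$ of a representation into invariant
irreducible subspaces $W^{(\alpha)}$. The argument above shows that each
$W^{(\alpha)}$ has a basis consisting of the “rows” of the irreducible
representations. Corresponding to such a basis, there is a set of projection
operators $P_i^{(\alpha)}$ with the property
$\sum_{\alpha,i} P_i^{(\alpha)} = \mathbf{1}$ (Chap. 6). Our aim is to find an
expression for these operators, which have the defining property
$P_i^{(\alpha)}|\psi_i^{(\alpha)}\rangle = |\psi_i^{(\alpha)}\rangle$. We
start with Eq. (24.35), multiply both sides of it by $T_{lm}^{(\beta)*}(g)$,
sum over $g \in G$, and use Eq. (24.8) to obtain

\[
\sum_{g\in G} T_{lm}^{(\beta)*}(g)\,T_g|\psi_i^{(\alpha)}\rangle
  = \sum_{g\in G}\sum_{j=1}^{n_\alpha} T_{lm}^{(\beta)*}(g)\,T_{ji}^{(\alpha)}(g)\,|\psi_j^{(\alpha)}\rangle
  = \frac{|G|}{n_\alpha}\sum_{j=1}^{n_\alpha}|\psi_j^{(\alpha)}\rangle\,\delta_{lj}\delta_{mi}\delta_{\alpha\beta}
  = \frac{|G|}{n_\alpha}|\psi_l^{(\alpha)}\rangle\,\delta_{mi}\delta_{\alpha\beta}.
\]

Let $\beta = \alpha$, $m = l = i$, and multiply both sides by $n_\alpha/|G|$.
Then this equation becomes

\[
\frac{n_\alpha}{|G|}\sum_{g\in G} T_{ii}^{(\alpha)*}(g)\,T_g|\psi_i^{(\alpha)}\rangle
  = |\psi_i^{(\alpha)}\rangle,
\]

which suggests the identification

\[
P_i^{(\alpha)} = \frac{n_\alpha}{|G|}\sum_{g\in G} T_{ii}^{(\alpha)*}(g)\,T_g
\tag{24.38}
\]

with the properties

\[
P_i^{(\alpha)}|\psi_j^{(\beta)}\rangle = |\psi_i^{(\alpha)}\rangle\,\delta_{ij}\delta_{\alpha\beta},
\qquad
P_i^{(\alpha)}|\phi\rangle = |\phi_i^{(\alpha)}\rangle,
\tag{24.39}
\]

where $|\phi_i^{(\alpha)}\rangle$ is the projection of $|\phi\rangle$ along
the $i$th row of the $\alpha$th irreducible representation.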

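We are also interested in the projection operator that projects onto the
irreducible subspace $W^{(\alpha)}$. Such an operator is obtained by summing
$P_i^{(\alpha)}$ over $i$. We thus obtain

\[
P^{(\alpha)} = \frac{n_\alpha}{|G|}\sum_{g\in G}
  \underbrace{\sum_{i=1}^{n_\alpha} T_{ii}^{(\alpha)*}(g)}_{=\chi^{(\alpha)*}(g)}\,T_g
  = \frac{n_\alpha}{|G|}\sum_{g\in G}\chi^{(\alpha)*}(g)\,T_g
\tag{24.40}
\]

and

\[
P^{(\alpha)}|\psi_i^{(\beta)}\rangle = |\psi_i^{(\alpha)}\rangle\,\delta_{\alpha\beta},
\qquad
P^{(\alpha)}|\phi\rangle = |\phi^{(\alpha)}\rangle,
\tag{24.41}
\]

where $|\phi^{(\alpha)}\rangle$ is the projection of $|\phi\rangle$ onto the
$\alpha$th irreducible invariant subspace. These formulas are extremely useful
in identifying the irreducible subspaces of a given carrier space: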


Box 24.7.3 Start with a basis $\{|a_i\rangle\}$ of the carrier space, apply
$P^{(\alpha)}$ of Eq. (24.40) to all basis vectors, and collect all the
linearly independent vectors of the form $P^{(\alpha)}|a_i\rangle$. These
vectors form a basis of the $\alpha$th irreducible representation.


The following example illustrates this point. 


Example 24.7.4 Consider the representation of $S_3$ given in Example 24.2.2,
where the carrier space is the span of the three functions
$|\psi_1\rangle = xy$, $|\psi_2\rangle = yz$, and $|\psi_3\rangle = xz$.
We refer to the character table for $S_3$ (Table 24.4) and use Eq. (24.40) to
obtain

\[
P^{(1)} = \tfrac{1}{6}(T_1 + T_2 + T_3 + T_4 + T_5 + T_6),
\]
\[
P^{(2)} = \tfrac{1}{6}(T_1 - T_2 - T_3 - T_4 + T_5 + T_6),
\]
\[
P^{(3)} = \tfrac{2}{6}(2T_1 - T_5 - T_6),
\]

where, as in Example 24.2.2, we have used the notation $T_i$ for $T_{\pi_i}$,
and the result $n_1 = n_2 = 1$ and $n_3 = 2$ obtained from Eq. (24.18),
Theorem 24.5.3, and the fact that $S_3$ is nonabelian.

To get the first irreducible subspace of this representation, we apply
$P^{(1)}$ to $|\psi_1\rangle$. Since this subspace is one-dimensional, the
procedure will give a basis for it if the vector so obtained is nonzero:

\[
P^{(1)}|\psi_1\rangle
  = \tfrac{1}{6}(T_1 + T_2 + T_3 + T_4 + T_5 + T_6)|\psi_1\rangle
  = \tfrac{1}{6}\bigl(|\psi_1\rangle + |\psi_1\rangle + |\psi_2\rangle + |\psi_3\rangle + |\psi_3\rangle + |\psi_2\rangle\bigr)
  = \tfrac{1}{3}\bigl(|\psi_1\rangle + |\psi_2\rangle + |\psi_3\rangle\bigr).
\]

This is a basis for the carrier space of the irreducible identity
representation. For the second irreducible representation, we get

\[
P^{(2)}|\psi_1\rangle
  = \tfrac{1}{6}\bigl(|\psi_1\rangle - |\psi_1\rangle - |\psi_2\rangle - |\psi_3\rangle + |\psi_3\rangle + |\psi_2\rangle\bigr) = 0.
\]

Similarly, $P^{(2)}|\psi_2\rangle = 0$ and $P^{(2)}|\psi_3\rangle = 0$. This
means that $T^{(2)}$ is not included in the representation we are working
with. We should have expected this, because if this one-dimensional
irreducible representation were included, it would force the last irreducible
representation to be one-dimensional as well [see Eq. (24.18)], and, by
Theorem 24.5.3, the group $S_3$ to be abelian!

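The last irreducible representation is obtained similarly. We have

\[
P^{(3)}|\psi_1\rangle = \tfrac{2}{6}(2T_1 - T_5 - T_6)|\psi_1\rangle
  = \tfrac{1}{3}\bigl(2|\psi_1\rangle - |\psi_3\rangle - |\psi_2\rangle\bigr),
\]
\[
P^{(3)}|\psi_2\rangle = \tfrac{2}{6}(2T_1 - T_5 - T_6)|\psi_2\rangle
  = \tfrac{1}{3}\bigl(2|\psi_2\rangle - |\psi_1\rangle - |\psi_3\rangle\bigr).
\]

These two vectors are linearly independent. Therefore, they form a basis for
the last irreducible representation. The reader may check that
$P^{(3)}|\psi_3\rangle$ is a linear combination of $P^{(3)}|\psi_1\rangle$ and
$P^{(3)}|\psi_2\rangle$.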
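
The character projectors of Eq. (24.40) used in Example 24.7.4 can be checked numerically. The following is a minimal sketch, not the book's own construction: it assumes that $S_3$ acts on the three basis functions simply by permuting them (so each $T_g$ is a 3 x 3 permutation matrix), and it takes the characters of the two-dimensional irreducible representation to be "number of fixed points minus one". The helper names `perm_matrix`, `perm_sign`, `fixed_points`, and `projector` are illustrative only.

```python
import itertools
import numpy as np

def perm_matrix(p):
    """Matrix sending basis vector j to basis vector p[j]."""
    n = len(p)
    m = np.zeros((n, n))
    for j, i in enumerate(p):
        m[i, j] = 1.0
    return m

def perm_sign(p):
    """Parity of a permutation, computed from its inversion count."""
    inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inv % 2 else 1

def fixed_points(p):
    return sum(1 for i, pi in enumerate(p) if i == pi)

perms = list(itertools.permutations(range(3)))
order = len(perms)  # |G| = 6

# (dimension n_alpha, character chi^(alpha)) of the three irreducible
# representations of S3; the labels 1, 2, 3 follow Example 24.7.4.
irreps = {
    1: (1, lambda p: 1),                    # identity representation
    2: (1, perm_sign),                      # antisymmetric representation
    3: (2, lambda p: fixed_points(p) - 1),  # two-dimensional representation
}

def projector(alpha):
    """P^(alpha) = (n_alpha/|G|) * sum_g conj(chi^(alpha)(g)) T_g, Eq. (24.40)."""
    n_alpha, chi = irreps[alpha]
    total = np.zeros((3, 3))
    for p in perms:
        total += np.conj(chi(p)) * perm_matrix(p)
    return (n_alpha / order) * total

P1, P2, P3 = projector(1), projector(2), projector(3)
psi1 = np.array([1.0, 0.0, 0.0])  # components of |psi_1> in the basis (psi_1, psi_2, psi_3)

print(P1 @ psi1)   # components of P^(1)|psi_1>
print(P3 @ psi1)   # components of P^(3)|psi_1>
```

Under these assumptions `P2` vanishes identically, while `P1` and `P3` have ranks 1 and 2, in agreement with the conclusion of the example that only the first and third irreducible representations occur.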


24.8 Tensor Product of Representations 


A simple quantum mechanical system possessing a group of symmetry is
described by vectors that transform irreducibly (or according to a row of
an irreducible representation). For example, a rotationally invariant system
can be described by an eigenstate of angular momentum, the generator of
rotation.$^3$ These eigenstates transform as rows of irreducible
representations of the rotation group. At a more fundamental level, the very
concept of a particle or field is thought of as states that transform
irreducibly under the fundamental group of spacetime, the Poincaré group.

$^3$Chapter 29 will make explicit the connection between groups and their generators.

Often these irreducible states are “combined” to form new states. For
example, the state of two (noninteracting) particles is described by a
two-particle state, labeled by the combined eigenvalues of the two sets of
operators that describe each particle separately. In the case of angular
momentum, the single-particle states may be labeled as $|l_i, m_i\rangle$ for
$i = 1, 2$. Then the combined state will be labeled as
$|l_1, m_1; l_2, m_2\rangle$, and one can define an action of the rotation
group on the vector space spanned by these combined states to construct a
representation. We now describe the way in which this is done.

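Let $T : G \to GL(V)$ and $S : G \to GL(W)$ be two representations of a
group $G$. Define an action of the group $G$ on $V \otimes W$, the tensor
product of $V$ and $W$, via the representation
$T \otimes S : G \to GL(V \otimes W)$ given by

\[
(T \otimes S)(g)(|v\rangle, |w\rangle) = (T(g)|v\rangle, S(g)|w\rangle).
\]

We note that

\[
\begin{aligned}
(T \otimes S)(g_1 g_2)(|v\rangle, |w\rangle)
  &= (T(g_1 g_2)|v\rangle, S(g_1 g_2)|w\rangle)
   = (T(g_1)T(g_2)|v\rangle, S(g_1)S(g_2)|w\rangle) \\
  &= (T \otimes S)(g_1)(T(g_2)|v\rangle, S(g_2)|w\rangle) \\
  &= [(T \otimes S)(g_1)(T \otimes S)(g_2)](|v\rangle, |w\rangle).
\end{aligned}
\]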


It follows that $T \otimes S$ is indeed a representation, called the tensor
product or direct product or Kronecker product representation. It is common,
especially in the physics literature, to write $|v, w\rangle$, or simply
$|vw\rangle$ for $(|v\rangle, |w\rangle)$, and $TS$ for $T \otimes S$. If we
choose the orthonormal bases $\{|v_i\rangle\}$ for $V$ and
$\{|w_a\rangle\}$ for $W$, and define an inner product on $V \otimes W$ by

\[
\langle v, w|v', w'\rangle = \langle v|v'\rangle\langle w|w'\rangle,
\]

we obtain a matrix representation of the group with matrix elements given by

\[
(T \otimes S)_{ia,jb}(g) = \langle v_i, w_a|T \otimes S(g)|v_j, w_b\rangle
  = \langle v_i|T(g)|v_j\rangle\langle w_a|S(g)|w_b\rangle = T_{ij}(g)S_{ab}(g).
\]

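Note that the rows and columns of this matrix are distinguished by double
indices. If the matrix $T$ is $m \times m$ and $S$ is $n \times n$, then the
matrix $T \otimes S$ is $(mn) \times (mn)$. The character of the tensor
product representation is

\[
\chi^{T\otimes S}(g) = \sum_{i,a}(T \otimes S)_{ia,ia}(g)
  = \sum_{i,a} T_{ii}(g)S_{aa}(g)
  = \sum_i T_{ii}(g)\sum_a S_{aa}(g)
  = \chi^T(g)\chi^S(g)
  \;\Rightarrow\; \chi^{T\otimes S} = \chi^T\chi^S.
\tag{24.42}
\]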
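
Equation (24.42) and the homomorphism property of $T \otimes S$ can be illustrated with Kronecker products of matrices. The sketch below is only an illustration under stated assumptions: $T$ is taken to be the 3 x 3 permutation representation of $S_3$ and $S$ the same matrices multiplied by the parity of the permutation; both choices, and the helper names `perm_matrix`, `perm_sign`, and `compose`, are made up for this example. `np.kron` orders the double index as (first factor, second factor), matching $(T \otimes S)_{ia,jb} = T_{ij}S_{ab}$.

```python
import itertools
import numpy as np

def perm_matrix(p):
    """Matrix sending basis vector j to basis vector p[j]."""
    n = len(p)
    m = np.zeros((n, n))
    for j, i in enumerate(p):
        m[i, j] = 1.0
    return m

def perm_sign(p):
    """Parity of a permutation, computed from its inversion count."""
    inv = sum(1 for a in range(len(p)) for b in range(a + 1, len(p)) if p[a] > p[b])
    return -1 if inv % 2 else 1

def compose(p, q):
    """Group product: (p o q)(j) = p[q[j]]."""
    return tuple(p[q[j]] for j in range(len(q)))

perms = list(itertools.permutations(range(3)))
T = {p: perm_matrix(p) for p in perms}                  # assumed representation T
S = {p: perm_sign(p) * perm_matrix(p) for p in perms}   # assumed representation S
TS = {p: np.kron(T[p], S[p]) for p in perms}            # (mn) x (mn) matrices

# (T x S)(g1 g2) = (T x S)(g1) (T x S)(g2) for all pairs of group elements
is_representation = all(
    np.allclose(TS[compose(p, q)], TS[p] @ TS[q]) for p in perms for q in perms
)

# chi^{T x S}(g) = chi^T(g) chi^S(g), Eq. (24.42)
characters_multiply = all(
    np.isclose(np.trace(TS[p]), np.trace(T[p]) * np.trace(S[p])) for p in perms
)

print(is_representation, characters_multiply)
```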


So the character of the tensor product is the product of the individual
characters.

An important special case is the tensor product of a representation with
itself. For such a representation, the matrix elements satisfy the symmetry
relation

\[
(T \otimes T)_{ia,jb}(g) = (T \otimes T)_{ai,bj}(g).
\]

This symmetry can be used to decompose the tensor product space into two
subspaces that are separately invariant under the action of the group. To
do this, take the span of all the symmetric vectors of the form
$(|v_i w_j\rangle + |v_j w_i\rangle) \in V \otimes V$ and denote it by
$(V \otimes V)_s$. Similarly, take the span of all the antisymmetric vectors
of the form $(|v_i w_j\rangle - |v_j w_i\rangle) \in V \otimes V$ and denote
it by $(V \otimes V)_a$. Next note that

\[
|v_i w_j\rangle = \tfrac{1}{2}\bigl(|v_i w_j\rangle + |v_j w_i\rangle\bigr)
  + \tfrac{1}{2}\bigl(|v_i w_j\rangle - |v_j w_i\rangle\bigr).
\]

It follows that every vector of the product space can be written as the sum of
a symmetric and an antisymmetric vector. Furthermore, the only vector that
is both symmetric and antisymmetric is the zero vector. Therefore,

\[
V \otimes V = (V \otimes V)_s \oplus (V \otimes V)_a.
\]


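Now consider the action of the group on each of these subspaces separately.
From the relation

\[
\begin{aligned}
(T \otimes T)(g)|v_i w_j\rangle &= (T \otimes T)(g)(|v_i\rangle, |w_j\rangle)
  = \Bigl(\sum_k T_{ki}(g)|v_k\rangle, \sum_l T_{lj}(g)|w_l\rangle\Bigr) \\
  &= \sum_{k,l} T_{ki}(g)T_{lj}(g)(|v_k\rangle, |w_l\rangle)
   = \sum_{k,l}(T \otimes T)_{kl,ij}(g)|v_k w_l\rangle
\end{aligned}
\]

we obtain

\[
(T \otimes T)(g)\bigl(|v_i w_j\rangle \pm |v_j w_i\rangle\bigr)
  = \sum_{k,l}\bigl[(T \otimes T)_{kl,ij}(g) \pm (T \otimes T)_{kl,ji}(g)\bigr]|v_k w_l\rangle.
\tag{24.43}
\]

Problem 24.21 shows that the RHS can be written as a sum over the symmetric
(for the plus sign) or antisymmetric (for the minus sign) vectors alone. It
follows that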


Box 24.8.1 The Kronecker product of a representation with itself is 
always reducible into two representations, the symmetrized product 
and the antisymmetrized product representations. 
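
The splitting stated in Box 24.8.1 can be sketched numerically with the operator that swaps the two factors of $V \otimes V$. The example below is an illustration under assumptions, not the book's construction: $T$ is taken to be the 3 x 3 permutation representation of $S_3$, and `swap`, `P_sym`, and `P_anti` are names introduced here. It checks that $(1 \pm \text{swap})/2$ commute with $T \otimes T$, and that the characters on the two subspaces agree with the formulas quoted in Problem 24.22.

```python
import itertools
import numpy as np

def perm_matrix(p):
    """Matrix sending basis vector j to basis vector p[j]."""
    n = len(p)
    m = np.zeros((n, n))
    for j, i in enumerate(p):
        m[i, j] = 1.0
    return m

n = 3
mats = [perm_matrix(p) for p in itertools.permutations(range(n))]

# Swap operator on V (x) V: |v_i w_j> -> |v_j w_i>, with |v_i w_j> stored at index i*n + j.
swap = np.zeros((n * n, n * n))
for i in range(n):
    for j in range(n):
        swap[j * n + i, i * n + j] = 1.0

P_sym = (np.eye(n * n) + swap) / 2    # projector onto (V (x) V)_s
P_anti = (np.eye(n * n) - swap) / 2   # projector onto (V (x) V)_a

checks = []
for T in mats:
    TT = np.kron(T, T)
    chi, chi_g2 = np.trace(T), np.trace(T @ T)   # chi(g) and chi(g^2)
    checks.append((
        np.allclose(TT @ P_sym, P_sym @ TT),                        # invariance
        np.isclose(np.trace(P_sym @ TT), (chi**2 + chi_g2) / 2),    # chi^S(g)
        np.isclose(np.trace(P_anti @ TT), (chi**2 - chi_g2) / 2),   # chi^A(g)
    ))

dim_sym = int(round(np.trace(P_sym)))    # n(n+1)/2
dim_anti = int(round(np.trace(P_anti)))  # n(n-1)/2
print(all(all(c) for c in checks), dim_sym, dim_anti)
```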




24.8.1 Clebsch-Gordan Decomposition 


A common situation in quantum mechanics is to combine two simple systems
into a composite system and see which properties of the original simple
systems the composite system retains. For example, combining the angular
momenta of two particles gives a new total angular momentum operator. The
question of what single-particle angular momentum states are included in the
states of the total angular momentum operator is the content of selection
rules and is of great physical interest: A quark and an antiquark (two
fermions) with spin $\frac{1}{2}$ always combine to form a meson (a boson),
because the resulting composite state has no projection onto the subspace
spanned by half-integer-spin particles. In this section, we study the
mathematical foundation of this situation. The tensor product of two
irreducible representations $T^{(\alpha)}$ and $T^{(\beta)}$ of $G$ is denoted
by $T^{(\alpha\times\beta)}$, and it is, in general, a reducible
representation. The characters, generally compound, are denoted by
$\chi^{(\alpha\times\beta)}$. Equation (24.14), combined with Eq. (24.42),
tells us what irreducible representations are present in the tensor product,
and therefore onto which irreducible representations the product
representation has nonzero projection:

\[
\chi_i^{(\alpha\times\beta)} = \chi_i^{(\alpha)}\chi_i^{(\beta)}
  = \sum_{\sigma=1}^{r} m_\sigma^{\alpha\beta}\chi_i^{(\sigma)},
\]

where $m_\sigma^{\alpha\beta}$ are nonnegative integers. We rewrite this more
conveniently in terms of vectors as

\[
|\chi^{(\alpha\times\beta)}\rangle = \sum_{\sigma=1}^{r} m_\sigma^{\alpha\beta}|\chi^{(\sigma)}\rangle,
\qquad
m_\sigma^{\alpha\beta} = \frac{1}{|G|}\langle\chi^{(\sigma)}|\chi^{(\alpha\times\beta)}\rangle
  = \frac{1}{|G|}\sum_{i=1}^{r} c_i\,\chi_i^{(\sigma)*}\chi_i^{(\alpha)}\chi_i^{(\beta)}.
\tag{24.44}
\]

A group for which $m_\sigma^{\alpha\beta} = 0, 1$ is called simply reducible.


Historical Notes 

Rudolph Friedrich Alfred Clebsch (1833-1872) studied mathematics in the shadow
of Jacobi at the University of Königsberg, two of his teachers having been students of
Jacobi. After graduation he held a number of positions in Germany, including positions
at the universities of Berlin, Giessen, and finally Göttingen, where he remained until his
death. He and Carl Neumann, the son of one of the aforementioned teachers who were
students of Jacobi, founded the Mathematische Annalen.

Clebsch began his career in mathematical physics, producing a doctoral thesis on hydro- 
dynamics and a book on elasticity in which he treated the elastic vibrations of rods and 
plates. These works were primarily mathematical, however, and he soon turned his atten- 
tion more to pure mathematics. His links to Jacobi gave rise to his first work in that vein, 
concerning problems in variational calculus and partial differential equations, in which 
he surpassed the results of Jacobi’s work. 

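Clebsch first achieved significant recognition for his work in projective invariants and
algebraic geometry. He was intrigued by the interplay between algebra and geometry,
and, since many results in the theory of invariants have geometric interpretations, the two
fields seemed natural choices.

Example 24.8.2 Referring to Table 24.5 of Problem 24.15, and using
Eq. (24.42), we can construct the compound character
$|\chi^{(4\times 5)}\rangle$ with components $9, -1, 1, 0, -1$. Then, we have

\[
|\chi^{(4\times 5)}\rangle = (9, -1, 1, 0, -1)^T
  = \sum_{\sigma=1}^{5} m_\sigma^{45}|\chi^{(\sigma)}\rangle,
\qquad
m_\sigma^{45} = \tfrac{1}{24}\langle\chi^{(\sigma)}|\chi^{(4\times 5)}\rangle.
\]

For the first irreducible representation, we get

\[
m_1^{45} = \tfrac{1}{24}\langle\chi^{(1)}|\chi^{(4\times 5)}\rangle
  = \tfrac{1}{24}\sum_{i=1}^{5} c_i\,\chi_i^{(1)*}\chi_i^{(4\times 5)}
  = \tfrac{1}{24}\bigl[1\cdot 1\cdot 9 + 6\cdot 1\cdot(-1) + 3\cdot 1\cdot 1 + 8\cdot 1\cdot 0 + 6\cdot 1\cdot(-1)\bigr] = 0.
\]

For the second irreducible representation, we get

\[
m_2^{45} = \tfrac{1}{24}\langle\chi^{(2)}|\chi^{(4\times 5)}\rangle
  = \tfrac{1}{24}\bigl[1\cdot 1\cdot 9 + 6\cdot(-1)\cdot(-1) + 3\cdot 1\cdot 1 + 8\cdot 1\cdot 0 + 6\cdot(-1)\cdot(-1)\bigr] = 1.
\]

Similarly, $m_3^{45} = 1$, $m_4^{45} = 1$, and $m_5^{45} = 1$. We thus see
that the identity representation is not included in the direct product of
irreducible representations 4 and 5; all other irreducible representations of
$S_4$ occur once in $T^{(4\times 5)}$.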
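
The arithmetic of Example 24.8.2 is easy to automate. The sketch below encodes the character table of $S_4$ (Table 24.5, with class sizes 1, 6, 3, 8, 6) as arrays and evaluates the multiplicities of Eq. (24.44); the function name `multiplicities` and the array layout are choices made for this illustration only.

```python
import numpy as np

# Class sizes c_i and simple characters chi_i^(sigma) of S4, copied from Table 24.5.
class_sizes = np.array([1, 6, 3, 8, 6])
char_table = np.array([
    [1,  1,  1,  1,  1],   # T^(1)
    [1, -1,  1,  1, -1],   # T^(2)
    [2,  0,  2, -1,  0],   # T^(3)
    [3,  1, -1,  0, -1],   # T^(4)
    [3, -1, -1,  0,  1],   # T^(5)
])
order = class_sizes.sum()  # |G| = 24

def multiplicities(alpha, beta):
    """Return the compound character chi^(alpha x beta) and the integers
    m_sigma = (1/|G|) sum_i c_i conj(chi_i^(sigma)) chi_i^(alpha) chi_i^(beta)."""
    chi_prod = char_table[alpha - 1] * char_table[beta - 1]
    m = (char_table.conj() * class_sizes) @ chi_prod / order
    return chi_prod, np.rint(m).astype(int)

chi_45, m_45 = multiplicities(4, 5)
chi_44, m_44 = multiplicities(4, 4)
print(chi_45, m_45)
print(chi_44, m_44)
```

Since all characters of $S_4$ are real, the product of representation 4 with itself is also the product of a representation with its adjoint, and the identity representation then appears exactly once in `m_44`.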


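In terms of representations themselves, we have the so-called Clebsch-Gordan
series

\[
T^{(\alpha\times\beta)} = \bigoplus_{\sigma=1}^{r} m_\sigma^{\alpha\beta}\,T^{(\sigma)},
\qquad
m_\sigma^{\alpha\beta} = \frac{1}{|G|}\sum_{i=1}^{r} c_i\,\chi_i^{(\alpha)}\chi_i^{(\beta)}\chi_i^{(\sigma)*},
\tag{24.45}
\]

where we have used Eq. (24.13).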

The one-dimensional identity representation plays a special role in the 
application of group theory to physics because any vector (function) in its 
carrier space is invariant under the action of the group, and invariant vectors 
often describe special states of the quantum mechanical systems. For exam- 
ple, the ground state of an atomic system with rotational invariance has zero 
orbital angular momentum, corresponding to a spherically symmetric state. 

Another example comes from particle physics. Quarks are usually placed
in the states of an irreducible representation of a group [$SU(n)$, where $n$
is the number of “flavors” such as up, down, charm], and antiquarks in its
adjoint. A question of great importance is what combination of quarks and
antiquarks leads to particles, called singlets, that are an invariant of the
group. For the case of quark-antiquark combination, the answer comes in
the analysis of the tensor product of one irreducible representation, say
$T^{(\alpha)}$, and one adjoint representation, say $T^{(\bar\beta)}$. In
fact, using Eq. (24.45), we have

\[
m_1^{\alpha\bar\beta}
  = \frac{1}{|G|}\sum_{i=1}^{r} c_i\,\chi_i^{(\alpha)}\chi_i^{(\bar\beta)}\chi_i^{(1)*}
  = \frac{1}{|G|}\sum_{i=1}^{r} c_i\,\chi_i^{(\alpha)}\chi_i^{(\beta)*}
  = \delta_{\alpha\beta},
\]

where we used Eq. (24.13) and the fact that all characters of the identity
representation are unity. Thus


Box 24.8.3 To construct an invariant state, we need to combine a 
representation with its adjoint, in which case we obtain the identity 
representation only once. 


Historical Notes 

Paul Albert Gordan (1837-1912), the son of David Gordan, a merchant, attended
gymnasium and business school, then worked for several years in banks. His early interest in
mathematics was encouraged by the private tutoring he received from a professor at the
Friedrich Wilhelm Gymnasium. He attended Ernst Kummer’s lectures in number theory
at the University of Berlin in 1855, then studied at the universities of Breslau,
Königsberg, and Berlin. At Königsberg he came under the influence of Karl Jacobi’s school, and
at Berlin his interest in algebraic equations was aroused. His dissertation (1862), which
concerned geodesics on spheroids, received a prize offered by the philosophy faculty of
the University of Breslau. The techniques that Gordan employed in it were those of
Lagrange and Jacobi.

Gordan’s interest in function theory led him to visit G.F.B. Riemann in Göttingen in 1862,
but Riemann was ailing, and their association was brief. The following year, Gordan was
invited to Giessen by Clebsch, thus beginning the fruitful collaboration most physicists
recognize. Together they produced work on the theory of Abelian functions, based on
Riemann’s fundamental paper on that topic, and several of Clebsch’s papers are considered
important steps toward establishing for Riemann’s theories a firm foundation in terms
of pure algebraic geometry. Of course, the Clebsch-Gordan collaboration also produced
the famous coefficients that bear their names, so indispensable to the theory of angular
momentum coupling found in almost every area of modern physics.

In 1874 Gordan became a professor at Erlangen, where he remained until his retirement
in 1910. He married Sophie Deuer, the daughter of a Giessen professor of Roman law, in
1869. In 1868 Clebsch introduced Gordan to the theory of invariants, which originated in
an observation of George Boole’s in 1841 and was further developed by Arthur Cayley
in 1846. Following the work of these two Englishmen, a German branch of the theory
was developed by S.H. Aronhold and Clebsch, the latter elaborating the former’s
symbolic methods of characterizing algebraic forms and their invariants. Invariant theory was
Gordan’s main interest for the rest of his mathematical career; he became known as the
greatest expert in the field, developing many techniques for representing and generating
forms and their invariants.

Gordan made important contributions to algebra and solutions of algebraic equations,
and gave simplified proofs of the transcendence of $e$ and $\pi$. The overall style of Gordan’s
mathematical work was algorithmic. He shied away from presenting his ideas in informal
literary forms. He derived his results computationally, working directly toward the desired
goal without offering explanations of the concepts that motivated his work.

Gordan’s only doctoral student, Emmy Noether, was one of the first women to receive a
doctorate in Germany. She carried on his work in invariant theory for a while, but under
the stimulus of Hilbert’s school at Göttingen her interests shifted and she became one of
the primary contributors to modern algebra.

So far, we have concentrated on the reduction of the operators and carrier
spaces into irreducible components. Let us now direct our attention to
the vectors themselves. Given two irreducible representations $T^{(\alpha)}$
and $T^{(\beta)}$ with carrier spaces spanned by vectors
$\{|\phi_i^{(\alpha)}\rangle\}_{i=1}^{n_\alpha}$ and
$\{|\psi_j^{(\beta)}\rangle\}_{j=1}^{n_\beta}$, we form the direct product
representation $T^{(\alpha\times\beta)}$ with its carrier space spanned by
vectors $\{|\phi_i^{(\alpha)}\psi_j^{(\beta)}\rangle\}$. We know that
$T^{(\alpha\times\beta)}$ is reducible, and Eq. (24.45) tells us how many
times each irreducible factor occurs in $T^{(\alpha\times\beta)}$. This
means that the span of $\{|\phi_i^{(\alpha)}\psi_j^{(\beta)}\rangle\}$ can be
decomposed into invariant irreducible subspaces; i.e., there must exist a
basis of the carrier of the product space the vectors of which belong to
irreducible representations of $G$. More specifically, we should be able to
form the linear combinations

\[
|\Psi_k^{(\sigma),q}\rangle = \sum_{ij} C(\alpha\beta; \sigma, q|ij; k)\,|\phi_i^{(\alpha)}\psi_j^{(\beta)}\rangle,
\tag{24.46}
\]

which transform according to the rows of the $\sigma$th irreducible
representation. Here the subscript $k$ refers to the row of the $\sigma$th
representation, and $q$ distinguishes among functions that have the same
$\sigma$ and $k$, corresponding to the case where
$m_\sigma^{\alpha\beta} \geq 2$. For simply reducible groups, the label $q$
is unnecessary. The coefficients $C(\alpha\beta; \sigma, q|ij; k)$ are called
the Clebsch-Gordan coefficients for $G$. These coefficients are normalized
such that

\[
\sum_{ij} C^*(\alpha\beta; \sigma, q|ij; k)\,C(\alpha\beta; \sigma', q'|ij; k')
  = \delta_{\sigma\sigma'}\delta_{qq'}\delta_{kk'},
\]
\[
\sum_{\sigma q k} C^*(\alpha\beta; \sigma, q|ij; k)\,C(\alpha\beta; \sigma, q|i'j'; k)
  = \delta_{ii'}\delta_{jj'}.
\]

This will guarantee that $|\Psi_k^{(\sigma),q}\rangle$ are orthonormal if the
product vectors form an orthonormal set. Using these relations, we can write
the inverse of Eq. (24.46) as

\[
|\phi_i^{(\alpha)}\psi_j^{(\beta)}\rangle
  = \sum_{\sigma q k} C^*(\alpha\beta; \sigma, q|ij; k)\,|\Psi_k^{(\sigma),q}\rangle.
\tag{24.47}
\]
24.8.2 Irreducible Tensor Operators 


An operator $A$ acting in the carrier space of the representation of a group
$G$ is transformed into another operator, $A \mapsto T_g A T_g^{-1}$, by the
action of the group. Just as in the case of vector spaces, one can thus
construct a set of operators that transform among themselves by such action
and lump these operators in irreducible sets.


Definition 24.8.4 An operator $A_i^{(\alpha)}$ is said to transform according
to the $i$th row of the $\alpha$th irreducible representation if there exist
$n_\alpha - 1$ other operators $\{A_j^{(\alpha)}\}$ and a basis
$\{|\psi_j^{(\alpha)}\rangle\}$ such that

\[
T_g A_i^{(\alpha)} T_g^{-1} = \sum_{j=1}^{n_\alpha} T_{ji}^{(\alpha)}(g)\,A_j^{(\alpha)},
\tag{24.48}
\]

where $(T_{ji}^{(\alpha)}(g))$ is the matrix representation of $g$. The set of
such operators is called an irreducible set of operators (or irreducible
tensorial set).


In particular, if $T_{ji}^{(\alpha)}(g) = \delta_{ij}$, i.e., if the
representation is the identity representation, then $A = T_g A T_g^{-1}$, and
$A$ is called a scalar operator. The term “scalar” refers to the fact that
$A$ has only one “component,” in contrast to the other operators of
Eq. (24.48), which may possess several components.

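Consider the set of vectors (functions) defined by
$|\psi_{ij}^{(\alpha\beta)}\rangle \equiv A_i^{(\alpha)}|\phi_j^{(\beta)}\rangle$,
where $|\phi_j^{(\beta)}\rangle$ transform according to the $\beta$th
irreducible representation. These vectors transform according to

\[
\begin{aligned}
T_g|\psi_{ij}^{(\alpha\beta)}\rangle
  &= T_g A_i^{(\alpha)} T_g^{-1}\,T_g|\phi_j^{(\beta)}\rangle
   = \sum_{k=1}^{n_\alpha} T_{ki}^{(\alpha)}(g)A_k^{(\alpha)}\sum_{l=1}^{n_\beta} T_{lj}^{(\beta)}(g)|\phi_l^{(\beta)}\rangle \\
  &= \sum_{k,l} T_{ki}^{(\alpha)}(g)T_{lj}^{(\beta)}(g)\,A_k^{(\alpha)}|\phi_l^{(\beta)}\rangle
   = \sum_{k,l} T_{kl,ij}^{(\alpha\times\beta)}(g)\,|\psi_{kl}^{(\alpha\beta)}\rangle,
\end{aligned}
\tag{24.49}
\]

i.e., according to the representation $T^{(\alpha\times\beta)}$. This means
that the vectors $|\psi_{ij}^{(\alpha\beta)}\rangle$ have the same
transformation properties as the tensor product vectors
$|\phi_i^{(\alpha)}\psi_j^{(\beta)}\rangle$. Therefore, using Eq. (24.47), we
can write

\[
A_i^{(\alpha)}|\phi_j^{(\beta)}\rangle
  = \sum_{\sigma q k} C^*(\alpha\beta; \sigma, q|ij; k)\,|\Psi_k^{(\sigma),q}\rangle,
\]

and more importantly,

\[
\langle\phi_m^{(\gamma)}|A_i^{(\alpha)}|\phi_j^{(\beta)}\rangle
  = \sum_{\sigma q k} C^*(\alpha\beta; \sigma, q|ij; k)
    \underbrace{\langle\phi_m^{(\gamma)}|\Psi_k^{(\sigma),q}\rangle}_{\text{use Eq. (24.36) here}}
  = \sum_{q} C^*(\alpha\beta; \gamma, q|ij; m)\,\langle\phi_m^{(\gamma)}|\Psi_m^{(\gamma),q}\rangle.
\tag{24.50}
\]

It follows that the matrix element of the operator $A_i^{(\alpha)}$ will
vanish unless the irreducible representation $T^{(\gamma)}$ occurs in the
reduction of the tensor product $T^{(\alpha)} \otimes T^{(\beta)}$, and this
can be decided from the character tables and the Clebsch-Gordan series,
Eq. (24.45).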

There is another remarkable property of Eq. (24.50) that has significant
physical consequences. Notice how the dependence on $i$ and $j$ is contained
entirely in the Clebsch-Gordan coefficients. Moreover, Eq. (24.37) implies
that $\langle\phi_m^{(\gamma)}|\Psi_m^{(\gamma),q}\rangle$ is independent of
$m$. Therefore, this dependence must also be contained entirely in the
Clebsch-Gordan coefficients. One therefore writes (24.50) as

\[
\langle\phi_m^{(\gamma)}|A_i^{(\alpha)}|\phi_j^{(\beta)}\rangle
  = \sum_{q} C^*(\alpha\beta; \gamma, q|ij; m)
    \underbrace{\langle\phi^{(\gamma)}\|A^{(\alpha)}\|\phi^{(\beta)}\rangle_q}_{\text{reduced matrix element}}.
\tag{24.51}
\]

This equation is known as the Wigner-Eckart theorem, and the numbers
multiplying the Clebsch-Gordan coefficients are known as the reduced matrix
elements.

From the point of view of physics, Eq. (24.51) can be very useful in
calculating matrix elements (expectation values and transition between
states), once we know the transformation properties of the physical operator.
For example, for a scalar operator $S$, which, by definition, transforms
according to the identity representation, (24.51) becomes

\[
\langle\phi_m^{(\gamma)}|S|\phi_j^{(\beta)}\rangle
  = \langle\phi^{(\gamma)}\|S\|\phi^{(\beta)}\rangle\,\delta_{\gamma\beta}\delta_{mj};
\]

i.e., scalar operators have no matrix elements between different irreducible
representations of a group, and within an irreducible representation, they are
multiples of the identity matrix. This result is also a consequence of Schur’s
lemma.
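
The statement about scalar operators can be illustrated numerically. The sketch below is an illustration under assumptions rather than a general proof: it uses the 3 x 3 permutation representation of $S_3$ (which contains the identity representation and the two-dimensional irreducible representation once each), builds a scalar operator by averaging $T_g A T_g^{-1}$ over the group for a randomly chosen matrix `A`, and checks that the result is a multiple of the identity on each irreducible subspace with no matrix elements between them. The names `A`, `S_op`, `P1`, and `P3` are hypothetical.

```python
import itertools
import numpy as np

def perm_matrix(p):
    """Matrix sending basis vector j to basis vector p[j]."""
    n = len(p)
    m = np.zeros((n, n))
    for j, i in enumerate(p):
        m[i, j] = 1.0
    return m

mats = [perm_matrix(p) for p in itertools.permutations(range(3))]

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))   # arbitrary operator; no symmetry assumed

# Group average (1/|G|) sum_g T_g A T_g^{-1}; for permutation matrices T^{-1} = T^T.
# By construction this operator satisfies S = T_g S T_g^{-1}, i.e., it is a scalar operator.
S_op = sum(T @ A @ T.T for T in mats) / len(mats)

P1 = np.full((3, 3), 1.0 / 3.0)   # projector onto the identity representation
P3 = np.eye(3) - P1               # projector onto the two-dimensional irreducible subspace

a = np.trace(S_op @ P1)           # value of S on the one-dimensional subspace
b = np.trace(S_op @ P3) / 2.0     # value of S on the two-dimensional subspace

print(np.allclose(S_op, a * P1 + b * P3))   # multiple of the identity on each block
print(np.allclose(P1 @ S_op @ P3, 0))       # no matrix elements between the blocks
```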


24.9 Problems 


24.1 Show that the action of a group $G$ on the space of functions $\psi$
given by $T_g\psi(\mathbf{x}) = \psi(g^{-1}\cdot\mathbf{x})$ is a
representation of $G$.


24.2 Complete Example 24.1.6.


24.3 Let the vector space carrying the representation of $S_3$ be the space of
functions. Choose $\psi(x, y, z) = xy$ and find the matrix representation of
$S_3$ in the minimal invariant subspace containing $\psi$. Hint: See
Example 24.2.2.


24.4 Let the vector space carrying the representation of $S_3$ be the space of
functions. Choose (a) $\psi_1(x, y, z) = x$ and (b) $\psi_1(x, y, z) = x^2$,
and in each case, find the matrix representation of $S_3$ in the minimal
invariant subspace containing $\psi_1$.


24.5 Show that the representations T, T, and T* are either all reducible or 
all irreducible. 


24.6 Use the hermitian conjugate of Eq. (24.5) to show that
$S \equiv A^\dagger A$ commutes with all $T_g$’s. This result is used to prove
Schur’s lemmas in infinite dimensions.


24.7 Show that elements of a group belonging to the same conjugacy class 
have the same characters. 


24.8 Show that the regular representation is indeed a representation, i.e.,
that $R : G \to GL(m, \mathbb{R})$ is a homomorphism.

Table 24.5 Character table for $S_4$

            K_1    6K_2    3K_3    8K_4    6K_5
T^{(1)}      1      1       1       1       1
T^{(2)}      1     -1       1       1      -1
T^{(3)}      2      0       2      -1       0
T^{(4)}      3      1      -1       0      -1
T^{(5)}      3     -1      -1       0       1


24.9 Prove Maschke’s Theorem: The group algebra is semi-simple. 


24.10 Let $G$ be a finite group. Define the element $P = \sum_{x\in G} x$ of
the group algebra and show that the left ideal generated by $P$ is
one-dimensional.


24.11 Show that tT defined in Eq. (24.23) commutes with all operators 
7, Hint: Consider 1 ay ba cr )-1, 


24.12 Let $K_{i'}$ denote the set of inverses of a conjugacy class $K_i$ with
$c_i$ elements.

(a) Show that $K_{i'}$ is also a class with $c_i$ elements.

(b) Show that identity occurs exactly $c_i$ times in the product
$\kappa_i\kappa_{i'}$, and none in the product $\kappa_i\kappa_j$ if
$j \neq i'$ [see Eq. (24.21)].

(c) Conclude that

\[
c_{ij1} = \begin{cases} 0 & \text{if } j \neq i', \\ c_i & \text{if } j = i'. \end{cases}
\]


24.13 Show that the coefficients $|G|d_i/|H|c_i$ of Eq. (24.33) are integers.


24.14 Show that the symmetric and the antisymmetric representations of $S_n$
are inequivalent.


24.15 Construct the character table for $S_4$ from that of $S_3$ (given as
Table 24.4), and verify that it is given by Table 24.5.


24.16 Show that all functions transforming according to a given row of an 
irreducible representation have the same norm. 


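24.17 Show that if the group of symmetry of $\mathbf{V}$ contains that of
$\mathbf{H}_0$ and $|\phi_j^{(\alpha)}\rangle$ belongs to the $j$th column of
the $\alpha$th irreducible representation, then so does
$\mathbf{V}|\phi_j^{(\alpha)}\rangle$. Conclude that
$\langle\phi_i^{(\alpha)}|\mathbf{V}|\phi_j^{(\alpha)}\rangle = 0$ for
$i \neq j$.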


24.18 Find the irreducible components of the representation of Exam- 
ple 24.1.6. 


24.19 Show that $P^{(3)}|\psi_3\rangle$ of Example 24.7.4 is a linear
combination of $P^{(3)}|\psi_1\rangle$ and $P^{(3)}|\psi_2\rangle$.


24.20 Show that the tensor product of two unitary representations is unitary.


24.21 Switch the dummy indices of the double sum in (24.43), add (subtract)
the two sums, and use $(T \otimes T)_{ia,jb}(g) = (T \otimes T)_{ai,bj}(g)$ to
show that the double sum can be written as a sum over the symmetric
(antisymmetric) vectors alone.


24.22 Show that the characters $\chi^S(g)$ and $\chi^A(g)$ of the symmetrized
and antisymmetrized product representations are given, respectively, by

\[
\chi^S(g) = \tfrac{1}{2}\bigl[(\chi(g))^2 + \chi(g^2)\bigr]
\quad\text{and}\quad
\chi^A(g) = \tfrac{1}{2}\bigl[(\chi(g))^2 - \chi(g^2)\bigr].
\]


24.23 Suppose that $A_i^{(\alpha)}$ transforms according to $T^{(\alpha)}$,
and $A_j^{(\beta)}$ according to $T^{(\beta)}$. Show that
$A_i^{(\alpha)}A_j^{(\beta)}$ transforms according to
$T^{(\alpha\times\beta)}$.


24.24 Show that 


1 
sae DIOP IA AIP = Ko a}, PF 
ij 


One can interpret this as the statement that the square of the reduced matrix
element is proportional to the average (over $i$ and $j$) of the square of the
full matrix elements.


25 Representations of the Symmetric Group


The symmetric (permutation) group is an important prototype of finite
groups. In fact, Cayley’s theorem (see [Rotm 84, p. 46] for a proof) states
that any finite group of order $n$ is isomorphic to a subgroup of $S_n$.
Moreover, the representation of $S_n$ leads directly to the representation of
many of the Lie groups encountered in physical applications. It is, therefore,
worthwhile to devote some time to the analysis of the representations of
$S_n$.


25.1 Analytic Construction 


The starting point of the construction of representations of the symmetric 
group is Eq. (24.33), which is valid for any finite group. There is one simple 
character that every group has, namely, the character of the one-dimensional 
symmetric representation in which all elements of the group are mapped to 
1 ER. Setting £ = 1 in (24.33), and noting that >, dj = d;, we obtain 


\[
\psi_i = \frac{|G|\,d_i}{|H|\,c_i},
\tag{25.1}
\]

where $\{\psi_i\}$ are the components of a compound character of $G$.

Frobenius has shown that by a clever choice of $H$, one can completely
solve the problem of the construction of the irreducible representations
of $S_n$. The interested reader may refer to [Hame 89, pp. 189-192] for
details. We are really interested in the simple characters of $S_n$, and
Frobenius came up with a powerful method of calculating them. Since there is a
one-to-one correspondence between the irreducible representations and
conjugacy classes, and another one between conjugacy classes of $S_n$ and
partitions of $n$, we shall label the simple characters of $S_n$ by partitions
of $n$. Thus, instead of our common notation $\chi_i^{(\alpha)}$, we use
$\chi_{(l)}^{(\lambda)}$, where $(\lambda)$ denotes a partition of $n$, and
$(l)$ a cycle structure of $S_n$.

Suppose we want to find the irreducible characters corresponding to the
cycle structure $(l) = (1^\alpha, 2^\beta, 3^\gamma, \ldots)$. These form a
column under the class $(l)$ in a character table. To calculate the
irreducible characters, form two polynomials in $(x_1, x_2, \ldots, x_n)$ as
follows. The first one, which is completely symmetric in all variables, is

\[
s_{(l)} = \Bigl(\sum_{i=1}^{n} x_i\Bigr)^{\alpha}
          \Bigl(\sum_{i=1}^{n} x_i^2\Bigr)^{\beta}
          \Bigl(\sum_{i=1}^{n} x_i^3\Bigr)^{\gamma}\cdots.
\tag{25.2}
\]

The second one is completely antisymmetric, and can be written as

\[
D(x_1, \ldots, x_n) = \prod_{i<j}(x_i - x_j)
  = \sum_{\pi}\epsilon_\pi\,x_{\pi(1)}^{n-1}x_{\pi(2)}^{n-2}\cdots x_{\pi(n-1)}.
\tag{25.3}
\]

It can be shown that the simple characters of $S_n$ are coefficients of
certain terms of the product of these polynomials. To be exact, we have

\[
s_{(l)}\,D(x_1, \ldots, x_n)
  = \sum_{(\lambda)}\sum_{\pi}\chi_{(l)}^{(\lambda)}\,\epsilon_\pi\,
    x_{\pi(1)}^{\lambda_1+n-1}x_{\pi(2)}^{\lambda_2+n-2}\cdots
    x_{\pi(n-1)}^{\lambda_{n-1}+1}x_{\pi(n)}^{\lambda_n}.
\tag{25.4}
\]

The outer sum goes over all partitions of $n$, the inner sum over all
permutations of $S_n$. The procedure for finding the simple characters of
$S_n$ should now be clear from (25.4):


Proposition 25.1.1 To find the simple character $\chi_{(l)}^{(\lambda)}$,
construct the corresponding symmetric and antisymmetric polynomials
of (25.2) and (25.3), multiply them together, collect all terms of the
form

\[
x_1^{\lambda_1+n-1}x_2^{\lambda_2+n-2}\cdots x_{n-1}^{\lambda_{n-1}+1}x_n^{\lambda_n}.
\]

The coefficient of such a term is the desired character.


Example 25.1.2 The best way to understand the procedure described above
is to go through an example in detail. We calculate the characters of $S_3$
using the above method. Label the rows of the character table with the
partitions of 3. These are $(3)$, $(2, 1)$, and $(1, 1, 1)$. Similarly, label
the columns with the conjugacy classes, or cycle structures: $(1^3)$,
$(1, 2)$, and $(3)$. The first cycle structure has $\alpha = 3$,
$\beta = 0 = \gamma$. Therefore,

\[
\begin{aligned}
s_{(1^3)} = (x_1 + x_2 + x_3)^3 &= x_1^3 + x_2^3 + x_3^3 \\
  &\quad + 3(x_1^2x_2 + x_1^2x_3 + x_2^2x_1 + x_2^2x_3 + x_3^2x_1 + x_3^2x_2) + 6x_1x_2x_3
\end{aligned}
\tag{25.5}
\]

and

\[
\begin{aligned}
D(x_1, x_2, x_3) &= (x_1 - x_2)(x_1 - x_3)(x_2 - x_3) \\
  &= x_1^2x_2 - x_1^2x_3 - x_2^2x_1 + x_2^2x_3 - x_3^2x_2 + x_3^2x_1.
\end{aligned}
\tag{25.6}
\]

Now we note that for $(\lambda) = (3)$, $\lambda_1 = 3$, $\lambda_2 = 0$, and
$\lambda_3 = 0$. Therefore, the coefficient of
$x_1^{\lambda_1+n-1}x_2^{\lambda_2+n-2}\cdots x_n^{\lambda_n} = x_1^5x_2$
gives $\chi_{(1^3)}^{(3)}$. Similarly, for $(\lambda) = (2, 1, 0)$,
$\lambda_1 = 2$, $\lambda_2 = 1$, and $\lambda_3 = 0$, and the coefficient
of $x_1^{\lambda_1+n-1}x_2^{\lambda_2+n-2}\cdots x_n^{\lambda_n} = x_1^4x_2^2$
gives $\chi_{(1^3)}^{(2,1)}$. Finally, for $(\lambda) = (1, 1, 1)$,
$\lambda_1 = \lambda_2 = \lambda_3 = 1$, and the coefficient of
$x_1^{\lambda_1+n-1}x_2^{\lambda_2+n-2}\cdots x_n^{\lambda_n} = x_1^3x_2^2x_3$
gives $\chi_{(1^3)}^{(1,1,1)}$. These coefficients can be read off by scanning
through Eq. (25.5) while multiplying its terms by those of Eq. (25.6) and
keeping track of the coefficients of the products of the relevant powers of
$x_1$, $x_2$, and $x_3$. The reader may verify that there is only one term of
the form $x_1^5x_2$, whose coefficient is 1, giving
$\chi_{(1^3)}^{(3)} = 1$; there are two terms of the form $x_1^4x_2^2$, whose
coefficients are $-1$ and $3$, giving $\chi_{(1^3)}^{(2,1)} = 2$; and there
are four terms of the form $x_1^3x_2^2x_3$, whose coefficients are $+1$,
$-3$, $-3$, and $+6$, giving $\chi_{(1^3)}^{(1,1,1)} = 1$. Therefore, the
first column of the character table of $S_3$ is $(1, 2, 1)^T$.

To obtain the second column, we consider the second conjugacy class,
$(1, 2)$, with $\alpha = 1 = \beta$ and $\gamma = 0$. The corresponding
symmetric polynomial is

\[
\begin{aligned}
s_{(1,2)} &= (x_1 + x_2 + x_3)(x_1^2 + x_2^2 + x_3^2) \\
  &= x_1^3 + x_2^3 + x_3^3 + x_1^2x_2 + x_1^2x_3 + x_2^2x_1 + x_2^2x_3 + x_3^2x_1 + x_3^2x_2.
\end{aligned}
\tag{25.7}
\]

$D(x_1, x_2, x_3)$ is the same as before. Multiplying the two polynomials and
keeping track of the coefficients of $x_1^5x_2$, $x_1^4x_2^2$, and
$x_1^3x_2^2x_3$, we obtain $\chi_{(1,2)}^{(3)} = 1$,
$\chi_{(1,2)}^{(2,1)} = 0$, and $\chi_{(1,2)}^{(1,1,1)} = -1$. It follows that
the second column of the character table of $S_3$ is $(1, 0, -1)^T$.

The last column is obtained similarly. We note that $\alpha = 0 = \beta$, and
$\gamma = 1$. Therefore, the symmetric polynomial is

\[
s_{(3)} = x_1^3 + x_2^3 + x_3^3,
\]

and the antisymmetric polynomial is the same as before. Multiplying these
two polynomials and extracting the coefficients as before, we get
$\chi_{(3)}^{(3)} = 1$, $\chi_{(3)}^{(2,1)} = -1$, and
$\chi_{(3)}^{(1,1,1)} = 1$. It follows that the third column of the character
table of $S_3$ is $(1, -1, 1)^T$.
1 
Collecting all the data obtained above, we can reconstruct the character 
table of S3. This is shown in Table 25.1. The irreducible representations are 


labeled by the three possible partitions of 3, and the conjugacy classes by 
the three cycle structures.

Table 25.1 The character table for $S_3$. Each column corresponds to a conjugacy class,
each row to a partition of 3. The last two rows have been switched compared to Table 24.4

               (1^3)   (1,2)    (3)
T^(3)            1       1       1
T^(2,1)          2       0      -1
T^(1,1,1)        1      -1       1
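
The coefficient extraction used above for $S_3$ can be mimicked mechanically. The following is only a minimal sketch, assuming polynomials are stored as dictionaries mapping exponent tuples to integer coefficients; the helper names (`poly_mul`, `power_sum`, `vandermonde`, `frobenius_character`) are hypothetical and do not come from the text. It multiplies the symmetric polynomial of a class by $D$ and reads off the coefficient of $x_1^{\lambda_1+2} x_2^{\lambda_2+1} x_3^{\lambda_3}$.

```python
from collections import defaultdict


def poly_mul(p, q):
    """Multiply two polynomials stored as {exponent tuple: coefficient}."""
    out = defaultdict(int)
    for ea, ca in p.items():
        for eb, cb in q.items():
            out[tuple(a + b for a, b in zip(ea, eb))] += ca * cb
    return {e: c for e, c in out.items() if c}


def power_sum(k, n):
    """x_1^k + ... + x_n^k (one factor of the symmetric polynomial of a class)."""
    return {tuple(k if i == j else 0 for i in range(n)): 1 for j in range(n)}


def vandermonde(n):
    """D(x) = product over i < j of (x_i - x_j), as in Eq. (25.6) for n = 3."""
    d = {(0,) * n: 1}
    for i in range(n):
        for j in range(i + 1, n):
            xi = tuple(1 if m == i else 0 for m in range(n))
            xj = tuple(1 if m == j else 0 for m in range(n))
            d = poly_mul(d, {xi: 1, xj: -1})
    return d


def frobenius_character(lam, cycles):
    """Coefficient of x_1^(lam_1+n-1) ... x_n^(lam_n) in s_(l) * D.

    `lam` must be padded with zeros to length n; `cycles` lists the cycle lengths.
    """
    n = len(lam)
    poly = vandermonde(n)
    for k in cycles:
        poly = poly_mul(poly, power_sum(k, n))
    target = tuple(lam[i] + n - 1 - i for i in range(n))
    return poly.get(target, 0)


classes = [(1, 1, 1), (2, 1), (3,)]          # (1^3), (1,2), (3)
rows = [(3, 0, 0), (2, 1, 0), (1, 1, 1)]     # partitions of 3, zero-padded
table = [[frobenius_character(lam, c) for c in classes] for lam in rows]
print(table)
```

Run on the three partitions of 3, the sketch should reproduce the three rows of Table 25.1.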


25.2. Graphical Construction 


The analytic construction of the previous subsection can be handled using 
graphical techniques that are considerably simpler. To begin with, let us find 
the character of the identity element of $S_n$. The cycle structure is $(1^n)$, i.e.,
all cycles consist of a single element. Thus, $\alpha = n$, and $\beta$, $\gamma$, etc. are all
zero. It follows that the LHS of Eq. (25.4) is $(\sum_i x_i)^n D(x_i)$. We calculate
this product one power of $\sum_i x_i$ at a time. For the same reason as in the
example above, $\chi^{(\lambda)}_{(1^n)}$ will be the coefficient of

$$x_1^{\lambda_1+n-1} x_2^{\lambda_2+n-2} \cdots x_{n-1}^{\lambda_{n-1}+1} x_n^{\lambda_n}.$$


Historical Notes 

Ferdinand Georg Frobenius (1849-1917), the son of a parson, was born in Berlin and 
began his mathematical studies at Göttingen in 1867. He received his doctorate in Berlin
three years later. Four years later, on the basis of his mathematical research, he was
appointed assistant professor at the University of Berlin. He achieved the rank of full
professor at the Eidgenössische Polytechnikum Zürich before returning to Berlin as a
professor of mathematics in 1892. During the early years of Frobenius’s career, modern group
theory was in its infancy. He combined its three main branches of study—the theory of 
solutions to algebraic equations (permutation groups and the work of Galois), geometry 
(transformation and Lie groups), and number theory—to produce the concept of the ab- 
stract group. He collaborated with Issai Schur in representation and character theory of 
groups. 

His paper Über die Gruppencharactere is of fundamental importance. It was presented to
the Berlin Academy on 16 July 1896 and it contains work that Frobenius had done in the 
preceding few months. In a series of letters to Dedekind, the first on 12 April 1896, his 
ideas on group characters quickly develop, and Frobenius is able to construct a complete 
set of representations by complex numbers. In a letter to Dedekind on 26 April 1896 
Frobenius finds the irreducible characters for the alternating group, and the symmetric 
groups. 

In 1897 Frobenius reformulated the work of Molien—the Latvian student of Klein, who, 
in his thesis, classified the semi-simple algebras using the method of group rings—in 
terms of matrices and then showed that his characters are the traces of the irreducible 
representations. Frobenius’s character theory found important applications in quantum 
mechanics and was used with great effect by Burnside, who wrote it up in the 1911 
edition of his Theory of Groups of Finite Order. 

Frobenius is also remembered as the originator of a series method for solving ordinary 
differential equations. Despite the clearly greater importance of his work in group theory, 
this method of Frobenius serves admirably to perpetuate his name. 


If we multiply $D(x_i)$ by $\sum_i x_i$ one $x$ at a time, we increase the power of
one of the $x_i$'s by one. If at any stage, two of the exponents become equal,
the term must vanish, due to the antisymmetry of $(\sum_j x_j) D(x_j)$. Therefore,
as we raise the degree of the polynomial by one at each stage, the power of
$x_1$ must be raised at least as fast as $x_2$, and the power of $x_2$ must be raised at
least as fast as $x_3$, etc. Our goal is to raise the power of $x_1$ by $\lambda_1$, that of $x_2$
by $\lambda_2$, and, in general, the power of $x_i$ by $\lambda_i$, making sure that at each stage,
the number of multiplications by $x_1$ is greater than or equal to the number
of multiplications by $x_2$, etc. The total number of ways by which we can
reach this goal will be $\chi^{(\lambda)}_{(1^n)}$, which is also the dimension of the irreducible
representation $(\lambda)$ by Eq. (24.9).

To see the argument more clearly, suppose that we are interested in the
dimension of the irreducible representation of $S_4$ corresponding to $(3, 1)$.
Then we must raise the power of $x_1$ by 3 and the power of $x_2$ by 1; $x_3$ and $x_4$
will remain intact, and therefore will not enter in the following discussion.
It follows that $D(x_i)$ is to be multiplied by $x_1^3 x_2$, one $x$-factor at a time,
the number of $x_1$-factors never falling below the number of $x_2$-factors. The
possible ways of doing this are

$$x_1^3 x_2, \qquad x_1^2 x_2 x_1, \qquad x_1 x_2 x_1^2. \tag{25.8}$$

Note that as we count the factors from left to right, the number of $x_1$'s is
always greater than or equal to the number of $x_2$'s. Thus $x_2 x_1^3$ is absent
because $x_2$ occurs without $x_1$ occurring first. It follows that the dimension
of the irreducible representation $(3, 1)$ is 3.

A graphical way to arrive at the same result is to draw $\lambda_1 = 3$ boxes on
top and $\lambda_2 = 1$ box below it:

    [ ][ ][ ]
    [ ]

The next step is to fill in the boxes with numbers corresponding to the
position of $x_1$ (filling up the first row) and $x_2$ factors (filling up the second
row) in Eq. (25.8). Since in the first term of (25.8), the $x_1$'s occupy the first,
second, and third positions, we enter 1, 2, and 3 in the first row, and 4 in
the second row corresponding to the last position occupied by $x_2$. Similarly,
in the second term of (25.8), the $x_1$'s occupy the first, second, and fourth
positions; therefore, we enter 1, 2, and 4 in the first row, and 3 in the second
row corresponding to the position occupied by $x_2$. Finally, in the last term
of (25.8), the $x_1$'s occupy the first, third, and fourth positions; therefore, we
enter 1, 3, and 4 in the first row, and 2 in the second row corresponding to
the position occupied by $x_2$. The result is the graph shown below:

    [1][2][3]    [1][2][4]    [1][3][4]
    [4]          [3]          [2]
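
The passage from the orderings in Eq. (25.8) to filled frames can be sketched in a few lines. This is only an illustration under stated assumptions: a partition is a tuple such as `(3, 1)`, a word is a tuple of row labels (1 for $x_1$, 2 for $x_2$, and so on), and the function names `lattice_words` and `tableau_from_word` are hypothetical.

```python
def lattice_words(lam):
    """Yield all orderings of the x-factors in which, at every stage, x_1 has
    been used at least as often as x_2, x_2 at least as often as x_3, etc."""
    def extend(word, counts):
        if len(word) == sum(lam):
            yield tuple(word)
            return
        for i in range(len(lam)):
            # Row i may grow only if it stays no longer than the row above it.
            if counts[i] < lam[i] and (i == 0 or counts[i] < counts[i - 1]):
                counts[i] += 1
                yield from extend(word + [i + 1], counts)
                counts[i] -= 1
    yield from extend([], [0] * len(lam))


def tableau_from_word(word, lam):
    """Enter the position of each x_i-factor in row i of the frame."""
    rows = [[] for _ in lam]
    for position, label in enumerate(word, start=1):
        rows[label - 1].append(position)
    return rows


words = list(lattice_words((3, 1)))
tableaux = [tableau_from_word(w, (3, 1)) for w in words]
print(words)
print(tableaux)
```

For $(3, 1)$ the three words correspond to the three terms of Eq. (25.8), and the filled frames are the three graphs displayed above.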


Definition 25.2.1 Let $(\lambda) = (\lambda_1, \lambda_2, \dots, \lambda_n)$ be a partition of $n$. The
Young frame (or the Young pattern) associated with $(\lambda)$ is a collection
of rows of boxes (squares) aligned at the left such that the first
row has $\lambda_1$ boxes, the second row $\lambda_2$ boxes, etc. Since $\lambda_i \ge \lambda_{i+1}$, the
length of the rows does not increase as one goes to the bottom of the frame.


The Young frame associated with $(\lambda)$ represents $x_1^{\lambda_1} \cdots x_n^{\lambda_n}$, which
multiplies the antisymmetric polynomial $D(x_i)$. To find the dimension of the
irreducible representation $T^{(\lambda)}$, we have to count the number of ways in
which the $x$-factors can be permuted among themselves such that as we
scan the product, the number of $x_i$'s is never less than the number of $x_j$'s if
$j > i$. This leads to


Definition 25.2.2 A standard Young tableau (or diagram, or graph) is a
Young frame filled with numbers 1 through $n$ such that


1. the numbers are placed consecutively left to right on the rows starting 
with 1 in the far-left box of the first row; 

2. no box of any row is to be filled unless all boxes to its left are already 
filled; 

3. at each stage, the number of boxes filled in any row is never less than 
the number of boxes filled in the rows below it. 


Tableaux satisfying the last condition are called regular graphs. 


It follows that in a Young tableau, the number 1 is always in the upper
left-hand box, and that going down in a column, the numbers must increase. 


Theorem 25.2.3 Let $(\lambda)$ be a partition of $n$. Then the dimension of
the irreducible representation $T^{(\lambda)}$ is equal to the number of standard
Young tableaux associated with $(\lambda)$.


Example 25.2.4 We wish to calculate the dimension of each irreducible
representation of $S_4$. The partitions are $(4)$, $(3,1)$, $(2,2)$, $(2,1,1)$, and
$(1,1,1,1)$ whose associated Young frames are shown below:

    (4)             (3,1)         (2,2)      (2,1,1)    (1,1,1,1)
    [ ][ ][ ][ ]    [ ][ ][ ]     [ ][ ]     [ ][ ]     [ ]
                    [ ]           [ ][ ]     [ ]        [ ]
                                             [ ]        [ ]
                                                        [ ]

The number of standard Young tableaux associated with $(\lambda) = (4)$ is 1,
because there is only one way to place the numbers 1 through 4 in the four
boxes. Thus, the dimension of $T^{(4)}$ is 1. For $(\lambda) = (3, 1)$, we can place 2
either to the right of 1 or below it. The first choice gives rise to two
possibilities for the placement of 3: either to the right of 2 or below 1. The second
choice gives rise to only one possibility for 3, namely to the right of 1. With
1, 2, and 3 in place, the position of 4 is predetermined. Thus, we have 3
possibilities for $(\lambda) = (3, 1)$, and the dimension of $T^{(3,1)}$ is 3. For $(\lambda) = (2, 2)$,
we can place 2 either to the right of 1 or below it. Both choices give rise
to only one possibility for 3: in the first case, 3 can only go under 1; in the
second case to its right. With 1, 2, and 3 in place, the position of 4 is again
predetermined. Thus, we have 2 possibilities for $(\lambda) = (2, 2)$, and the
dimension of $T^{(2,2)}$ is 2. The reader may check that the dimension of $T^{(2,1,1)}$
is 3, and that of $T^{(1,1,1,1)}$ is 1. Figure 25.1 summarizes these findings. We
note that the dimensions satisfy $1^2 + 3^2 + 2^2 + 3^2 + 1^2 = 24$, the second
equation of (24.18).

    (4)          [1][2][3][4]                               n_(4) = 1

    (3,1)        [1][2][3]    [1][3][4]    [1][2][4]        n_(3,1) = 3
                 [4]          [2]          [3]

    (2,2)        [1][2]       [1][3]                        n_(2,2) = 2
                 [3][4]       [2][4]

    (2,1,1)      [1][2]       [1][4]       [1][3]           n_(2,1,1) = 3
                 [3]          [2]          [2]
                 [4]          [3]          [4]

    (1,1,1,1)    [1]                                        n_(1,1,1,1) = 1
                 [2]
                 [3]
                 [4]

Fig. 25.1 The standard Young tableaux, and the dimensions of irreducible
representations of $S_4$
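
The counting in Example 25.2.4 can be automated. The sketch below is not the book's procedure verbatim: it counts standard tableaux by removing the box that holds the largest number (which must sit at the end of a row with no box beneath it) and recursing. The names `partitions` and `num_standard_tableaux` are hypothetical.

```python
from functools import lru_cache
from math import factorial


def partitions(n, largest=None):
    """Yield the partitions of n as non-increasing tuples, largest part first."""
    largest = n if largest is None else largest
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


@lru_cache(maxsize=None)
def num_standard_tableaux(lam):
    """Number of standard Young tableaux on the frame `lam`."""
    if sum(lam) == 0:
        return 1
    total = 0
    for i, row in enumerate(lam):
        below = lam[i + 1] if i + 1 < len(lam) else 0
        if row > below:  # the last box of row i can hold the largest number
            smaller = lam[:i] + (row - 1,) + lam[i + 1:]
            total += num_standard_tableaux(tuple(p for p in smaller if p))
    return total


dims = {lam: num_standard_tableaux(lam) for lam in partitions(4)}
print(dims)
print(sum(d * d for d in dims.values()) == factorial(4))
```

For $n = 4$ this should list the dimensions 1, 3, 2, 3, 1 in the order of the partitions used above, and the sum of their squares equals $4! = 24$.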


25.3 Graphical Construction of Characters 


The product of the symmetric polynomial $s_{(l)}$ and the antisymmetric
polynomial $D(x_i)$ contains all the information regarding the representations of $S_n$.
We can extract the simple characters by looking at the coefficients of
appropriate products of the $x$-factors. This can also be done graphically. Without
going into the combinatorics of the derivation of the results, we state the
rules for calculating the simple characters, and examine one particular case
in detail to elucidate the procedure, whose statement can be very confusing.

As before, we label the irreducible representations with the partitions
of $n$. However, we separate out the common factors in a cyclic structure,
labeling the cycles by $l_1$, $l_2$, etc. For example, $(2, 1^2)$ has $l_1 = 2$, $l_2 = 1$,
and $l_3 = 1$. So, $(2, 1^2)$ becomes $(2, 1, 1)$, and in general, we write $(l)$ as
$(l_1, l_2, \dots, l_m)$.


Definition 25.3.1 A regular application of $r$ identical symbols to a Young
frame is the placement of those symbols in the boxes of the frame as follows.
Add the symbols to any given row, starting with the first (farthest to the
left) unoccupied cell, until the symbols are all used or the number of filled
boxes exceeds that of the preceding line by one. In the latter case, go to the
preceding line and repeat the procedure, making sure that the final result of
adding all $r$ symbols will be a regular graph. If the number of rows occupied
by the symbols is odd (even) the application is positive (negative).


As an illustration, consider the regular application of five 2’s to the blank 
Young frame shown below. 


We cannot start on the first row because it does not have enough boxes for 
the five 2’s. We can start on the second row and put one 2 in the first box. 
This brings the number of 2’s in the second row to one more than in the 
first row; therefore, we should now go to the first row and put the rest of 
the symbols there. We could start at the third row, put one 2 in the first box, 
put a second 2 in the first box of the second row, and the rest in the first 
row. Altogether we will have 3 regular applications of the five 2’s. These are 
shown in the diagram below. 


    [2][2][2][2]    [2][2][2]    [2][2]
    [2]             [2]          [2]
                    [2]          [2]
                                 [2]

(Only the boxes occupied by the 2's are shown.)


Of these the first and the last tableaux are negative applications, and the 
middle one is positive. 


Theorem 25.3.2 The character of the irreducible representation $T^{(\lambda)}$
of the class $(l) = (l_1, l_2, \dots, l_m)$ is obtained by successive regular
applications of $l_1$ identical symbols (usually taken to be 1's), then $l_2$
identical symbols of a different kind (usually taken to be 2's), etc. The
character $\chi^{(\lambda)}_{(l)}$ is then equal to the number of ways of building positive
applications minus the number of ways of building negative applications.


The order in which the $l_i$'s are applied is irrelevant. However, it is usually
convenient to start with the largest cycle.

The best way to understand the procedure is to construct a character table.
Let us do this for $S_4$. As usual, the rows are labeled by the various partitions
$(\lambda)$ of 4. We choose the order $(4)$, $(3, 1)$, $(2, 2) = (2^2)$, $(2, 1, 1) = (2, 1^2)$,
$(1, 1, 1, 1) = (1^4)$. The columns are labeled by classes $(l)$ in the following
order: $(1^4)$, $(2, 1^2)$, $(2^2)$, $(3, 1)$, $(4)$, where, for example, $(2, 1^2)$ means that
$l_1 = 2$, $l_2 = 1$, and $l_3 = 1$. Example 25.2.4 gives us the first column of the
character table. Similarly, the first row has 1 in all places, because it is the
trivial representation. Our task is therefore to fill in the rest of the table one
row at a time. The second row, with $(\lambda) = (3, 1)$, has a Young frame that
looks like

    [ ][ ][ ]
    [ ]

and for each class (column) labeled $(l_1, \dots, l_m)$, we need to fill this in with
$l_1$ identical symbols (1's), $l_2$ identical symbols of a different kind (2's), etc.

The second column has $l_1 = 2$, $l_2 = 1 = l_3$. So we have two 1's, one 2,
and one 3. If we start with the first row, the two 1's can be placed in its
first two boxes. If we start with the second row, the two 1's must be placed
vertically on top of each other. In the first case, we have two choices for
the 2: either on the first row next to the two 1's, or on the second line. In
the second case, we have only one choice for the 2: in the first row next
to 1. With 1's and 2 in place, the position of 3 is determined. The three
possibilities are shown below:

    [1][1][2]    [1][1][3]    [1][2][3]
    [3]          [2]          [1]

The first two are positive applications, the third is negative because the 1's
occupy an even number of rows. We therefore have

$$\chi^{(3,1)}_{(2,1^2)} = +1 + 1 - 1 = +1.$$


The third column has $l_1 = 2 = l_2$. So we have two 1's and two 2's. We
place the 1's as before. When the two 1's are placed vertically, we can put
the 2's on the first row and we are done. When the 1's are initially placed
in the first row, we have no way of placing the 2's by regular application.
We cannot start on the first row because there is only one spot available
(remember, we cannot go down once we start at a row). We cannot start on
the second row because once we place the first 2, we are blocked, and the
number of symbols in the second row does not exceed that of the first row
by one. So, there is only one possibility:

    [1][1][2]      [1][2][2]
    [2]            [1]
    Not allowed    Allowed

The only allowed diagram is obtained by a negative application of 1's.
Therefore, $\chi^{(3,1)}_{(2^2)} = -1$.

The fourth column has $l_1 = 3$ and $l_2 = 1$. So we have three 1's and one 2.
There are two ways to place the 1's: all on the first row, or starting on the
second row and working our way up until all boxes are filled except the last
box of the first row. The placement of 2 will be then predetermined. The
result is the two diagrams shown below:

    [1][1][1]    [1][1][2]
    [2]          [1]

The first diagram is obtained by a positive application of 1's, the second by
a negative application. Therefore,

$$\chi^{(3,1)}_{(3,1)} = +1 - 1 = 0.$$

Table 25.2 Character table for $S_4$. The rows and columns are labeled by the partitions
of 4 and cycle structures, respectively

               (1^4)   (2,1^2)   (2^2)   (3,1)   (4)
T^(4)            1        1        1       1      1
T^(3,1)          3        1       -1       0     -1
T^(2^2)          2        0        2      -1      0
T^(2,1^2)        3       -1       -1       0      1
T^(1^4)          1       -1        1       1     -1


Finally, for the last column, $l_1 = 4$. There is only one way to put all the
1's in the frame, and that is a negative application. Thus, $\chi^{(3,1)}_{(4)} = -1$.

Rather than going through the rest of the table in the same gory detail,
we shall point out some of the trickier calculations, and leave the rest of the
table for the reader to fill in. One confusion may arise in the calculation of
$\chi^{(2^2)}_{(2^2)}$. The frame looks like this,

    [ ][ ]
    [ ][ ]

and we need to fill this with two 1's and two 2's. The 1's can go into the first
row or the first column. The 2's then can be placed in the second row or the
second column. The result is

    [1][1]    [1][2]
    [2][2]    [1][2]

The first diagram has no negative application. The second has two negative
applications, one for the 1's, and one for the 2's. Therefore, the overall sign
for the second diagram is positive. It follows that $\chi^{(2^2)}_{(2^2)} = +1 + 1 = +2$.
The calculation of $\chi^{(2^2)}_{(4)}$ may also be confusing. We need to place four
1's in the frame. If we start on the first row, we are stuck, because there is
room for only two 1's. If we start in the second row, then we can only go
up: Putting the first 1 in the second row causes that row to have one extra
1 in comparison with the preceding row. However, once we go up, we have
room for only two 1's (we cannot go back down). So, there is no way we
can place the four 1's in the $(2^2)$ frame, and $\chi^{(2^2)}_{(4)} = 0$.
The character table for S4 is shown in Table 25.2 (see Problem 24.15 
as well). The reader is urged to verify all entries not calculated above. The 


Table 25.3 Character table for $S_5$. The rows and columns are labeled by the partitions
of 5 and cycle structures, respectively

               (1^5)   (2,1^3)   (2^2,1)   (3,2)   (3,1^2)   (4,1)   (5)
T^(5)            1        1         1        1        1        1      1
T^(4,1)          4        2         0       -1        1        0     -1
T^(3,2)          5        1         1        1       -1       -1      0
T^(3,1^2)        6        0        -2        0        0        0      1
T^(2^2,1)        5       -1         1       -1       -1        1      0
T^(2,1^3)        4       -2         0        1        1        0     -1
T^(1^5)          1       -1         1       -1        1       -1      1


character table for $S_5$ can also be calculated with only minor tedium. We
quote the result here in Table 25.3 and let the reader check the entries of the 
table. 
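
Checking the entries of Tables 25.2 and 25.3 by hand is tedious, so a small program can help. The sketch below does not place symbols in boxes. Instead it uses a standard equivalent bookkeeping, usually called the Murnaghan-Nakayama recursion, in which a frame is encoded by the strictly decreasing numbers $\beta_i = \lambda_i + k - i$ ($k$ being the number of rows) and a regular application of $r$ symbols is undone by lowering one $\beta_i$ by $r$; the sign is negative when an odd number of other $\beta$'s is jumped over, which matches the odd/even row rule of Definition 25.3.1. The identification with regular applications and the function names are assumptions of this sketch, not statements from the text.

```python
from functools import lru_cache


def character(lam, cycle_type):
    """Simple character of S_n for the partition `lam` on the class `cycle_type`."""
    k = len(lam)
    beta = frozenset(lam[i] + (k - 1 - i) for i in range(k))
    return _reduce(beta, tuple(sorted(cycle_type, reverse=True)))


@lru_cache(maxsize=None)
def _reduce(beta, cycles):
    if not cycles:
        return 1  # all symbols removed: only the empty frame is left
    r, rest = cycles[0], cycles[1:]
    total = 0
    for b in beta:
        c = b - r
        if c < 0 or c in beta:
            continue  # no regular application of r symbols ends in this row
        jumped = sum(1 for x in beta if c < x < b)  # rows occupied minus one
        total += (-1) ** jumped * _reduce((beta - {b}) | {c}, rest)
    return total


s4_rows = [(4,), (3, 1), (2, 2), (2, 1, 1), (1, 1, 1, 1)]
s4_classes = [(1, 1, 1, 1), (2, 1, 1), (2, 2), (3, 1), (4,)]
for lam in s4_rows:
    print(lam, [character(lam, c) for c in s4_classes])
```

The printed rows can be compared with Table 25.2; replacing the two lists by the partitions and classes of 5 gives a check of Table 25.3.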


25.4 Young Operators 


The group algebra techniques of Sect. 24.5—which we used in our discus- 
sion of representation theory in a very limited way—provide a powerful and 
elegant tool for unraveling the representations of finite groups. These tech- 
niques have been particularly useful in the analysis of the representations of 
the symmetric group. Our emphasis on the symmetric group is not merely 
due to the importance of $S_n$ as a paradigm of all finite groups. It has also
to do with the unexpected usefulness of the representations of $S_n$ in studying
the representations of GL(V), the paradigm of all (classical) continuous
groups. We shall come back to this observation later when we discuss rep- 
resentations of Lie groups in Chap. 30. 

To begin with, consider the element $P$ of the $S_n$ group algebra as defined
in Eq. (24.20). Since multiplying $P$ (on the left) by a group element does
not change $P$, the ideal generated by $P$ is not only one-dimensional, but all
elements of $S_n$ are represented by the number 1. Therefore, the ideal AP
corresponds to the (irreducible) identity representation. 

For $S_n$, there is another group algebra element that has similar properties.
This is

$$Q = \sum_{i=1}^{n!} \epsilon_{\pi_i} \pi_i, \qquad \pi_i \in S_n. \tag{25.9}$$

The reader may check that

$$\pi_j Q = \epsilon_{\pi_j} Q \quad \text{and} \quad Q^2 = n!\, Q.$$
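
The two identities just stated can be checked by brute force for small $n$. The following is a minimal sketch for $n = 3$, assuming that a permutation is stored as a tuple of images of $0, 1, 2$, that an element of the group algebra is a dictionary from permutations to coefficients, and that products are composed right to left; these conventions and the names `sign`, `compose`, and `mul` are choices of the sketch, not of the text.

```python
from collections import defaultdict
from itertools import permutations
from math import factorial


def sign(p):
    """+1 for an even permutation, -1 for an odd one (parity of inversions)."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def compose(p, q):
    """The permutation 'first q, then p'."""
    return tuple(p[q[i]] for i in range(len(q)))


def mul(a, b):
    """Product of two group algebra elements given as {permutation: coefficient}."""
    out = defaultdict(int)
    for p, cp in a.items():
        for q, cq in b.items():
            out[compose(p, q)] += cp * cq
    return {p: c for p, c in out.items() if c}


n = 3
group = list(permutations(range(n)))
Q = {p: sign(p) for p in group}  # the element of Eq. (25.9)

left_action_ok = all(mul({g: 1}, Q) == {p: sign(g) * c for p, c in Q.items()} for g in group)
square_ok = mul(Q, Q) == {p: factorial(n) * c for p, c in Q.items()}
print(left_action_ok, square_ok)
```

Both flags should come out true; changing `n` repeats the check for another symmetric group.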


As in the case of P, Q generates a one-dimensional ideal, but a left multipli- 
cation may introduce a minus sign (when the permutation is odd). Thus, the 
ideal generated by Q must correspond to the antisymmetric (or alternating) 
representation.

All the irreducible representations, including the special one-dimensional
cases above, can be obtained using this group-algebraic method. We shall
not give the proofs here, and we refer the reader to the classic book [Boer 63,
pp. 102-125]. The starting point is the Young frame corresponding to the
partition $(\lambda) = (\lambda_1, \dots, \lambda_n)$. One puts the numbers 1 through $n$ in the frame
in any order, consistent with tableau construction, so that the end product is
a Young tableau. Let $p$ be any permutation of a Young tableau that
permutes only the elements of each row among themselves. Such a $p$ is called
a horizontal permutation. Similarly, let $q$ be a vertical permutation of
the Young tableau.


Definition 25.4.1 Consider the $k$th Young tableau corresponding to the
partition $(\lambda)$. Let the Young symmetrizer $P_k^{(\lambda)}$ and Young antisymmetrizer
$Q_k^{(\lambda)}$ be the elements of the group algebra of $S_n$ defined as

$$P_k^{(\lambda)} = \sum_p p, \qquad Q_k^{(\lambda)} = \sum_q \epsilon_q q.$$

Then, the Young operator $Y_k^{(\lambda)}$ of this tableau, another element of the group
algebra, is given by $Y_k^{(\lambda)} = Q_k^{(\lambda)} P_k^{(\lambda)}$.


It can be shown that the following holds. 


Theorem 25.4.2 The Young operator $Y_k^{(\lambda)}$ is essentially idempotent,
and generates a minimal left ideal, hence an irreducible representation
for $S_n$. Representations thus obtained from different frames are
inequivalent. Different tableaux with the same frame give equivalent
irreducible representations.


In practice, one usually chooses the standard Young tableaux and applies
the foregoing procedure to them to obtain the entire collection of inequivalent
irreducible representations of $S_n$. We have already seen how to calculate
characters of $S_n$ employing both analytical and graphical methods.
Theorem 25.4.2 gives yet another approach to analyzing representations of
$S_n$. For low values of $n$ this technique may actually be used to determine the
characters, but as $n$ grows, it becomes unwieldy, and the graphical method
becomes more manageable.


Example 25.4.3 Let us apply this method to $S_3$. The partitions are $(3)$,
$(2, 1)$, and $(1^3)$. There is only one standard Young tableau associated with
$(3)$ and $(1^3)$. Thus,

$$Y^{(3)} = P^{(3)} = \frac{1}{6}\sum_{j=1}^{6} \pi_j = \frac{1}{6}(\pi_1 + \pi_2 + \pi_3 + \pi_4 + \pi_5 + \pi_6),$$

$$Y^{(1^3)} = Q^{(1^3)} = \frac{1}{6}\sum_{j=1}^{6} \epsilon_{\pi_j}\pi_j = \frac{1}{6}(\pi_1 - \pi_2 - \pi_3 - \pi_4 + \pi_5 + \pi_6),$$

where we have divided these Young operators by 6 to make them idempotent;
we have also used the notation of Example 23.4.1. One can show
directly that $Y^{(3)} Y^{(1^3)} = 0$. In fact, one can prove this for general $S_n$ (see
Problem 25.6).

For the partition $(2, 1)$, there are two Young tableaux. The first one has
the numbers 1 and 2 in the first row and 3 in the second. In the second tableau
the numbers 2 and 3 are switched. Therefore, using the multiplication table
for $S_3$ as given in Example 23.4.1, we have

$$Y_1^{(2,1)} = Q_1^{(2,1)} P_1^{(2,1)} = (e - \pi_3)(e + \pi_2) = e + \pi_2 - \pi_3 - \pi_6,$$
$$Y_2^{(2,1)} = Q_2^{(2,1)} P_2^{(2,1)} = (e - \pi_2)(e + \pi_3) = e - \pi_2 + \pi_3 - \pi_5.$$

The reader may verify that the product of any two Young operators
corresponding to different Young tableaux is zero and that

$$Y_1^{(2,1)} Y_1^{(2,1)} = 3 Y_1^{(2,1)}, \qquad Y_2^{(2,1)} Y_2^{(2,1)} = 3 Y_2^{(2,1)}.$$


Let us calculate the left ideals generated by these four Young operators.
We already know from our discussion at the beginning of this subsection that
$\mathcal{L}^{(3)}$ and $\mathcal{L}^{(1^3)}$, the ideals generated by $Y^{(3)}$ and $Y^{(1^3)}$, are one-dimensional.
Let us find $\mathcal{L}_1^{(2,1)}$, the ideal generated by $Y_1^{(2,1)}$. This is the span of all vectors
obtained by multiplying $Y_1^{(2,1)}$ on the left by elements of the group algebra.
It is sufficient to multiply $Y_1^{(2,1)}$ by the basis of the algebra, namely the
group elements:

$$\pi_1 Y_1^{(2,1)} = Y_1^{(2,1)},$$
$$\pi_2 Y_1^{(2,1)} = \pi_2 + e - \pi_5 - \pi_4 \equiv X_1^{(2,1)},$$
$$\pi_3 Y_1^{(2,1)} = \pi_3 + \pi_6 - e - \pi_2 = -Y_1^{(2,1)},$$
$$\pi_4 Y_1^{(2,1)} = \pi_4 + \pi_5 - \pi_6 - \pi_3 = -X_1^{(2,1)} + Y_1^{(2,1)},$$
$$\pi_5 Y_1^{(2,1)} = \pi_5 + \pi_4 - \pi_2 - e = -X_1^{(2,1)},$$
$$\pi_6 Y_1^{(2,1)} = \pi_6 + \pi_3 - \pi_4 - \pi_5 = X_1^{(2,1)} - Y_1^{(2,1)}.$$

It follows from the above calculation that $\mathcal{L}_1^{(2,1)}$, as a vector space, is
spanned by $Y_1^{(2,1)}$ and $X_1^{(2,1)}$, and since these two vectors are linearly
independent, $\mathcal{L}_1^{(2,1)}$ is a two-dimensional minimal ideal corresponding to a
two-dimensional irreducible representation of $S_3$. One can use this basis to find
representation matrices and the simple characters of $S_3$.

The other two-dimensional irreducible representation of $S_3$, equivalent
to the one above, is obtained by constructing the ideal $\mathcal{L}_2^{(2,1)}$ generated by
$Y_2^{(2,1)}$. This construction is left for the reader, who is also asked to verify its
dimensionality.

The resolution of the identity is easily verified (with $Y^{(3)}$ and $Y^{(1^3)}$
normalized by the factor 1/6 as above):

$$e = \underbrace{Y^{(3)}}_{\equiv e_1} + \underbrace{Y^{(1^3)}}_{\equiv e_2}
  + \underbrace{\tfrac{1}{3} Y_1^{(2,1)}}_{\equiv e_3} + \underbrace{\tfrac{1}{3} Y_2^{(2,1)}}_{\equiv e_4}.$$

The $e_i$'s are idempotents that satisfy $e_i e_j = 0$ for $i \ne j$.
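
The statements of Example 25.4.3 lend themselves to a direct numerical check. The sketch below is only illustrative: it labels the objects 0, 1, 2 instead of 1, 2, 3, composes permutations right to left, and builds $Y = QP$ from the rows and columns of a tableau. With these conventions the individual group elements need not carry the same labels $\pi_2, \dots, \pi_6$ as in Example 23.4.1, but the structural results (essential idempotency, vanishing products, dimension of the ideal) do not depend on the labeling. All function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import permutations


def sign(p):
    inv = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inv % 2 else 1


def mul(a, b):
    """Group algebra product; permutations are composed as 'first q, then p'."""
    out = defaultdict(int)
    for p, cp in a.items():
        for q, cq in b.items():
            out[tuple(p[q[i]] for i in range(len(q)))] += cp * cq
    return {p: c for p, c in out.items() if c}


def young_operator(tableau):
    """Y = Q P for a tableau given as a list of rows with entries 0..n-1."""
    n = sum(len(row) for row in tableau)
    cols = [[row[j] for row in tableau if j < len(row)] for j in range(len(tableau[0]))]

    def keeps(p, blocks):
        return all({p[i] for i in block} == set(block) for block in blocks)

    P = {p: 1 for p in permutations(range(n)) if keeps(p, tableau)}       # horizontal
    Q = {p: sign(p) for p in permutations(range(n)) if keeps(p, cols)}    # vertical
    return mul(Q, P)


def ideal_dimension(Y, n):
    """Rank of the vectors g*Y, g in S_n, by Gaussian elimination over the rationals."""
    group = list(permutations(range(n)))
    rows = [[Fraction(mul({g: 1}, Y).get(p, 0)) for p in group] for g in group]
    rank = 0
    for col in range(len(group)):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[rank])]
        rank += 1
    return rank


Y1 = young_operator([[0, 1], [2]])   # 1, 2 in the first row, 3 below
Y2 = young_operator([[0, 2], [1]])   # 2 and 3 switched
print(mul(Y1, Y1) == {p: 3 * c for p, c in Y1.items()}, mul(Y1, Y2) == {}, ideal_dimension(Y1, 3))
```

The last line should report that $Y_1^{(2,1)}$ squares to three times itself, that the product with $Y_2^{(2,1)}$ vanishes, and that the left ideal is two-dimensional.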


25.5 Products of Representations of S,, 


In the quantum theory of systems of many identical particles, the wave
function must have a particular symmetry under exchange of the particles: If the
particles are all fermions (bosons), the overall wave function must be
completely antisymmetric (symmetric). Since the space of functions of several
variables can provide a carrier space for the representation of any group,
we can, in the case of $S_n$, think of the antisymmetric (symmetric) functions
as basis functions for the one-dimensional irreducible alternating (identity)
representation. To obtain these basis functions, we apply the Young
operator $Y^{(1^n)}$ (or $Y^{(n)}$) to the arguments of any given function of $n$ variables to
obtain the completely antisymmetric (or symmetric) wave function.¹

Under certain conditions, we may require mixed symmetries. For in- 
stance, in the presence of spin, the product of the total spin wave function 
and the total space wave function must be completely antisymmetric for 
fermions. Thus, the space part (or the spin part) of the wave functions will,
in general, have mixed symmetry. Such a mixed symmetry corresponds to 
some other Young operator, and the wave function is obtained by applying 
that Young operator to the arguments of the wave function. 

Now suppose that we have two separate systems consisting of n; and n2 
particles, respectively, which are all assumed to be identical. As long as the 
two systems are not interacting, each will consist of states that are classified 
according to the irreducible representations of its symmetric group. When 
the two systems interact, we should classify the states of the total system 
according to the irreducible representations of all $n_1 + n_2$ particles. We have
already encountered the mathematical procedure for such classification: It is 
the Clebsch-Gordan decomposition of the direct product of the states of the 
two systems. Since the initial states correspond to Young tableaux, and since 
we are interested in the inequivalent irreducible representations, we need to 
examine the decomposition of the direct product of Young frames into a 
sum of Young frames. We first state (without proof) the procedure for such 
a decomposition, and then give an example to illustrate it. 


Theorem 25.5.1 To find the components of Young frames in the product of
two Young frames, draw one of the frames. In the other frame, assign the
same symbol, say 1, to all boxes in the first row, the same symbol 2 to all
boxes in the second row, etc. Now attach the first row to the first frame, and
enlarge in all possible ways subject to the restriction that no two 1's appear
in the same column, and that the resultant graph be regular. Repeat with the
2's, etc., making sure in each step that as we read from right to left and top
to bottom, no symbol is counted fewer times than the symbol that came after
it. The product is the sum of all diagrams so obtained.


¹ We must make the additional assumption that the permuted functions are all
independent.


To illustrate the procedure, consider the product

    [ ][ ]         [1][1]
    [ ]      (x)   [2]

We have put two 1's in the first row and one 2 in the second row of the frame
to the right. Now apply the first row to the frame on the left. The result is

    [ ][ ][1][1]    [ ][ ][1]    [ ][ ][1]    [ ][ ]
    [ ]             [ ][1]       [ ]          [ ][1]
                                 [1]          [1]

Now we apply the 2 to each of these graphs separately. We cannot put a 2
to the right of the 1's, because in that case, as we count from right to left,
we would start with a 2 without having counted any 1's. The allowed graphs
obtained from the first diagram are

    [ ][ ][1][1]    [ ][ ][1][1]
    [ ][2]          [ ]
                    [2]

Applying the 2 to the second graph, we obtain

    [ ][ ][1]    [ ][ ][1]
    [ ][1][2]    [ ][1]
                 [2]

and to the third graph gives

    [ ][ ][1]    [ ][ ][1]
    [ ][2]       [ ]
    [1]          [1]
                 [2]

Finally the last graph yields

    [ ][ ]    [ ][ ]
    [ ][1]    [ ][1]
    [1][2]    [1]
              [2]

The entire process described above is written in terms of frames as

    (2,1) (x) (2,1) = (4,2) + (4,1,1) + (3,3) + 2 (3,2,1) + (3,1,1,1) + (2,2,2) + (2,2,1,1),

where each frame is denoted by its partition.

Fig. 25.2 Some products of Young frames for small values of n


Some simple products, some of which will be used later, are given in 
Fig. 25.2. 
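
The procedure of Theorem 25.5.1 is mechanical enough to be sketched in code. In the sketch below a filled frame is a list of rows, with 0 marking a box of the first frame and 1, 2, ... marking the attached symbols; `strips` enumerates the ways of attaching $k$ equal symbols with no two in the same column, and `is_lattice` performs the right-to-left, top-to-bottom counting test. The function names and the data layout are assumptions made for illustration only.

```python
from collections import Counter


def strips(shape, k):
    """Yield tuples a, where a[j] boxes are added to row j (one new row allowed),
    so that the result is a frame and no two new boxes share a column."""
    old = list(shape) + [0]

    def rec(j, left, acc):
        if j == len(old):
            if left == 0:
                yield tuple(acc)
            return
        cap = left if j == 0 else min(left, old[j - 1] - old[j])
        for a in range(cap + 1):
            yield from rec(j + 1, left - a, acc + [a])

    yield from rec(0, k, [])


def is_lattice(rows):
    """Read right to left, top to bottom; symbol s may never outnumber s - 1."""
    counts = Counter()
    for row in rows:
        for label in reversed(row):
            if label == 0:
                continue
            counts[label] += 1
            if label > 1 and counts[label] > counts[label - 1]:
                return False
    return True


def frame_product(lam, mu):
    """Multiplicities of the frames appearing in the product of frames lam and mu."""
    states = [[[0] * r for r in lam]]
    for label, k in enumerate(mu, start=1):
        nxt = []
        for rows in states:
            shape = [len(r) for r in rows]
            for a in strips(shape, k):
                new = [r + [label] * a[j] for j, r in enumerate(rows + [[]])]
                new = [r for r in new if r]
                if is_lattice(new):
                    nxt.append(new)
        states = nxt
    return Counter(tuple(len(r) for r in rows) for rows in states)


result = frame_product((2, 1), (2, 1))
print(sorted(result.items(), reverse=True))
```

For the product worked out above, the sketch should return eight diagrams in all, with the frame $(3,2,1)$ appearing twice.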


25.6 Problems 


25.1 Construct the character table of S4 using the analytical method and 
Eq. (25.4). 


25.2 Find all the standard Young tableaux for $S_5$. Thus, determine the
dimension of each irreducible representation of $S_5$. Check that the
dimensions satisfy the second equation of (24.18).


25.3 Verify the remaining entries of Table 25.2. 
25.4 Construct the character table of $S_5$.


25.5 Suppose that $Q$, an element of the group algebra of $S_n$, is given by

$$Q = \sum_{i=1}^{n!} \epsilon_{\pi_i} \pi_i, \qquad \pi_i \in S_n.$$

Show that

$$\pi_j Q = \epsilon_{\pi_j} Q \quad \text{and} \quad Q^2 = n!\, Q.$$


25.6 Show that $Y^{(n)} Y^{(1^n)} = 0$. Hint: There are as many even permutations
in $S_n$ as there are odd permutations.


25.7 Show that the product of any two Young operators of $S_3$ corresponding
to different Young tableaux is zero and that

$$Y_1^{(2,1)} Y_1^{(2,1)} = 3 Y_1^{(2,1)}, \qquad Y_2^{(2,1)} Y_2^{(2,1)} = 3 Y_2^{(2,1)}.$$


25.8 Construct the ideal $\mathcal{L}_2^{(2,1)}$ generated by $Y_2^{(2,1)}$ and verify that it is
two-dimensional.


25.9 Using the ideal $\mathcal{L}_1^{(2,1)}$ generated by $Y_1^{(2,1)}$, find the matrices of the
irreducible representation $T^{(2,1)}$. From these matrices calculate the simple
characters of $S_3$ and compare your result with Table 24.4. Show that the
ideal $\mathcal{L}_2^{(2,1)}$ generated by $Y_2^{(2,1)}$ gives the same set of characters.


25.10 Find all the Young operators for $S_4$ corresponding to the first
entry of each row of Fig. 25.1. Find the ideals $\mathcal{L}_1^{(3,1)}$ and $\mathcal{L}_1^{(2^2)}$ generated by
the Young operators $Y_1^{(3,1)}$ and $Y_1^{(2^2)}$ corresponding to the second and third
rows of the table. Show that $\mathcal{L}_1^{(3,1)}$ and $\mathcal{L}_1^{(2^2)}$ have 3 and 2 dimensions,
respectively.


25.11 Verify the products of the Young frames of Fig. 25.2.


Tensors and Manifolds


26 Tensors


Until around the 1970s, tensors were almost completely synonymous with (general)
relativity except for a minor use in hydrodynamics. Students of physics
did not need to study tensors until they took a course in the general theory of 
relativity. Then they would read the introductory chapter on tensor algebra 
and analysis, solve a few problems to condition themselves for index “gym- 
nastics”, read through the book, learn some basic facts about relativity, and 
finally abandon it (unless they became relativists). 

Today, with the advent of gauge theories of fundamental particles, the 
realization that gauge fields are to be thought of as geometrical objects, and 
the widespread belief that all fundamental interactions (including gravity) 
are different manifestations of the same superforce, the picture has changed 
drastically. 

Two important developments have taken place as a consequence: Ten- 
sors have crept into other interactions besides gravity (such as the weak and 
strong nuclear interactions), and the geometrical (coordinate-independent) 
aspects of tensors have become more and more significant in the study of all 
interactions. The coordinate-independent study of tensors is the focus of the 
fascinating field of differential geometry and Lie groups, the subject of the 
remainder of the book. 

As is customary, we will consider only real vector spaces and abandon 
the Dirac bra and ket notation, whose implementation is most advantageous 
in unitary (complex) spaces. From here on, the basis vectors¹ of a vector
space $V$ will be distinguished by a subscript and those of its dual space by
a superscript. If $\{\mathbf{e}_i\}_{i=1}^N$ is a basis in $V$, then $\{\boldsymbol{\epsilon}^j\}_{j=1}^N$ is a basis in $V^*$. Also,
Einstein’s summation convention will be used:


Box 26.0.1 Repeated indices, of which one is an upper and the
other a lower index, are assumed to be summed over: $a_i^k b_j^i$ means
$\sum_{i=1}^{N} a_i^k b_j^i$.

¹ We denote vectors by roman boldface, and tensors of higher rank by sans serif bold
letters.

As a result of this convention, it is more natural to label the elements of a
matrix representation of an operator $\mathbf{A}$ by $a_i^j$ (rather than $a_{ji}$), because then

$$\mathbf{A}\mathbf{e}_i = a_i^j \mathbf{e}_j.$$


26.1 Tensors as Multilinear Maps 


Since tensors are special kinds of linear operators on vector spaces, let us
reconsider $\mathcal{L}(V, W)$, the space of all linear mappings from the real vector
space $V$ to the real vector space $W$. We noted in Chap. 5 that $\mathcal{L}(V, W)$
is isomorphic to a space with dimension $\dim V \cdot \dim W$. The following
proposition, whose proof we leave to the reader, shows this directly.


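Proposition 26.1.1 Let $\{\mathbf{e}_i\}_{i=1}^{N_1}$ be a basis for V and
$\{\mathbf{e}'_\alpha\}_{\alpha=1}^{N_2}$ a basis for W. Then


1. the linear transformations $\mathbf{T}^j_\beta : V \to W$ in the vector space $\mathcal{L}(V, W)$,
defined by (note the new way of writing the Kronecker delta)


\[ \mathbf{T}^j_\beta \mathbf{e}_i = \delta^j_i \mathbf{e}'_\beta, \qquad j = 1, \dots, N_1;\ \beta = 1, \dots, N_2, \tag{26.1} \]


form a basis in $\mathcal{L}(V, W)$. In particular, $\dim \mathcal{L}(V, W) = N_1 N_2$.

2. If $\tau^\beta_j$ are the matrix elements of a matrix representation of a linear
transformation $\mathbf{T} \in \mathcal{L}(V, W)$ with respect to the two bases above, then
$\mathbf{T} = \tau^\beta_j \mathbf{T}^j_\beta$.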


The dual space V* is simply the space $\mathcal{L}(V, \mathbb{R})$. Proposition 26.1.1 (with
$N_2 = 1$) then implies that dim V* = dim V, which was shown in Chap. 2.
The dual space is important in the discussion of tensors, so we consider
some of its properties below.

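When $W = \mathbb{R}$, the basis $\{\mathbf{T}^j_\beta\}$ of Proposition 26.1.1 reduces to $\{\mathbf{T}^j_1\}$ and is
denoted by $\{\boldsymbol{\epsilon}^j\}_{j=1}^N$, with N = dim V* = dim V. The $\boldsymbol{\epsilon}^j$’s have the property
that

\[ \boldsymbol{\epsilon}^j(\mathbf{e}_i) = \delta^j_i, \tag{26.2} \]

which is (26.1) with $\beta = 1$ and $\mathbf{e}'_\beta = \mathbf{e}'_1 = 1$, a basis of $\mathbb{R}$. Equation (26.2)
was also established in Chap. 2. The basis $B^* = \{\boldsymbol{\epsilon}^j\}_{j=1}^N$ is simply the dual
of the basis $B = \{\mathbf{e}_i\}_{i=1}^N$. Note the “natural” position of the indices for B
and B*.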

Now suppose that $\{\mathbf{f}_i\}_{i=1}^N = B'$ is another basis of V and R is the (invertible)
matrix carrying B onto B'. Let $B'^* = \{\boldsymbol{\varphi}^j\}_{j=1}^N$ be the dual of B'. We
want to find the matrix that carries B* onto B'*. If we denote this matrix by
A and its elements by $a^j_i$, we have


\[ \delta^k_i = \boldsymbol{\varphi}^k \mathbf{f}_i = (a^k_l \boldsymbol{\epsilon}^l)(r^j_i \mathbf{e}_j) = a^k_l r^j_i \delta^l_j = a^k_l r^l_i = (\mathsf{A}\mathsf{R})^k_i, \]


where the first equality follows from the duality of B' and B'*. In matrix
form, this equation can be written as AR = 1, or A = R⁻¹. Thus,




Box 26.1.2 The matrix that transforms bases of V* is the inverse of 
the matrix that transforms the corresponding bases of V. 
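
The statement in Box 26.1.2 can be checked numerically. The following is only a minimal sketch, assuming a hypothetical 2 × 2 change-of-basis matrix R (not taken from the text) and exact rational arithmetic; the helper names `matmul` and `inverse_2x2` are illustrative.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(M):
    """Exact inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical matrix R: the upper index labels rows and the lower index
# columns, so column i holds the components r^j_i of f_i = r^j_i e_j.
R = [[1, -1],
     [2, 1]]

# Candidate matrix for the dual bases, phi^k = a^k_l eps^l.
A = inverse_2x2(R)

# The pairing phi^k(f_i) = a^k_l r^l_i = (AR)^k_i must be the Kronecker delta.
pairing = matmul(A, R)
print(pairing == [[1, 0], [0, 1]])
```

Any other invertible R could be substituted; the 2 × 2 restriction is only there to keep the inverse explicit.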


In the equations above, the upper index in matrix elements labels rows, 
and the lower index labels columns. This can be remembered by noting that 
the column vectors $\mathbf{e}_i$ can be thought of as columns of a matrix, and the
lower index i then labels those columns. Similarly, $\boldsymbol{\epsilon}^j$ can be thought of as
rows of a matrix. We now generalize the concept of linear functionals.


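Definition 26.1.3 A map $\mathbf{T} : V_1 \times V_2 \times \cdots \times V_r \to W$ is called r-linear if
it is linear in all its variables:


\[ \mathbf{T}(\mathbf{v}_1, \dots, \alpha\mathbf{v}_i + \alpha'\mathbf{v}'_i, \dots, \mathbf{v}_r)
   = \alpha\mathbf{T}(\mathbf{v}_1, \dots, \mathbf{v}_i, \dots, \mathbf{v}_r) + \alpha'\mathbf{T}(\mathbf{v}_1, \dots, \mathbf{v}'_i, \dots, \mathbf{v}_r) \]


for all i.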


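We can easily construct a bilinear map. Let $\boldsymbol{\tau}_1 \in V_1^*$ and $\boldsymbol{\tau}_2 \in V_2^*$. We
define the map $\boldsymbol{\tau}_1 \otimes \boldsymbol{\tau}_2 : V_1 \times V_2 \to \mathbb{R}$ by


\[ \boldsymbol{\tau}_1 \otimes \boldsymbol{\tau}_2(\mathbf{v}_1, \mathbf{v}_2) = \boldsymbol{\tau}_1(\mathbf{v}_1)\boldsymbol{\tau}_2(\mathbf{v}_2). \tag{26.3} \]


The expression $\boldsymbol{\tau}_1 \otimes \boldsymbol{\tau}_2$ is called the tensor product of $\boldsymbol{\tau}_1$ and $\boldsymbol{\tau}_2$. Clearly,
since $\boldsymbol{\tau}_1$ and $\boldsymbol{\tau}_2$ are separately linear, $\boldsymbol{\tau}_1 \otimes \boldsymbol{\tau}_2$ is linear in each of its arguments.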

An r-linear map can be multiplied by a scalar, and two r-linear maps can 
be added; in each case the result is an r-linear map. Thus, the set of r-linear 
maps from $V_1 \times \cdots \times V_r$ into W forms a vector space that is denoted by
$\mathcal{L}(V_1, \dots, V_r; W)$.

We can also construct multilinear maps on the dual space. First, we note 
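that we can define a natural linear functional on V* as follows. We let $\boldsymbol{\tau} \in V^*$
and $\mathbf{v} \in V$; then $\boldsymbol{\tau}(\mathbf{v}) \in \mathbb{R}$. Now we twist this around and define a mapping
$\mathbf{v} : V^* \to \mathbb{R}$ given by $\mathbf{v}(\boldsymbol{\tau}) \equiv \boldsymbol{\tau}(\mathbf{v})$. It is easily shown that this mapping is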
linear. Thus, we have naturally constructed a linear functional on V* by 
identifying (V*)* with V. 

Construction of multilinear maps on V* is now trivial. For example, let 
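$\mathbf{v}_1 \in V_1$ and $\mathbf{v}_2 \in V_2$ and define the tensor product
$\mathbf{v}_1 \otimes \mathbf{v}_2 : V_1^* \times V_2^* \to \mathbb{R}$ by


\[ \mathbf{v}_1 \otimes \mathbf{v}_2(\boldsymbol{\tau}_1, \boldsymbol{\tau}_2) = \mathbf{v}_1(\boldsymbol{\tau}_1)\mathbf{v}_2(\boldsymbol{\tau}_2) = \boldsymbol{\tau}_1(\mathbf{v}_1)\boldsymbol{\tau}_2(\mathbf{v}_2). \tag{26.4} \]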


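We can also construct mixed multilinear maps such as $\mathbf{v} \otimes \boldsymbol{\tau} : V^* \times V \to \mathbb{R}$
given by


\[ \mathbf{v} \otimes \boldsymbol{\tau}(\boldsymbol{\theta}, \mathbf{u}) = \mathbf{v}(\boldsymbol{\theta})\boldsymbol{\tau}(\mathbf{u}) = \boldsymbol{\theta}(\mathbf{v})\boldsymbol{\tau}(\mathbf{u}). \tag{26.5} \]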


There is a bilinear map $h : V^* \times V \to \mathbb{R}$ that naturally pairs V and V*; it
is given by $h(\boldsymbol{\theta}, \mathbf{v}) \equiv \boldsymbol{\theta}(\mathbf{v})$. This mapping is called the natural pairing of V
and V* into $\mathbb{R}$ and is denoted by using angle brackets:


\[ h(\boldsymbol{\theta}, \mathbf{v}) \equiv \langle\boldsymbol{\theta}, \mathbf{v}\rangle \equiv \boldsymbol{\theta}(\mathbf{v}). \]


Definition 26.1.4 Let V be a vector space with dual space V*. Then a tensor
of type (r, s) is a multilinear mapping


\[ \mathbf{T}^r_s : \underbrace{V^* \times V^* \times \cdots \times V^*}_{r \text{ times}} \times \underbrace{V \times V \times \cdots \times V}_{s \text{ times}} \to \mathbb{R}. \]


The set of all such mappings for fixed r and s forms a vector space denoted
by $\mathcal{T}^r_s(V)$. The number r is called the contravariant degree of the tensor,
and s is called the covariant degree of the tensor.


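As an example, let $\mathbf{v}_1, \dots, \mathbf{v}_r \in V$ and $\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^s \in V^*$, and define the
tensor product $\mathbf{T}^r_s \equiv \mathbf{v}_1 \otimes \cdots \otimes \mathbf{v}_r \otimes \boldsymbol{\tau}^1 \otimes \cdots \otimes \boldsymbol{\tau}^s$ by


\[ \mathbf{v}_1 \otimes \cdots \otimes \mathbf{v}_r \otimes \boldsymbol{\tau}^1 \otimes \cdots \otimes \boldsymbol{\tau}^s(\boldsymbol{\theta}^1, \dots, \boldsymbol{\theta}^r, \mathbf{u}_1, \dots, \mathbf{u}_s)
   = \mathbf{v}_1(\boldsymbol{\theta}^1) \cdots \mathbf{v}_r(\boldsymbol{\theta}^r)\,\boldsymbol{\tau}^1(\mathbf{u}_1) \cdots \boldsymbol{\tau}^s(\mathbf{u}_s)
   = \prod_{i=1}^{r} \prod_{j=1}^{s} \boldsymbol{\theta}^i(\mathbf{v}_i)\,\boldsymbol{\tau}^j(\mathbf{u}_j). \]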


Each v in the tensor product requires an element of V*; that is why the 
number of factors of V* in the Cartesian product equals the number of v’s 
in the tensor product. As explained in Chap. 1, the Cartesian product with s
factors of V is denoted by $V^s$ (similarly for V*).

A tensor of type (0, 0) is defined to be a scalar, so $\mathcal{T}^0_0(V) = \mathbb{R}$. A tensor
of type (1, 0), an ordinary vector, is called a contravariant vector, and one
of type (0, 1), a dual vector (or a linear functional), is called a covariant
vector. A tensor of type (r, 0) is called a contravariant tensor of rank r,
and one of type (0, s) is called a covariant tensor of rank s. The union
of $\mathcal{T}^r_s(V)$ for all possible r and s can be made into an (infinite-dimensional)
algebra, called the algebra of tensors, by defining the following product on
it:


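Definition 26.1.5 The tensor product of a tensor $\mathbf{T}$ of type (r, s) and a
tensor $\mathbf{U}$ of type (k, l) is a tensor $\mathbf{T} \otimes \mathbf{U}$ of type (r + k, s + l), defined, as an
operator on $(V^*)^{r+k} \times V^{s+l}$, by


\[ \mathbf{T} \otimes \mathbf{U}(\boldsymbol{\theta}^1, \dots, \boldsymbol{\theta}^{r+k}, \mathbf{u}_1, \dots, \mathbf{u}_{s+l})
   = \mathbf{T}(\boldsymbol{\theta}^1, \dots, \boldsymbol{\theta}^r, \mathbf{u}_1, \dots, \mathbf{u}_s)\,\mathbf{U}(\boldsymbol{\theta}^{r+1}, \dots, \boldsymbol{\theta}^{r+k}, \mathbf{u}_{s+1}, \dots, \mathbf{u}_{s+l}). \]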


This product turns the (infinite-dimensional) vector space of all tensors into 
an associative algebra called a tensor algebra. 


This definition is a generalization of Eqs. (26.3), (26.4), and (26.5). It is 
easily verified that the tensor product is associative and distributive (over 
tensor addition), but not commutative. 
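
The two claims in the preceding paragraph can be illustrated with a small sketch. It is not the book's construction: it assumes purely covariant tensors on $\mathbb{R}^2$, represents them as Python functions of their vector arguments, and uses hypothetical names (`covector`, `tensor_product`) and made-up sample data.

```python
def covector(components):
    """Linear functional on R^n with the given components: v -> sum_i c_i v^i."""
    return lambda v: sum(c * x for c, x in zip(components, v))

def tensor_product(T, r, U, k):
    """Tensor product of an r-argument map T with a k-argument map U.

    The first r arguments are fed to T, the remaining k to U, and the two
    real numbers are multiplied (Definition 26.1.5 restricted to covariant
    tensors, which is an assumption of this sketch).
    """
    def TU(*args):
        if len(args) != r + k:
            raise ValueError("expected %d arguments" % (r + k))
        return T(*args[:r]) * U(*args[r:])
    return TU

tau1, tau2, tau3 = covector([1, 0]), covector([0, 1]), covector([2, 3])
u, v, w = (1, 2), (3, 4), (5, 6)

t12 = tensor_product(tau1, 1, tau2, 1)
t21 = tensor_product(tau2, 1, tau1, 1)
print(t12(u, v), t21(u, v))   # the two orders disagree: 4 versus 6

# (tau1 x tau2) x tau3 versus tau1 x (tau2 x tau3)
left = tensor_product(t12, 2, tau3, 1)
right = tensor_product(tau1, 1, tensor_product(tau2, 1, tau3, 1), 2)
print(left(u, v, w) == right(u, v, w))
```

A single sample point does not prove associativity, of course; it only shows the bookkeeping of which argument goes to which factor.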

Making computations with tensors requires choosing a basis for V and 
one for V* and representing the tensors in terms of numbers (components). 
This process is not, of course, new. Linear operators are represented by ar- 
rays of numbers, i.e., matrices. The case of tensors is merely a generalization 
of that of linear operators and can be stated as follows: 




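Theorem 26.1.6 Let $\{\mathbf{e}_i\}_{i=1}^N$ be a basis in V, and $\{\boldsymbol{\epsilon}^j\}_{j=1}^N$ a basis in
V*, usually taken to be the dual of $\{\mathbf{e}_i\}_{i=1}^N$. Then the set of all tensor
products $\mathbf{e}_{i_1} \otimes \cdots \otimes \mathbf{e}_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \cdots \otimes \boldsymbol{\epsilon}^{j_s}$ forms a basis for $\mathcal{T}^r_s(V)$.
Furthermore, the components of any tensor $\mathbf{A} \in \mathcal{T}^r_s(V)$ are


\[ A^{i_1 \dots i_r}_{j_1 \dots j_s} = \mathbf{A}(\boldsymbol{\epsilon}^{i_1}, \dots, \boldsymbol{\epsilon}^{i_r}, \mathbf{e}_{j_1}, \dots, \mathbf{e}_{j_s}). \]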


Proof The proof is a simple exercise in employing the definition of tensor 
products and keeping track of the multilinear property of tensors. Details are 
left for the reader. 


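A useful result of the theorem is the relation


\[ \mathbf{A} = A^{i_1 \dots i_r}_{j_1 \dots j_s}\, \mathbf{e}_{i_1} \otimes \cdots \otimes \mathbf{e}_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \cdots \otimes \boldsymbol{\epsilon}^{j_s}. \tag{26.6} \]

Note that for every factor in the basis vectors of $\mathcal{T}^r_s(V)$ there are N
possibilities, each one giving a new linearly independent vector of the basis.
Thus, the number of possible tensor products is $N^{r+s}$, and we have
$\dim \mathcal{T}^r_s(V) = (\dim V)^{r+s}$.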


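Example 26.1.7 Let us consider the special case of $\mathcal{T}^1_1(V)$ as an illustration.
We can write $\mathbf{A} \in \mathcal{T}^1_1(V)$ as $\mathbf{A} = A^i_j \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j$. Given any $\mathbf{v} \in V$, we obtain²


\[ \mathbf{A}(\mathbf{v}) = (A^i_j \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j)(\mathbf{v}) = A^i_j \mathbf{e}_i \underbrace{[\boldsymbol{\epsilon}^j(\mathbf{v})]}_{\in \mathbb{R}}. \]


This shows that $\mathbf{A}(\mathbf{v}) \in V$ and $\mathbf{A}$ can be interpreted as a linear operator on V,
i.e., $\mathbf{A} \in \mathcal{L}(V)$. Similarly, for $\boldsymbol{\tau} \in V^*$ we get


\[ \mathbf{A}(\boldsymbol{\tau}) = (A^i_j \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j)(\boldsymbol{\tau}) = A^i_j \underbrace{[\mathbf{e}_i(\boldsymbol{\tau})]}_{\in \mathbb{R}} \boldsymbol{\epsilon}^j. \]


Thus, $\mathbf{A} \in \mathcal{L}(V^*)$. We have shown that given $\mathbf{A} \in \mathcal{T}^1_1(V)$, there corresponds a
linear operator belonging to $\mathcal{L}(V)$ [or $\mathcal{L}(V^*)$] having a natural relation to $\mathbf{A}$.
Similarly, given any $\mathbf{A} \in \mathcal{L}(V)$ [or $\mathcal{L}(V^*)$] with a matrix representation in the
basis $\{\mathbf{e}_i\}_{i=1}^N$ of V (or $\{\boldsymbol{\epsilon}^j\}_{j=1}^N$ of V*) given by $A^i_j$, then corresponding to it
in a natural way is a tensor in $\mathcal{T}^1_1(V)$, namely $A^i_j \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j$. Problem 26.5 shows
that the tensor defined in this way is basis-independent. Therefore, there is
a natural one-to-one correspondence among $\mathcal{T}^1_1(V)$, $\mathcal{L}(V)$, and $\mathcal{L}(V^*)$. This
natural correspondence is called a natural isomorphism. Whenever there is
a natural isomorphism between two vector spaces, those vector spaces can
be treated as being the same.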


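²Here, we are assuming that $\mathbf{A}$ acts on an object (such as $\mathbf{v}$) by “pairing it up” with an
appropriate factor of which $\mathbf{A}$ is composed (such as $\boldsymbol{\epsilon}^j$).

We have defined tensors as multilinear machines that take in a vector
from a specific Cartesian product space of V’s and V*’s and manufacture a
real number. Given the representation in Eq. (26.6), however, we can generalize
the interpretation of a tensor as a linear machine so that it takes a vector
belonging to a Cartesian product space and manufactures a tensor. This
corresponds to a situation in which not all factors of (26.6) find “partners.”
An illustration of this situation was presented in the preceding example. To
clarify this, let us consider $\mathbf{A} \in \mathcal{T}^1_2(V)$, given by


\[ \mathbf{A} = A^i_{jk}\, \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j \otimes \boldsymbol{\epsilon}^k. \]


This machine needs a Cartesian-product vector of the form $(\boldsymbol{\tau}, \mathbf{v}_1, \mathbf{v}_2)$, with
$\boldsymbol{\tau} \in V^*$ and $\mathbf{v}_1, \mathbf{v}_2 \in V$, to give a real number. However, if it is not fed
enough, it will not complete its job. For instance, if we feed it only a dual
vector $\boldsymbol{\tau} \in V^*$, it will give a tensor belonging to $\mathcal{T}^0_2(V)$:


\[ \mathbf{A}(\boldsymbol{\tau}) = (A^i_{jk}\, \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j \otimes \boldsymbol{\epsilon}^k)(\boldsymbol{\tau}) = A^i_{jk}\, [\mathbf{e}_i(\boldsymbol{\tau})]\, \boldsymbol{\epsilon}^j \otimes \boldsymbol{\epsilon}^k. \]

If we feed it a double vector $(\mathbf{v}_1, \mathbf{v}_2)$, it will manufacture a vector in V:

\[ \mathbf{A}(\mathbf{v}_1, \mathbf{v}_2) = (A^i_{jk}\, \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j \otimes \boldsymbol{\epsilon}^k)(\mathbf{v}_1, \mathbf{v}_2) = A^i_{jk}\, \mathbf{e}_i\, [\boldsymbol{\epsilon}^j(\mathbf{v}_1)][\boldsymbol{\epsilon}^k(\mathbf{v}_2)] \in V. \]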


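What if we feed it a single vector $\mathbf{v}$? It will blow its whistles and buzz
its buzzers, because it does not know whether to give $\mathbf{v}$ to $\boldsymbol{\epsilon}^j$ or $\boldsymbol{\epsilon}^k$ (it is
smart enough to know that it cannot give $\mathbf{v}$ to $\mathbf{e}_i$). That is why we have
to inform the machine with which factor of $\boldsymbol{\epsilon}$ to match $\mathbf{v}$. This is done by
properly positioning $\mathbf{v}$ inside a pair of parentheses: If we write $(\cdot, \mathbf{v}, \cdot)$, the
machine will know that $\mathbf{v}$ belongs to $\boldsymbol{\epsilon}^j$, and $(\cdot, \cdot, \mathbf{v})$ tells the machine to pair
$\mathbf{v}$ with $\boldsymbol{\epsilon}^k$. If we write $(\mathbf{v}, \cdot, \cdot)$, the machine will give us an “error message”
because it cannot pair $\mathbf{v}$ with $\mathbf{e}_i$!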

The components of a tensor A, as given in Eq. (26.6), depend on the 
basis in which they are described. If the basis is changed, the components 
change. The relation between components of a tensor in different bases is 
called the transformation law for that particular tensor. Let us investigate 
this concept. 

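We use overbars to distinguish among various bases. For instance,
$B = \{\mathbf{e}_i\}_{i=1}^N$, $\bar{B} = \{\bar{\mathbf{e}}_j\}_{j=1}^N$, and $\bar{\bar{B}} = \{\bar{\bar{\mathbf{e}}}_k\}_{k=1}^N$ are three different bases of V.
Similarly, $B^* = \{\boldsymbol{\epsilon}^i\}_{i=1}^N$, $\bar{B}^* = \{\bar{\boldsymbol{\epsilon}}^j\}_{j=1}^N$, and $\bar{\bar{B}}^* = \{\bar{\bar{\boldsymbol{\epsilon}}}^k\}_{k=1}^N$ are bases of V*. The
components are also distinguished with overbars. Recall that if R is the matrix
connecting $B$ and $\bar{B}$, then $S = R^{-1}$ connects $B^*$ and $\bar{B}^*$. For a tensor $\mathbf{A}$
of type (1, 2), Theorem 26.1.6 gives


\[ \bar{A}^i_{jk} = \mathbf{A}(\bar{\boldsymbol{\epsilon}}^i, \bar{\mathbf{e}}_j, \bar{\mathbf{e}}_k) = \mathbf{A}(s^i_m \boldsymbol{\epsilon}^m, r^n_j \mathbf{e}_n, r^p_k \mathbf{e}_p)
   = s^i_m r^n_j r^p_k\, \mathbf{A}(\boldsymbol{\epsilon}^m, \mathbf{e}_n, \mathbf{e}_p) = s^i_m r^n_j r^p_k A^m_{np}. \tag{26.7} \]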


This is the law that transforms the components of A from one basis to an- 
other. 
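
As a sanity check on the transformation law (26.7), the sketch below transforms the components of a type (1, 2) tensor and confirms that the number $\mathbf{A}(\boldsymbol{\theta}, \mathbf{u}, \mathbf{v})$ does not depend on the basis used to compute it. Everything numerical here (the components, the matrix R, the test vectors) is made up for illustration, and the storage conventions stated in the comments are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import product

N = 2
idx = range(N)

# Hypothetical components A^m_{np} in the basis B, stored as A[m][n][p].
A = [[[1, 2], [0, -1]],
     [[3, 0], [1, 4]]]

# Hypothetical basis change: bar e_j = r^n_j e_n, stored as r[n][j].
r = [[1, 1],
     [0, 2]]
det = Fraction(r[0][0] * r[1][1] - r[0][1] * r[1][0])
# S = R^{-1}: bar eps^i = s^i_m eps^m, stored as s[i][m].
s = [[r[1][1] / det, -r[0][1] / det],
     [-r[1][0] / det, r[0][0] / det]]

# Eq. (26.7): bar A^i_{jk} = s^i_m r^n_j r^p_k A^m_{np}
A_bar = [[[sum(s[i][m] * r[n][j] * r[p][k] * A[m][n][p]
               for m, n, p in product(idx, repeat=3))
           for k in idx] for j in idx] for i in idx]

def evaluate(comp, theta, u, v):
    """A(theta, u, v) = A^i_{jk} theta_i u^j v^k in the basis comp refers to."""
    return sum(comp[i][j][k] * theta[i] * u[j] * v[k]
               for i, j, k in product(idx, repeat=3))

# Components of a dual vector and two vectors in the basis B (arbitrary data).
theta, u, v = [1, -2], [3, 1], [2, 5]
# Dual-vector components transform with R, vector components with S.
theta_bar = [sum(theta[m] * r[m][i] for m in idx) for i in idx]
u_bar = [sum(s[j][n] * u[n] for n in idx) for j in idx]
v_bar = [sum(s[k][p] * v[p] for p in idx) for k in idx]

print(evaluate(A, theta, u, v) == evaluate(A_bar, theta_bar, u_bar, v_bar))
```

Exact fractions are used so that the comparison is an equality rather than a floating-point tolerance test.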




In the classical coordinate-dependent treatment of tensors, Eq. (26.7) was 
the defining relation for a tensor of type (1, 2). In other words, a tensor of 
type (1, 2) was defined to be a collection of numbers, $A^m_{np}$, that transformed
to another collection of numbers $\bar{A}^i_{jk}$ according to the rule in (26.7) when
the basis was changed. In the modern treatment of tensors it is not necessary 
to introduce any basis to define tensors. Only when the components of a ten- 
sor are needed must bases be introduced. The advantage of the coordinate- 
free treatment is obvious, since a (1, 2)-type tensor has 27 components in 
three dimensions and 64 components in four dimensions, and all of these 
are represented by the single symbol A. However, the role of components 
should not be downplayed. After all, when it comes to actual calculations, 
we are forced to choose a basis and manipulate components. 

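Since $\mathcal{T}^r_s(V)$ are vector spaces, it is desirable to investigate mappings
from one such space to another. We will be particularly interested in linear
mappings. For example, $f : \mathcal{T}^1_0(V) \to \mathcal{T}^0_0(V) = \mathbb{R}$ is what was called a linear
functional before. Similarly, $t : \mathcal{T}^1_0(V) \to \mathcal{T}^1_0(V)$ is a linear transformation
on V. A special linear transformation is $\operatorname{tr} : \mathcal{T}^1_1(V) \to \mathcal{T}^0_0(V) = \mathbb{R}$, given by


\[ \operatorname{tr}\mathbf{A} = \operatorname{tr}(A^i_j\, \mathbf{e}_i \otimes \boldsymbol{\epsilon}^j) \equiv A^i_i \equiv \sum_{i=1}^{N} A^i_i. \]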


This is the same trace encountered in the study of linear transformations in 
Chap. 5. 

Although the above definition of the trace makes explicit use of com- 
ponents with respect to a basis, it was shown in Chap. 5 that it is in fact 
basis-independent (see also Problem 26.7). Functions of tensors that do not 
depend on bases are called invariants. Another example of an invariant is a 
linear functional (see Problem 26.6). 


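Example 26.1.8 Consider the tensor $\mathbf{A} \in \mathcal{T}^2_0(V)$ given by $\mathbf{A} = \mathbf{e}_1 \otimes \mathbf{e}_1 + \mathbf{e}_2 \otimes \mathbf{e}_1$.
We calculate the analogue of the trace for $\mathbf{A}$: $\sum_{i=1}^2 A^{ii} = 1 + 0 = 1$.
Now we change to a new basis, $\{\bar{\mathbf{e}}_1, \bar{\mathbf{e}}_2\}$, given by $\mathbf{e}_1 = \bar{\mathbf{e}}_1 + 2\bar{\mathbf{e}}_2$ and
$\mathbf{e}_2 = -\bar{\mathbf{e}}_1 + \bar{\mathbf{e}}_2$. In terms of the new basis vectors, $\mathbf{A}$ is given by


\[ \mathbf{A} = (\bar{\mathbf{e}}_1 + 2\bar{\mathbf{e}}_2) \otimes (\bar{\mathbf{e}}_1 + 2\bar{\mathbf{e}}_2) + (-\bar{\mathbf{e}}_1 + \bar{\mathbf{e}}_2) \otimes (\bar{\mathbf{e}}_1 + 2\bar{\mathbf{e}}_2)
   = 3\bar{\mathbf{e}}_2 \otimes \bar{\mathbf{e}}_1 + 6\bar{\mathbf{e}}_2 \otimes \bar{\mathbf{e}}_2 \]


with $\sum_{i=1}^2 \bar{A}^{ii} = 0 + 6 = 6 \neq \sum_{i=1}^2 A^{ii}$. This kind of “trace” is not invariant.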
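
The arithmetic of Example 26.1.8 can be reproduced in a few lines. This is a sketch under one stated convention: the old basis vectors are written as $\mathbf{e}_i = m^a_i \bar{\mathbf{e}}_a$ with the hypothetical array `m[a][i]`, so that two-index contravariant components transform as $\bar{A}^{ab} = m^a_i m^b_j A^{ij}$.

```python
idx = range(2)

# Components A^{ij} of A = e_1 x e_1 + e_2 x e_1 in the old basis.
A = [[1, 0],
     [1, 0]]

# e_1 = bar e_1 + 2 bar e_2 and e_2 = -bar e_1 + bar e_2, stored as m[a][i].
m = [[1, -1],
     [2, 1]]

# bar A^{ab} = m^a_i m^b_j A^{ij}
A_bar = [[sum(m[a][i] * m[b][j] * A[i][j] for i in idx for j in idx)
          for b in idx] for a in idx]

naive_trace_old = sum(A[i][i] for i in idx)
naive_trace_new = sum(A_bar[i][i] for i in idx)

print(A_bar)                              # [[0, 0], [3, 6]]
print(naive_trace_old, naive_trace_new)   # 1 6
```

The output matches $3\bar{\mathbf{e}}_2 \otimes \bar{\mathbf{e}}_1 + 6\bar{\mathbf{e}}_2 \otimes \bar{\mathbf{e}}_2$ and shows the two diagonal sums disagreeing, as the example states.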


Besides mappings of the form $h : \mathcal{T}^r_s(V) \to \mathcal{T}^k_l(V)$ that depend on a single
variable, we can define mappings that depend on several variables, in other
words, that take several elements of $\mathcal{T}^r_s(V)$ and give an element of $\mathcal{T}^k_l(V)$.
We then write $h : (\mathcal{T}^r_s(V))^m \to \mathcal{T}^k_l(V)$. It is understood that $h(\mathbf{t}_1, \dots, \mathbf{t}_m)$,
in which all $\mathbf{t}_i$ are in $\mathcal{T}^r_s(V)$, is a tensor of type (k, l). If h is linear in all
of its variables, it is called a tensor-valued multilinear map. Furthermore,
if $h(\mathbf{t}_1, \dots, \mathbf{t}_m)$ does not depend on the choice of a basis of $\mathcal{T}^r_s(V)$, it is
called a multilinear invariant. In most cases k = 0 = l, and we speak of
scalar-valued invariants, or simply invariants. An example of a multilinear
invariant is the determinant considered as a function of the rows of a matrix.

The following defines an important class of multilinear invariants:


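Definition 26.1.9 A contraction of a tensor $\mathbf{A} \in \mathcal{T}^r_s(V)$ with respect
to a contravariant index at position p and a covariant index at position
q is a linear mapping $C^p_q : \mathcal{T}^r_s(V) \to \mathcal{T}^{r-1}_{s-1}(V)$ given in component form
by


\[ [C^p_q(\mathbf{A})]^{i_1 \dots i_{p-1} i_{p+1} \dots i_r}_{j_1 \dots j_{q-1} j_{q+1} \dots j_s}
   = A^{i_1 \dots i_{p-1}\, k\, i_{p+1} \dots i_r}_{j_1 \dots j_{q-1}\, k\, j_{q+1} \dots j_s}
   \equiv \sum_k A^{i_1 \dots i_{p-1}\, k\, i_{p+1} \dots i_r}_{j_1 \dots j_{q-1}\, k\, j_{q+1} \dots j_s}. \]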


It can be readily shown that contractions are basis-independent. The 
proof is exactly the same as that for the basis-independence of the trace. 
In fact, the trace is a special case of a contraction, in which r = s = 1.

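By applying the contraction mapping repeatedly, we can keep reducing
the rank of a tensor. For example,


\[ \bigl[C^{p_2}_{q_2}\bigl(C^{p_1}_{q_1}(\mathbf{A})\bigr)\bigr]^{i_1 \dots i_{p_1-1} i_{p_1+1} \dots i_{p_2-1} i_{p_2+1} \dots i_r}_{j_1 \dots j_{q_1-1} j_{q_1+1} \dots j_{q_2-1} j_{q_2+1} \dots j_s}
   = A^{i_1 \dots i_{p_1-1}\, k\, i_{p_1+1} \dots i_{p_2-1}\, l\, i_{p_2+1} \dots i_r}_{j_1 \dots j_{q_1-1}\, k\, j_{q_1+1} \dots j_{q_2-1}\, l\, j_{q_2+1} \dots j_s}, \]


where a sum over repeated indices k and l is understood on the right-hand
side. Continuing this process, we get
$C^{p_m}_{q_m} \cdots C^{p_2}_{q_2} C^{p_1}_{q_1} : \mathcal{T}^r_s(V) \to \mathcal{T}^{r-m}_{s-m}(V)$.
In particular, if r = s, we have $C^{p_r}_{q_r} \cdots C^{p_2}_{q_2} C^{p_1}_{q_1} : \mathcal{T}^r_r(V) \to \mathbb{R}$. In terms of
components, we have


\[ C^r_r \cdots C^2_2 C^1_1(\mathbf{A}) = A^{i_1 i_2 \dots i_r}_{i_1 i_2 \dots i_r} \]


for $\mathbf{A} \in \mathcal{T}^r_r$, where $A^{i_1 \dots i_r}_{j_1 \dots j_r}$ are the components of $\mathbf{A}$ in any basis. This leads to a
pairing of a tensor of type (r, 0) with a tensor of type (0, r). If $\mathbf{A} \in \mathcal{T}^r_0$ and
$\mathbf{B} \in \mathcal{T}^0_r$, then $\mathbf{A} \otimes \mathbf{B} \in \mathcal{T}^r_r$, and the pairing $\langle\mathbf{A}, \mathbf{B}\rangle$ can be defined as


\[ \langle\mathbf{A}, \mathbf{B}\rangle \equiv C^r_r \cdots C^2_2 C^1_1(\mathbf{A} \otimes \mathbf{B}) = A^{i_1 i_2 \dots i_r} B_{i_1 i_2 \dots i_r} \tag{26.8} \]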


with Einstein’s summation convention in place. 
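The pairing defined above can also be obtained from evaluation. Let
$\{\mathbf{v}_i\}_{i=1}^N$ be a basis of V and $\{\boldsymbol{\omega}^i\}_{i=1}^N$ its dual basis in V*. Then


\[ \mathbf{A} = A^{i_1 i_2 \dots i_r}\, \mathbf{v}_{i_1} \otimes \mathbf{v}_{i_2} \otimes \cdots \otimes \mathbf{v}_{i_r}, \qquad
   \mathbf{B} = B_{j_1 j_2 \dots j_r}\, \boldsymbol{\omega}^{j_1} \otimes \boldsymbol{\omega}^{j_2} \otimes \cdots \otimes \boldsymbol{\omega}^{j_r}, \]

and

\[ \begin{aligned}
   \mathbf{B}(\mathbf{A}) &= B_{j_1 j_2 \dots j_r}\, \boldsymbol{\omega}^{j_1} \otimes \boldsymbol{\omega}^{j_2} \otimes \cdots \otimes \boldsymbol{\omega}^{j_r}\bigl(A^{i_1 i_2 \dots i_r} \mathbf{v}_{i_1}, \mathbf{v}_{i_2}, \dots, \mathbf{v}_{i_r}\bigr) \\
   &= B_{j_1 j_2 \dots j_r} A^{i_1 i_2 \dots i_r}\, \boldsymbol{\omega}^{j_1} \otimes \boldsymbol{\omega}^{j_2} \otimes \cdots \otimes \boldsymbol{\omega}^{j_r}(\mathbf{v}_{i_1}, \mathbf{v}_{i_2}, \dots, \mathbf{v}_{i_r}) \\
   &= B_{j_1 j_2 \dots j_r} A^{i_1 i_2 \dots i_r}\, \boldsymbol{\omega}^{j_1}(\mathbf{v}_{i_1})\, \boldsymbol{\omega}^{j_2}(\mathbf{v}_{i_2}) \cdots \boldsymbol{\omega}^{j_r}(\mathbf{v}_{i_r}) \\
   &= B_{i_1 i_2 \dots i_r} A^{i_1 i_2 \dots i_r}.
   \end{aligned} \tag{26.9} \]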

The linearity inherent in the construction of tensor algebras carries along 
some of the properties and structures of the underlying vector spaces. One 
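such property is isomorphism. Suppose that $F : V \to U$ is a vector space
isomorphism. Then $F^* : U^* \to V^*$, the pullback of F, is also an isomorphism
(Proposition 2.5.5). Associated to F is a linear map, which we denote by
the same symbol, from $\mathcal{T}^r_s(V)$ to $\mathcal{T}^r_s(U)$ defined by


\[ \underbrace{[F(\mathbf{T})]}_{\in \mathcal{T}^r_s(U)}(\boldsymbol{\theta}^1, \dots, \boldsymbol{\theta}^r, \mathbf{u}_1, \dots, \mathbf{u}_s)
   \equiv \mathbf{T}(F^*\boldsymbol{\theta}^1, \dots, F^*\boldsymbol{\theta}^r, F^{-1}\mathbf{u}_1, \dots, F^{-1}\mathbf{u}_s), \tag{26.10} \]


where $\mathbf{T} \in \mathcal{T}^r_s(V)$, $\boldsymbol{\theta}^i \in U^*$, and $\mathbf{u}_j \in U$. The reader may check that this map
is an algebra isomorphism (see Definition 3.1.17). We shall use this
isomorphism to define derivatives for tensors in Chap. 28.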


26.2 Symmetries of Tensors 


Many applications demand tensors that have some kind of symmetry property.
We have already encountered a symmetric tensor, the metric “tensor”
of an inner product: If V is a vector space and $\mathbf{v}_1, \mathbf{v}_2 \in V$, then
$\mathbf{g}(\mathbf{v}_1, \mathbf{v}_2) = \mathbf{g}(\mathbf{v}_2, \mathbf{v}_1)$. The following generalizes this property.


Definition 26.2.1 A tensor A is symmetric in the ith and jth variables if 
its value as a multilinear function is unchanged when these variables are 
interchanged. Clearly, the two variables must be of the same kind. 


From this definition, it follows that in any basis, the components of a 
symmetric tensor do not change when the ith and jth indices are inter- 
changed. 


Definition 26.2.2 A tensor is contravariant-symmetric if it is symmetric 
in every pair of its contravariant indices and covariant-symmetric if it is 
symmetric in every pair of its covariant indices. A tensor is symmetric if it 
is both contravariant-symmetric and covariant-symmetric. 


An immediate consequence of this definition is 


Theorem 26.2.3 A tensor $\mathbf{S}$ of type (r, 0) is symmetric iff for any permutation
$\pi$ of 1, 2, ..., r, and any $\boldsymbol{\tau}^1, \boldsymbol{\tau}^2, \dots, \boldsymbol{\tau}^r$ in V*, we have


\[ \mathbf{S}(\boldsymbol{\tau}^{\pi(1)}, \boldsymbol{\tau}^{\pi(2)}, \dots, \boldsymbol{\tau}^{\pi(r)}) = \mathbf{S}(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2, \dots, \boldsymbol{\tau}^r). \]


The set $\mathcal{S}^r(V)$ of all symmetric tensors of type (r, 0) forms a subspace of
the vector space³ $\mathcal{T}^r_0$. Similarly, the set of symmetric tensors of type (0, s)
forms a subspace $\mathcal{S}_s$ of $\mathcal{T}^0_s$. The (independent) components of a symmetric
tensor $\mathbf{A} \in \mathcal{S}^r$ are $A^{i_1 i_2 \dots i_r}$, where $i_1 \le i_2 \le \cdots \le i_r$; the other components

Although a set of symmetric tensors forms a vector space, it does not 
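form an algebra under the usual multiplication of tensors. In fact, even if
$\mathbf{A} = A^{ij}\, \mathbf{e}_i \otimes \mathbf{e}_j$ and $\mathbf{B} = B^{kl}\, \mathbf{e}_k \otimes \mathbf{e}_l$ are symmetric tensors of type (2, 0), the
tensor product $\mathbf{A} \otimes \mathbf{B} = A^{ij} B^{kl}\, \mathbf{e}_i \otimes \mathbf{e}_j \otimes \mathbf{e}_k \otimes \mathbf{e}_l$ need not be a type (4, 0)
symmetric tensor. For instance, $A^{ik} B^{jl}$ may not equal $A^{ij} B^{kl}$. However, we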
can modify the definition of the tensor product (for symmetric tensors) to 
give a symmetric product out of symmetric factors. 


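Definition 26.2.4 A symmetrizer is an operator $\mathbb{S} : \mathcal{T}^r_0 \to \mathcal{S}^r$ given by


\[ [\mathbb{S}(\mathbf{A})](\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r) = \frac{1}{r!} \sum_{\pi} \mathbf{A}(\boldsymbol{\tau}^{\pi(1)}, \dots, \boldsymbol{\tau}^{\pi(r)}), \tag{26.11} \]


where the sum is taken over the r! permutations of the integers 1, 2, ..., r,
and $\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r$ are all in V*. $\mathbb{S}(\mathbf{A})$ is often denoted by $\mathbf{A}_s$.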


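Clearly, $\mathbf{A}_s$ is a symmetric tensor. In fact,


\[ \begin{aligned}
   \mathbf{A}_s(\boldsymbol{\tau}^{\sigma(1)}, \dots, \boldsymbol{\tau}^{\sigma(r)}) &= [\mathbb{S}(\mathbf{A})](\boldsymbol{\tau}^{\sigma(1)}, \dots, \boldsymbol{\tau}^{\sigma(r)}) \\
   &= \frac{1}{r!} \sum_{\pi} \mathbf{A}(\boldsymbol{\tau}^{\pi\sigma(1)}, \dots, \boldsymbol{\tau}^{\pi\sigma(r)}) \\
   &= \frac{1}{r!} \sum_{\pi\sigma} \mathbf{A}(\boldsymbol{\tau}^{\pi\sigma(1)}, \dots, \boldsymbol{\tau}^{\pi\sigma(r)}) \\
   &= \mathbf{A}_s(\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r),
   \end{aligned} \tag{26.12} \]


where we have used the fact that the sum over $\pi$ is equal to the sum over the
product (or composition) $\pi\sigma$, because they both include all permutations.
Furthermore, if $\mathbf{A}$ is symmetric, then $\mathbb{S}(\mathbf{A}) = \mathbf{A}$:


\[ [\mathbb{S}(\mathbf{A})](\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r) = \frac{1}{r!} \sum_{\pi} \mathbf{A}(\boldsymbol{\tau}^{\pi(1)}, \dots, \boldsymbol{\tau}^{\pi(r)})
   = \frac{1}{r!} \underbrace{\Bigl(\sum_{\pi} 1\Bigr)}_{= r!} \mathbf{A}(\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r) = \mathbf{A}(\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r). \]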


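A similar definition gives the symmetrizer $\mathbb{S} : \mathcal{T}^0_s \to \mathcal{S}_s$. Instead of
$\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r$ in (26.11), we would have $\mathbf{v}_1, \dots, \mathbf{v}_s$.


³When there is no risk of confusion, we shall delete V from $\mathcal{T}^r_s(V)$, it being understood
that all tensors are defined on some given underlying vector space.

Example 26.2.5 For r = 2, we have only two permutations, and

\[ \mathbf{A}_s(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2) = \tfrac{1}{2}\bigl[\mathbf{A}(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2) + \mathbf{A}(\boldsymbol{\tau}^2, \boldsymbol{\tau}^1)\bigr]. \]

For r = 3, we have six permutations of 1, 2, 3, and (26.11) gives

\[ \begin{aligned}
   \mathbf{A}_s(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2, \boldsymbol{\tau}^3) = \tfrac{1}{6}\bigl[&\mathbf{A}(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2, \boldsymbol{\tau}^3) + \mathbf{A}(\boldsymbol{\tau}^2, \boldsymbol{\tau}^1, \boldsymbol{\tau}^3) + \mathbf{A}(\boldsymbol{\tau}^1, \boldsymbol{\tau}^3, \boldsymbol{\tau}^2) \\
   &+ \mathbf{A}(\boldsymbol{\tau}^3, \boldsymbol{\tau}^1, \boldsymbol{\tau}^2) + \mathbf{A}(\boldsymbol{\tau}^3, \boldsymbol{\tau}^2, \boldsymbol{\tau}^1) + \mathbf{A}(\boldsymbol{\tau}^2, \boldsymbol{\tau}^3, \boldsymbol{\tau}^1)\bigr].
   \end{aligned} \]


It is clear that interchanging any pair of $\boldsymbol{\tau}$’s on the RHS of the above two
equations does not change the sum. Thus, $\mathbf{A}_s$ is indeed a symmetric tensor.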
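
The symmetrizer of Eq. (26.11) is easy to prototype. The sketch below is illustrative only: it represents a trilinear map on a two-dimensional space by a made-up component array, and the names `from_components` and `symmetrize` are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def from_components(comp):
    """Trilinear map (t1, t2, t3) -> comp[i][j][k] * t1[i] * t2[j] * t3[k]."""
    n = len(comp)
    return lambda t1, t2, t3: sum(
        comp[i][j][k] * t1[i] * t2[j] * t3[k]
        for i in range(n) for j in range(n) for k in range(n))

def symmetrize(A, r):
    """Return A_s of Eq. (26.11): the average of A over all r! argument orders."""
    def A_s(*taus):
        total = sum(A(*(taus[p] for p in perm))
                    for perm in permutations(range(r)))
        return Fraction(total, factorial(r))
    return A_s

# Made-up components and arguments, chosen only to exercise the code.
A = from_components([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
A_s = symmetrize(A, 3)
taus = [(1, 2), (3, -1), (0, 5)]

values = {A_s(*(taus[p] for p in perm)) for perm in permutations(range(3))}
print(len(values) == 1)                          # A_s ignores the argument order
print(symmetrize(A_s, 3)(*taus) == A_s(*taus))   # symmetrizing again changes nothing
```

The second check mirrors the statement that the symmetrizer leaves an already symmetric tensor unchanged.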


It can be shown that


\[ \dim \mathcal{S}^r(V) = \binom{N + r - 1}{r} = \frac{(N + r - 1)!}{r!\,(N - 1)!}. \]


The proof involves counting the number of different integers $i_1, \dots, i_r$ for
which $1 \le i_m \le i_{m+1} \le N$ for each m.
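
The counting argument can be mimicked directly. In this short sketch (the function name `independent_components` is hypothetical), nondecreasing index tuples are enumerated with the standard-library `itertools.combinations_with_replacement` and compared with the binomial coefficient.

```python
from itertools import combinations_with_replacement
from math import comb

def independent_components(N, r):
    """All index tuples with 1 <= i_1 <= i_2 <= ... <= i_r <= N."""
    return list(combinations_with_replacement(range(1, N + 1), r))

for N, r in [(2, 2), (3, 2), (3, 3), (4, 2)]:
    assert len(independent_components(N, r)) == comb(N + r - 1, r)

print(independent_components(2, 2))   # [(1, 1), (1, 2), (2, 2)]
```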

We are now ready to define a product on the collection of symmetric 
tensors and make it an algebra, called the symmetric algebra. 


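Definition 26.2.6 The symmetric product of symmetric tensors $\mathbf{A} \in \mathcal{S}^r(V)$
and $\mathbf{B} \in \mathcal{S}^s(V)$ is denoted by $\mathbf{A}\mathbf{B}$ and defined as


\[ \begin{aligned}
   \mathbf{A}\mathbf{B}(\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^{r+s}) &\equiv \frac{(r+s)!}{r!\,s!}\, \mathbb{S}(\mathbf{A} \otimes \mathbf{B})(\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^{r+s}) \\
   &= \frac{1}{r!\,s!} \sum_{\pi} \mathbf{A}(\boldsymbol{\tau}^{\pi(1)}, \dots, \boldsymbol{\tau}^{\pi(r)})\, \mathbf{B}(\boldsymbol{\tau}^{\pi(r+1)}, \dots, \boldsymbol{\tau}^{\pi(r+s)}),
   \end{aligned} \]

where the sum is over all permutations of 1, 2, ..., r + s. The symmetric
product of $\mathbf{A} \in \mathcal{S}_r(V)$ and $\mathbf{B} \in \mathcal{S}_s(V)$ is defined similarly.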


Historical Notes 

Leopold Kronecker (1823-1891) was the son of Isidor Kronecker, a businessman, and 
Johanna Prausnitzer. They were wealthy and provided private tutoring at home for their 
son until he entered the Liegnitz Gymnasium. At the gymnasium, Kronecker’s mathemat- 
ics teacher was E.E. Kummer, who early recognized the boy’s ability and encouraged him 
to do independent research. He also received Evangelical religious instruction, although 
he was Jewish; he formally converted to Christianity in the last year of his life. 
Kronecker matriculated at the University of Berlin in 1841, where he attended lectures 
in mathematics given by Dirichlet. Like Gauss and Jacobi, he was interested in classi- 
cal philology. He also attended Schelling’s philosophy lectures; he was later to make a 
thorough study of the works of Descartes, Spinoza, Leibniz, Kant, and Hegel, as well as 
those of Schopenhauer, whose ideas he rejected. 

Kronecker spent the summer semester of 1843 at the University of Bonn, and the fall 
semester at Breslau (now Wroclaw, Poland) because Kummer had been appointed pro- 
fessor there. He remained there for two semesters, returning to Berlin in the winter of 
1844-1845 to take the doctorate. Kronecker took his oral examination consisting of ques- 
tions not only in mathematics, but also in Greek history of legal philosophy. He was 
awarded the doctorate on 10 September 1845.
Dirichlet, his professor and examiner, was to remain one of Kronecker’s closest friends,
as was Kummer, his first mathematics teacher. In the meantime, in Berlin, Kronecker 
was also becoming better acquainted with Eisenstein and with Jacobi. During the same 
period Dirichlet introduced him to Alexander von Humboldt and to the composer Felix 
Mendelssohn, who was both Dirichlet’s brother-in-law and the cousin of Kummer’s wife. 
Family business then called Kronecker from Berlin. In its interest he was required to 
spend a few years managing an estate near Liegnitz, as well as to dissolve the bank- 
ing business of an uncle. In 1848 he married the latter’s daughter, his cousin Fanny 
Prausnitzer; they had six children. Having temporarily renounced an academic career, 
Kronecker continued to do mathematics as a recreation. He both carried on independent 
research and engaged in a lively mathematical correspondence with Kummer; he was 
not ambitious for fame, and was able to enjoy mathematics as a true amateur. By 1855, 
however, Kronecker’s circumstances had changed enough to allow him to return to the 
academic life in Berlin as a financially independent private scholar. 

In 1860 Kummer, seconded by Borchardt and Weierstrass, nominated Kronecker to the 
Berlin Academy, of which he became full member on 23 January 1861. Kronecker was 
increasingly active and influential in the affairs of the Academy, particularly in recruiting 
the most important German and foreign mathematicians for it. His influence outside Ger- 
many also increased. He was a member of many learned societies, among them the Paris 
Academy and the Royal Society of London. He established other contacts with foreign 
scientists in his numerous travels abroad and in extending to them the hospitality of his 
Berlin home. For this reason his advice was often solicited in regard to filling mathemat- 
ical professorships both in Germany and elsewhere; his recommendations were probably 
as significant as those of his erstwhile friend Weierstrass. 

The cause of the growing estrangement between Kronecker and Weierstrass was partly 
due to the very different temperaments of the two, and their professional and scientific 
differences. Since they had long maintained the same circle of friends, their friends, too, 
became involved on both levels. A characteristic incident occurred at the new year of 
1884-1885, when H. A. Schwarz, who was both Weierstrass’s student and Kummer’s son- 
in-law, sent Kronecker a greeting that included the phrase: “He who does not honor the 
Smaller [Kronecker], is not worthy of the Greater [Weierstrass].” Kronecker read this 
allusion to physical size—he was a small man, and increasingly self-conscious with age— 
as a slur on his intellectual powers and broke with Schwarz completely. 

Kronecker’s mathematics lacked a systematic theoretical basis. Nevertheless, he was pre- 
eminent in uniting the separate mathematical disciplines. Moreover, in certain ways—his 
refusal to recognize an actual infinity, his insistence that a mathematical concept must 
be defined in a finite number of steps, and his opposition to the work of Cantor and 
Dedekind—his approach may be compared to that of intuitionists in the twentieth cen- 
tury. Kronecker’s mathematics thus remains influential. 


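Example 26.2.7 Let us construct the symmetric tensor products of vectors.
First we find the symmetric product of $\mathbf{v}_1$ and $\mathbf{v}_2$, both belonging to
$V = \mathcal{T}^1_0(V)$:


\[ \begin{aligned}
   (\mathbf{v}_1\mathbf{v}_2)(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2) &= \mathbf{v}_1(\boldsymbol{\tau}^1)\mathbf{v}_2(\boldsymbol{\tau}^2) + \mathbf{v}_1(\boldsymbol{\tau}^2)\mathbf{v}_2(\boldsymbol{\tau}^1) \\
   &= \mathbf{v}_1(\boldsymbol{\tau}^1)\mathbf{v}_2(\boldsymbol{\tau}^2) + \mathbf{v}_2(\boldsymbol{\tau}^1)\mathbf{v}_1(\boldsymbol{\tau}^2) \\
   &= (\mathbf{v}_1 \otimes \mathbf{v}_2 + \mathbf{v}_2 \otimes \mathbf{v}_1)(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2).
   \end{aligned} \]

Since this is true for any pair $\boldsymbol{\tau}^1$ and $\boldsymbol{\tau}^2$, we have

\[ \mathbf{v}_1\mathbf{v}_2 = \mathbf{v}_1 \otimes \mathbf{v}_2 + \mathbf{v}_2 \otimes \mathbf{v}_1. \]


In general, $\mathbf{v}_1\mathbf{v}_2 \cdots \mathbf{v}_r = \sum_{\pi} \mathbf{v}_{\pi(1)} \otimes \mathbf{v}_{\pi(2)} \otimes \cdots \otimes \mathbf{v}_{\pi(r)}$.


It is clear from the definition that symmetric multiplication is commutative,
associative, and distributive. If we choose a basis $\{\mathbf{e}_i\}_{i=1}^N$ for V and
express all symmetric tensors in terms of symmetric products of $\mathbf{e}_i$ using
the above properties, then any symmetric tensor can be expressed as a linear
combination of terms of the form $(\mathbf{e}_1)^{n_1} \cdots (\mathbf{e}_N)^{n_N}$.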

Skew-symmetry or antisymmetry is the same as symmetry except that 
in the interchange of variables the tensor changes sign. 


Definition 26.2.8 A covariant (contravariant) skew-symmetric (or anti- 
symmetric) tensor is one that is skew-symmetric in all pairs of covariant 
(contravariant) variables. A tensor is skew-symmetric if it is both covariant 
and contravariant skew-symmetric. 




The analogue of Theorem 26.2.3 is 


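Theorem 26.2.9 A tensor $\mathbf{A}$ of type (r, 0) is skew iff for any permutation $\pi$
of 1, 2, ..., r, and any $\boldsymbol{\tau}^1, \boldsymbol{\tau}^2, \dots, \boldsymbol{\tau}^r$ in V*, we have


\[ \mathbf{A}(\boldsymbol{\tau}^{\pi(1)}, \boldsymbol{\tau}^{\pi(2)}, \dots, \boldsymbol{\tau}^{\pi(r)}) = \epsilon_\pi\, \mathbf{A}(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2, \dots, \boldsymbol{\tau}^r). \]


Definition 26.2.10 An antisymmetrizer is a linear operator $\mathbb{A}$ on $\mathcal{T}^r_0$, given
by

\[ [\mathbb{A}(\mathbf{T})](\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r) = \frac{1}{r!} \sum_{\pi} \epsilon_\pi\, \mathbf{T}(\boldsymbol{\tau}^{\pi(1)}, \dots, \boldsymbol{\tau}^{\pi(r)}). \tag{26.14} \]

$\mathbb{A}(\mathbf{T})$ is often denoted by $\mathbf{T}_a$.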


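Clearly, $\mathbf{T}_a$ is an antisymmetric tensor. In fact, using $(\epsilon_\sigma)^2 = 1$, which
holds for any permutation, we have


\[ \begin{aligned}
   \mathbf{T}_a(\boldsymbol{\tau}^{\sigma(1)}, \dots, \boldsymbol{\tau}^{\sigma(r)}) &= [\mathbb{A}(\mathbf{T})](\boldsymbol{\tau}^{\sigma(1)}, \dots, \boldsymbol{\tau}^{\sigma(r)}) \\
   &= (\epsilon_\sigma)^2\, \frac{1}{r!} \sum_{\pi} \epsilon_\pi\, \mathbf{T}(\boldsymbol{\tau}^{\pi\sigma(1)}, \dots, \boldsymbol{\tau}^{\pi\sigma(r)}) \\
   &= \epsilon_\sigma\, \frac{1}{r!} \sum_{\pi\sigma} \epsilon_\pi \epsilon_\sigma\, \mathbf{T}(\boldsymbol{\tau}^{\pi\sigma(1)}, \dots, \boldsymbol{\tau}^{\pi\sigma(r)}) \\
   &= \epsilon_\sigma\, \mathbf{T}_a(\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r),
   \end{aligned} \tag{26.15} \]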


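where we have used the fact that $\epsilon_\sigma \epsilon_\pi = \epsilon_{\pi\sigma}$, as can be easily verified. If $\mathbf{T}$
is antisymmetric, then $\mathbb{A}(\mathbf{T}) = \mathbf{T}$:


\[ [\mathbb{A}(\mathbf{T})](\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r) = \frac{1}{r!} \sum_{\pi} \epsilon_\pi\, \mathbf{T}(\boldsymbol{\tau}^{\pi(1)}, \dots, \boldsymbol{\tau}^{\pi(r)})
   = \frac{1}{r!} \sum_{\pi} (\epsilon_\pi)^2\, \mathbf{T}(\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r) = \mathbf{T}(\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r). \tag{26.16} \]

A similar definition gives the antisymmetrizer $\mathbb{A}$ on $\mathcal{T}^0_s$. Instead of
$\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^r$ in (26.14), we would have $\mathbf{v}_1, \dots, \mathbf{v}_s$.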


Example 26.2.11 Let us write out Eq. (26.14) for r = 3. The procedure is 
entirely analogous to Example 26.2.5: 


T, (ea co) = = [eosA(e!. 2,23) + €23A(t?,t', 7°) 
+ €132A(t', rn, t’) + epA(t’, ti, t”) 
+ €391A(t?, *, t') + €931A(t?, t°, ')] 
1 


= glAltlst’, t°) _ A(t’, t,o) a A(t', r, £”) 


+ A(t?, tl, t’) _ A(r?, rt’, t') + A(t’, r, =). 


The reader may easily verify that all terms with a plus sign are obtained 
from (123) by an even number of interchanges of symbols, and those with a 
minus sign by an odd number. 
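
The same bookkeeping can be delegated to a short program. This is a sketch of Eq. (26.14) under illustrative assumptions: a trilinear map on a three-dimensional space with made-up components, and hypothetical helper names `sign` and `antisymmetrize`.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation of 0..r-1: +1 for an even number of inversions, -1 for odd."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def antisymmetrize(T, r):
    """Return T_a of Eq. (26.14): the signed average of T over all argument orders."""
    def T_a(*taus):
        total = sum(sign(perm) * T(*(taus[p] for p in perm))
                    for perm in permutations(range(r)))
        return Fraction(total, factorial(r))
    return T_a

# Made-up components of a trilinear map on a 3-dimensional space.
comp = [[[(i + 1) * (j + 2) ** 2 + k ** 3 for k in range(3)]
         for j in range(3)] for i in range(3)]

def T(t1, t2, t3):
    return sum(comp[i][j][k] * t1[i] * t2[j] * t3[k]
               for i in range(3) for j in range(3) for k in range(3))

T_a = antisymmetrize(T, 3)
t1, t2, t3 = (1, 0, 2), (0, 1, 1), (3, -1, 0)

print(T_a(t2, t1, t3) == -T_a(t1, t2, t3))   # one interchange flips the sign
print(T_a(t2, t3, t1) == T_a(t1, t2, t3))    # a cyclic (even) permutation does not
print(T_a(t1, t1, t3) == 0)                  # a repeated argument gives zero
```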


26.3 Exterior Algebra 


The following discussion will concentrate on tensors of type (r, 0). How- 
ever, interchanging the roles of V and V* makes all definitions, theorems, 
propositions, and conclusions valid for tensors of type (0, s) as well. 

The set of all skew-symmetric tensors of type (p, 0) forms a subspace of
$\mathcal{T}^p_0(V)$. This subspace is denoted by $\Lambda^p(V^*)$ and its members are called
p-vectors.⁴ It is not, however, an algebra unless we define a skew-symmetric
product analogous to that for the symmetric case. This is done in the follow- 
ing definition: 


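Definition 26.3.1 The exterior product (also called the wedge, Grassmann,
alternating, or veck product) of two skew-symmetric tensors
$\mathbf{A} \in \Lambda^r(V^*)$ and $\mathbf{B} \in \Lambda^s(V^*)$ is a skew-symmetric tensor belonging to $\Lambda^{r+s}(V^*)$
and given by⁵


\[ \mathbf{A} \wedge \mathbf{B} \equiv \frac{(r+s)!}{r!\,s!}\, \mathbb{A}(\mathbf{A} \otimes \mathbf{B}) = \frac{(r+s)!}{r!\,s!}\, (\mathbf{A} \otimes \mathbf{B})_a. \]


⁴The use of V* in $\Lambda^p(V^*)$ is by convention. Since a member of $\Lambda^p(V^*)$ acts on p dual
vectors, it is more natural to use V*.


⁵The reader should be warned that different authors may use different numerical
coefficients in the definition of the exterior product.

More explicitly,

\[ \mathbf{A} \wedge \mathbf{B}(\boldsymbol{\tau}^1, \dots, \boldsymbol{\tau}^{r+s})
   = \frac{1}{r!\,s!} \sum_{\pi} \epsilon_\pi\, \mathbf{A}(\boldsymbol{\tau}^{\pi(1)}, \dots, \boldsymbol{\tau}^{\pi(r)})\, \mathbf{B}(\boldsymbol{\tau}^{\pi(r+1)}, \dots, \boldsymbol{\tau}^{\pi(r+s)}). \]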


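Example 26.3.2 Let us find the exterior product of $\mathbf{v}_1$ and $\mathbf{v}_2$, both belonging
to $V = \mathcal{T}^1_0(V)$, so that r = s = 1:


\[ \begin{aligned}
   \mathbf{v}_1 \wedge \mathbf{v}_2(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2) &= \sum_{\pi} \epsilon_\pi\, \mathbf{v}_1(\boldsymbol{\tau}^{\pi(1)})\, \mathbf{v}_2(\boldsymbol{\tau}^{\pi(2)}) \\
   &= \mathbf{v}_1(\boldsymbol{\tau}^1)\mathbf{v}_2(\boldsymbol{\tau}^2) - \mathbf{v}_1(\boldsymbol{\tau}^2)\mathbf{v}_2(\boldsymbol{\tau}^1) \\
   &= (\mathbf{v}_1 \otimes \mathbf{v}_2 - \mathbf{v}_2 \otimes \mathbf{v}_1)(\boldsymbol{\tau}^1, \boldsymbol{\tau}^2).
   \end{aligned} \]

Since this is true for arbitrary $\boldsymbol{\tau}^1$ and $\boldsymbol{\tau}^2$, we have

\[ \mathbf{v}_1 \wedge \mathbf{v}_2 = \mathbf{v}_1 \otimes \mathbf{v}_2 - \mathbf{v}_2 \otimes \mathbf{v}_1 = 2!\,\mathbb{A}(\mathbf{v}_1 \otimes \mathbf{v}_2). \]

The result of the example above can be generalized to


\[ \mathbf{v}_1 \wedge \cdots \wedge \mathbf{v}_r = r!\,\mathbb{A}(\mathbf{v}_1 \otimes \cdots \otimes \mathbf{v}_r) = \sum_{\pi} \epsilon_\pi\, \mathbf{v}_{\pi(1)} \otimes \cdots \otimes \mathbf{v}_{\pi(r)}. \tag{26.17} \]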


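In particular, this shows that the exterior product (of vectors) is associative.
If $\{\mathbf{e}_i\}_{i=1}^N$ is a basis with dual $\{\boldsymbol{\epsilon}^j\}_{j=1}^N$, then Eq. (26.17) gives


\[ \mathbf{e}_1 \wedge \cdots \wedge \mathbf{e}_N(\boldsymbol{\epsilon}^{i_1}, \dots, \boldsymbol{\epsilon}^{i_N}) = \sum_{\pi} \epsilon_\pi\, \delta^{i_1}_{\pi(1)} \cdots \delta^{i_N}_{\pi(N)} = \epsilon^{i_1 \dots i_N}. \tag{26.18} \]

The last equality follows from the fact that the sum is zero unless $i_1, \dots, i_N$
is a permutation of 1, ..., N and it is 1 if the permutation is even and −1 if
it is odd. We obtain the same result if we switch the $\mathbf{e}$’s and the $\boldsymbol{\epsilon}$’s:


\[ \boldsymbol{\epsilon}^1 \wedge \cdots \wedge \boldsymbol{\epsilon}^N(\mathbf{e}_{i_1}, \dots, \mathbf{e}_{i_N}) = \sum_{\pi} \epsilon_\pi\, \delta^{\pi(1)}_{i_1} \cdots \delta^{\pi(N)}_{i_N} = \epsilon_{i_1 \dots i_N}. \tag{26.19} \]


Another useful result is obtained when the indices of the last equation are
switched:


\[ \boldsymbol{\epsilon}^{i_1} \wedge \cdots \wedge \boldsymbol{\epsilon}^{i_N}(\mathbf{e}_1, \dots, \mathbf{e}_N) = \sum_{\pi} \epsilon_\pi\, \delta^{i_{\pi(1)}}_{1} \cdots \delta^{i_{\pi(N)}}_{N}. \]


Now note that

\[ \delta^{i_{\pi(1)}}_{1} \cdots \delta^{i_{\pi(N)}}_{N} = \delta^{i_1}_{\pi^{-1}(1)} \cdots \delta^{i_N}_{\pi^{-1}(N)}. \]


Furthermore, $\sum_{\pi} = \sum_{\pi^{-1}}$ and $\epsilon_\pi = \epsilon_{\pi^{-1}}$. Denoting $\pi^{-1}$ by $\sigma$, the equation
above gives


\[ \boldsymbol{\epsilon}^{i_1} \wedge \cdots \wedge \boldsymbol{\epsilon}^{i_N}(\mathbf{e}_1, \dots, \mathbf{e}_N) = \sum_{\sigma} \epsilon_\sigma\, \delta^{i_1}_{\sigma(1)} \cdots \delta^{i_N}_{\sigma(N)} = \epsilon^{i_1 \dots i_N}. \tag{26.20} \]

The following theorem contains the properties of the exterior product (for
a proof, see [Abra 88, p. 394]):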


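Theorem 26.3.3 The exterior product is associative and distributive
with respect to the addition of tensors. Furthermore, it satisfies the
following anticommutativity property:


\[ \mathbf{A} \wedge \mathbf{B} = (-1)^{pq}\, \mathbf{B} \wedge \mathbf{A} \]


whenever $\mathbf{A} \in \Lambda^p(V^*)$ and $\mathbf{B} \in \Lambda^q(V^*)$. In particular,
$\mathbf{v}_1 \wedge \mathbf{v}_2 = -\mathbf{v}_2 \wedge \mathbf{v}_1$ for $\mathbf{v}_1, \mathbf{v}_2 \in V$.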
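
For exterior products of vectors, Eq. (26.17) gives a direct recipe that can be turned into code. The following is a minimal sketch, assuming vectors and dual vectors on $\mathbb{R}^3$ are stored as component tuples and paired by the ordinary sum of products; the names `sign`, `pair`, and `wedge` are hypothetical. It checks the anticommutativity of two vectors and reproduces the values of $\mathbf{e}_1 \wedge \mathbf{e}_2 \wedge \mathbf{e}_3$ on dual basis vectors.

```python
from itertools import permutations, product

def sign(perm):
    """Sign of a permutation of 0..r-1, computed from its inversion count."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def pair(tau, v):
    """Natural pairing <tau, v> in components."""
    return sum(t * x for t, x in zip(tau, v))

def wedge(*vectors):
    """v_1 ^ ... ^ v_r as a function of r dual vectors, following Eq. (26.17)."""
    r = len(vectors)
    def w(*taus):
        total = 0
        for perm in permutations(range(r)):
            term = sign(perm)
            for slot in range(r):
                term *= pair(taus[slot], vectors[perm[slot]])
            total += term
        return total
    return w

# Anticommutativity for two made-up vectors and dual vectors.
v1, v2 = (1, 2, 0), (0, 1, 3)
t1, t2 = (2, -1, 1), (1, 1, 0)
print(wedge(v1, v2)(t1, t2) == -wedge(v2, v1)(t1, t2))

# e_1 ^ e_2 ^ e_3 on dual basis vectors (indices are 0-based in the code).
e = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
eps = e   # components of the dual basis coincide with those of the standard basis
volume = wedge(*e)
levi_civita = {ix: volume(*(eps[i] for i in ix))
               for ix in product(range(3), repeat=3)}
print(levi_civita[(0, 1, 2)], levi_civita[(1, 0, 2)], levi_civita[(0, 0, 1)])   # 1 -1 0
```

The dictionary `levi_civita` is just the determinant of the matrix of pairings, which is one way to read the permutation sum.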


The wedge product of linear functionals of a vector space is particularly 
important in the analysis of tensors, as we shall see in the next chapter. 


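Definition 26.3.4 The elements of $\Lambda^p(V)$ are called p-forms.


A linear transformation $\mathbf{T} : V \to W$ induces a transformation⁶
$\mathbf{T}^* : \Lambda^p(W) \to \Lambda^p(V)$ defined by


\[ (\mathbf{T}^*\boldsymbol{\rho})(\mathbf{v}_1, \dots, \mathbf{v}_p) \equiv \boldsymbol{\rho}(\mathbf{T}\mathbf{v}_1, \dots, \mathbf{T}\mathbf{v}_p), \qquad \boldsymbol{\rho} \in \Lambda^p(W),\ \mathbf{v}_i \in V. \tag{26.21} \]


$\mathbf{T}^*\boldsymbol{\rho}$ is called the pullback of $\boldsymbol{\rho}$ by $\mathbf{T}$. The most important properties of
pullback maps are given in the following: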


Proposition 26.3.5 LetT: V— Wand$:W-— WU. Then 


T* : A?(W) => AP(YV) is linear. 

(SoT)*=T*oS*. 

IfT is the identity map, so is T*. 

If T is an isomorphism, so is T* and (T*)~! = (1!)*. 

If p € AP(W) anda € A4(W), then T*(p Ac) =T*p AT*s. 


Ae aie via el 


Proof The proof follows directly from definitions and is left as an exercise 
for the reader. 


If $\{e_i\}_{i=1}^N$ is a basis of V, we can form a basis for $\Lambda^p(V^*)$ by constructing all products of the form $e_{i_1} \wedge e_{i_2} \wedge \cdots \wedge e_{i_p}$. The number of linearly independent such vectors, which is the dimension of $\Lambda^p(V^*)$, is equal to the number of ways p numbers can be chosen from among N distinct numbers in such a way that no two of them are equal. This is simply the combination of N objects taken p at a time. Thus, we have

$$\dim \Lambda^p(V^*) = \binom{N}{p} = \frac{N!}{p!\,(N-p)!}. \tag{26.22}$$


⁶Note that T* is the extension of the pullback operator introduced at the end of Chap. 2.

In particular, $\dim \Lambda^N(V^*) = 1$.
Any $A \in \Lambda^p(V^*)$ can be written as

$$A = \sum_{i_1 < i_2 < \cdots < i_p}^{N} A^{i_1 \ldots i_p}\, e_{i_1} \wedge e_{i_2} \wedge \cdots \wedge e_{i_p} = \frac{1}{p!} \sum_{i_1, i_2, \dots, i_p}^{N} A^{i_1 \ldots i_p}\, e_{i_1} \wedge e_{i_2} \wedge \cdots \wedge e_{i_p}, \tag{26.23}$$

where $A^{i_1 \ldots i_p}$ are the components of A, which are assumed completely antisymmetric in all $i_1, i_2, \dots, i_p$. In the second sum, all i’s run from 1 to N.


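Theorem 26.3.6 Set $\Lambda^0(V^*) = \mathbb{R}$ and let $\Lambda(V^*)$ denote the direct sum of all $\Lambda^p(V^*)$:

$$\Lambda(V^*) \equiv \bigoplus_{p=0}^{N} \Lambda^p(V^*) = \mathbb{R} \oplus V \oplus \Lambda^2(V^*) \oplus \cdots \oplus \Lambda^N(V^*).$$

Then $\Lambda(V^*)$ is a $2^N$-dimensional algebra with exterior product defining its multiplication rule.


Proof The only thing to prove is the dimensionality of the algebra, which is an easy consequence of Eq. (26.22) and the binomial expansion of $(1+1)^N$:

$$\dim \Lambda(V^*) = \sum_{p=0}^{N} \dim \Lambda^p(V^*) = \sum_{p=0}^{N} \binom{N}{p} = (1+1)^N = 2^N.$$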
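
The counting behind Eq. (26.22) and the dimension $2^N$ can be checked by brute force. The sketch below enumerates the increasing index sets $i_1 < \cdots < i_p$ that label the basis elements $e_{i_1} \wedge \cdots \wedge e_{i_p}$; the helper name `basis_labels` and the choice N = 4 are illustrative assumptions, not part of the text.

```python
from itertools import combinations
from math import comb

def basis_labels(N, p):
    """Index tuples (i1 < ... < ip) labelling the basis elements e_i1 ^ ... ^ e_ip."""
    return list(combinations(range(1, N + 1), p))

N = 4  # illustrative dimension of V
# dim Lambda^p for p = 0, ..., N, obtained by counting labels rather than by formula
dims = [len(basis_labels(N, p)) for p in range(N + 1)]
total = sum(dims)

print(dims)   # one entry per degree p
print(total)  # dimension of the whole exterior algebra
```

Counting labels instead of calling the binomial formula directly keeps the check independent of Eq. (26.22) itself.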


Given two vector spaces V and U, one can construct a tensor product of $\Lambda(V^*)$ and $\Lambda(U^*)$ and define a product $\odot$ on it as follows. Let $A_i \in \Lambda^{p_i}(V^*)$, $i = 1, 2$, and $B_j \in \Lambda^{q_j}(U^*)$, $j = 1, 2$. Then

$$(A_1 \otimes B_1) \odot (A_2 \otimes B_2) \equiv (-1)^{p_2 q_1} (A_1 \wedge A_2) \otimes (B_1 \wedge B_2). \tag{26.24}$$


Definition 26.3.7 The tensor product of the two vector spaces $\Lambda(V^*)$ and $\Lambda(U^*)$ together with the product given in Eq. (26.24) is called the skew tensor product of $\Lambda(V^*)$ and $\Lambda(U^*)$ and denoted by $\Lambda(V^*) \hat{\otimes} \Lambda(U^*)$.


An elegant way of determining the linear independence of vectors using 
the formalism developed so far is given in the following proposition. 


Proposition 26.3.8 A set of vectors, $v_1, \dots, v_p$, is linearly independent if and only if $v_1 \wedge \cdots \wedge v_p \neq 0$.

Proof If $\{v_i\}_{i=1}^p$ are independent, then they span a p-dimensional subspace M of V. Considering M as a vector space in its own right, we have $\dim \Lambda^p(M^*) = 1$. A basis for $\Lambda^p(M^*)$ is simply $v_1 \wedge \cdots \wedge v_p$, which cannot be zero.

Conversely, suppose that $\alpha_1 v_1 + \cdots + \alpha_p v_p = 0$. Then taking the exterior product of the LHS with $v_2 \wedge v_3 \wedge \cdots \wedge v_p$ makes all terms vanish (because each will have two factors of a vector in the wedge product) except the first one. Thus, we have $\alpha_1 v_1 \wedge \cdots \wedge v_p = 0$. The fact that the wedge product is not zero forces $\alpha_1$ to be zero. Similarly, multiplying by $v_1 \wedge v_3 \wedge \cdots \wedge v_p$ shows that $\alpha_2 = 0$, and so on.


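Example 26.3.9 Let $\{e_i\}_{i=1}^N$ be a basis for V. Let $v_1 = e_1 + 2e_2 - e_3$, $v_2 = 3e_1 + e_2 + 2e_3$, $v_3 = -e_1 - 3e_2 + 2e_3$.
Take the wedge product of the first two v’s:

$$v_1 \wedge v_2 = (e_1 + 2e_2 - e_3) \wedge (3e_1 + e_2 + 2e_3) = -5\, e_1 \wedge e_2 + 5\, e_1 \wedge e_3 + 5\, e_2 \wedge e_3.$$

All the wedge products that have repeated factors vanish. Now we multiply by $v_3$:

$$\begin{aligned} v_1 \wedge v_2 \wedge v_3 &= -5\, e_1 \wedge e_2 \wedge (-e_1 - 3e_2 + 2e_3) + 5\, e_1 \wedge e_3 \wedge (-e_1 - 3e_2 + 2e_3) + 5\, e_2 \wedge e_3 \wedge (-e_1 - 3e_2 + 2e_3) \\ &= -10\, e_1 \wedge e_2 \wedge e_3 - 15\, e_1 \wedge e_3 \wedge e_2 - 5\, e_2 \wedge e_3 \wedge e_1 = 0. \end{aligned}$$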


We conclude that the three vectors are linearly dependent. 
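
The computation in this example can be reproduced numerically. The following sketch evaluates the components of $v_1 \wedge \cdots \wedge v_p$ on the basis $e_{i_1} \wedge \cdots \wedge e_{i_p}$ with $i_1 < \cdots < i_p$, using the permutation sum of Eq. (26.17); the function names `perm_sign` and `wedge_components` are hypothetical helpers introduced only for illustration, and indices are 0-based in the code.

```python
from itertools import combinations, permutations

def perm_sign(perm):
    """Sign of a permutation, computed by counting inversions."""
    sign = 1
    for i in range(len(perm)):
        for j in range(i + 1, len(perm)):
            if perm[i] > perm[j]:
                sign = -sign
    return sign

def wedge_components(vectors):
    """Components of v_1 ^ ... ^ v_p on e_i1 ^ ... ^ e_ip (i1 < ... < ip).

    Each component is sum_pi eps_pi * v_1[i_pi(1)] * ... * v_p[i_pi(p)],
    i.e. a p x p minor of the matrix whose rows are the vectors.
    """
    p = len(vectors)
    n = len(vectors[0])
    comps = {}
    for idx in combinations(range(n), p):
        total = 0
        for perm in permutations(range(p)):
            term = perm_sign(perm)
            for k in range(p):
                term *= vectors[k][idx[perm[k]]]
            total += term
        comps[idx] = total
    return comps

# Components of v1, v2, v3 from Example 26.3.9 in the basis e1, e2, e3
v1 = (1, 2, -1)
v2 = (3, 1, 2)
v3 = (-1, -3, 2)

two_form = wedge_components([v1, v2])        # {(0, 1): -5, (0, 2): 5, (1, 2): 5}
three_form = wedge_components([v1, v2, v3])  # {(0, 1, 2): 0}
print(two_form, three_form)
```

A vanishing top component is exactly the criterion of Proposition 26.3.8 for linear dependence of the three vectors.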
As an application of Proposition 26.3.8, let us prove the following. 


Theorem 26.3.10 (Cartan’s lemma) Suppose that $\{e_i\}_{i=1}^p$, $p \le \dim V$, form a linearly independent set of vectors in V and that $\{v_i\}_{i=1}^p$ are also vectors in V such that $\sum_{i=1}^p e_i \wedge v_i = 0$. Then all $v_i$ are linear combinations of only the set $\{e_i\}_{i=1}^p$. Furthermore, if $v_i = \sum_{j=1}^p A_{ij} e_j$, then $A_{ij} = A_{ji}$.


Proof Multiplying both sides of $\sum_{i=1}^p e_i \wedge v_i = 0$ by $e_2 \wedge \cdots \wedge e_p$ gives

$$-v_1 \wedge e_1 \wedge e_2 \wedge \cdots \wedge e_p = 0.$$

By Proposition 26.3.8, $v_1$ and the $e_i$ are linearly dependent. Similarly, by multiplying $\sum_{i=1}^p e_i \wedge v_i = 0$ by the wedge product with $e_k$ missing, we show that $v_k$ and the $e_i$ are linearly dependent. Thus, $v_k = \sum_{i=1}^p A_{ki} e_i$, for all k. Furthermore, we have

$$0 = \sum_{k=1}^p e_k \wedge v_k = \sum_{k=1}^p \sum_{i=1}^p e_k \wedge (A_{ki} e_i) = \sum_{k<i} (A_{ki} - A_{ik})\, e_k \wedge e_i,$$

where the last sum is over both k and i with $k < i$. Clearly, $\{e_k \wedge e_i\}$ with $k < i$ are linearly independent (show this!). Therefore, their coefficients must vanish.


Historical Notes 

Élie Joseph Cartan (1869-1951), born in Dolomieu (near Chambéry), Savoie, Rhône-Alpes, France, became a student at the École Normale in 1888 and obtained his doctorate in 1894. He lectured at Montpellier (1894-1896), Lyon (1896-1903), Nancy (1903-
1909), and Paris (1909-1940). He had four children, one of whom, Henri Cartan, was 
to produce brilliant work in mathematics. Two others died tragically. Jean, a composer, 
died at the age of 25, while Louis, a physicist, was arrested by the Germans in 1942 and 
executed after 15 months in captivity. 

Cartan added greatly to the theory of continuous groups, which had been initiated by Lie. 
His thesis (1894) contains a major contribution to Lie algebras wherein he completed 
the classification of the semi-simple algebras that Killing had essentially found. He then 
turned to the theory of associative algebras and investigated the structure for these alge- 
bras over the real and complex fields. Wedderburn would complete Cartan’s work in this 
area. 

He then turned to representations of semisimple Lie groups. His work is a striking synthe- 
sis of Lie theory, classical geometry, differential geometry, and topology, which was to be 
found in all Cartan’s work. He also applied Grassmann algebra to the theory of exterior 
differential forms. 

By 1904 Cartan was turning to papers on differential equations, and from 1916 on he 
published mainly on differential geometry. Klein’s Erlanger Program was seen to be in- 
adequate as a general description of geometry by Weyl and Veblen, and Cartan was to 
play a major role. He examined a space acted on by an arbitrary Lie group of transfor- 
mations, developing a theory of moving frames that generalizes the kinematical theory of 
Darboux. 

Cartan further contributed to geometry with his theory of symmetric spaces, which have 
their origins in papers he wrote in 1926. It develops ideas first studied by Clifford and Cay- 
ley and used topological methods developed by Weyl in 1925. This work was completed 
by 1932.

Cartan then went on to examine problems on a topic first studied by Poincaré. By this 
stage his son, Henri Cartan, was making major contributions to mathematics, and Élie
Cartan was able to build on theorems proved by his son. 

Cartan also published work on relativity and the theory of spinors. He is certainly one of 
the most important mathematicians of the first half of the twentieth century. 


Example 26.3.11 The symbol $\varepsilon_{i_1 i_2 \ldots i_N}$, called the Levi-Civita tensor, can be defined by

$$\epsilon^{i_1} \wedge \cdots \wedge \epsilon^{i_N} = \varepsilon_{i_1 i_2 \ldots i_N}\, \epsilon^1 \wedge \cdots \wedge \epsilon^N. \tag{26.25}$$

In fact, substituting $(e_1, \dots, e_N)$ on both sides and using Eq. (26.20), the uniqueness theorem 2.6.4 proves the equality in (26.25).

Now consider the linear operator E whose action on a basis $\{e_i\}_{i=1}^N$ is to permute the vectors so that $E e_k = e_{i_k}$. Denote the left-hand side of (26.25) by $\Delta'$ (a determinant function as defined in Chap. 2) and the N-form on the right-hand side by $\Delta$. Now note that

$$\Delta'_E = \det E \cdot \Delta' = \det E \cdot \varepsilon_{i_1 i_2 \ldots i_N}\, \Delta,$$

where $\Delta'_E(v_1, \dots, v_N) \equiv \Delta'(Ev_1, \dots, Ev_N)$. Evaluate both sides on $(e_1, \dots, e_N)$ and convince yourself that both determinant functions give 1. This yields $1 = \det E \cdot \varepsilon_{i_1 i_2 \ldots i_N}$. Since $(\varepsilon_{i_1 i_2 \ldots i_N})^2 = 1$, multiplying both sides by the Levi-Civita tensor, we get $\det E = \varepsilon_{i_1 i_2 \ldots i_N}$.

Since the determinant is basis-independent, the result of the previous example can be summarized as follows:


Box 26.3.12 The Levi-Civita tensor $\varepsilon_{i_1 i_2 \ldots i_N}$ takes the same value in all coordinate systems.


There is a generalization of $\Lambda^p(V)$ that is useful when we discuss Clifford algebras in Chap. 27:


Definition 26.3.13 A U-valued p-form is a linear machine that takes p vectors from V and produces a vector in U. The space of U-valued p-forms is denoted by $\Lambda^p(V, U)$. In this new context, $\Lambda^p(V) = \Lambda^p(V, \mathbb{R})$.


26.3.1 Orientation 


The reader is no doubt familiar with the right-handed and left-handed coordinate systems in $\mathbb{R}^3$. In this section, we generalize the idea to arbitrary vector spaces.


Definition 26.3.14 An oriented basis of an N-dimensional vector space is 
an ordered collection of N linearly independent vectors. 


If $\{v_i\}_{i=1}^N$ is one oriented basis and $\{u_i\}_{i=1}^N$ is a second one, then

$$u_1 \wedge u_2 \wedge \cdots \wedge u_N = (\det R)\, v_1 \wedge v_2 \wedge \cdots \wedge v_N,$$


where R is the transformation matrix and detR is a nonzero number (R is 
invertible), which can be positive or negative. Accordingly, we have the fol- 
lowing definition. 


Definition 26.3.15 An orientation is the collection of all oriented bases 
related by a transformation matrix having a positive determinant. A vector 
space for which an orientation is specified is called an oriented vector space. 


Clearly, there are only two orientations in any vector space. Each oriented 
basis is positively related to any oriented basis belonging to the same ori- 
entation and negatively related to any oriented basis belonging to the other 
orientation. For example, in $\mathbb{R}^3$, the bases $\{e_x, e_y, e_z\}$ and $\{e_y, e_x, e_z\}$ belong to different orientations because

$$e_x \wedge e_y \wedge e_z = -e_y \wedge e_x \wedge e_z.$$


The first basis is (by convention) called a right-handed coordinate system, 
and the second is called a left-handed coordinate system. Any other basis 
is either right-handed or left-handed. There is no third alternative! 




Definition 26.3.16 Let V be a vector space. Let V* have the oriented basis $\{\epsilon^i\}_{i=1}^N$. The oriented volume element $\mu \in \Lambda^N(V)$ of V is defined as

$$\mu \equiv \epsilon^1 \wedge \epsilon^2 \wedge \cdots \wedge \epsilon^N.$$

Note that if $\{e_i\}$ is ordered as $\{\epsilon^j\}$, then $\mu(e_1, e_2, \dots, e_N) = +1$ by Eq. (26.19), and we say that $\{e_i\}$ is positively oriented with respect to $\mu$. In general, $\{v_i\}$ is positively oriented with respect to $\mu$ if $\mu(v_1, v_2, \dots, v_N) > 0$.

The volume element of V is defined in terms of a basis for V*. The reason for this will become apparent later, when we see that dx, dy, and dz form a basis for $(\mathbb{R}^3)^*$, and $dx\, dy\, dz \equiv dx \wedge dy \wedge dz$.


26.4 Symplectic Vector Spaces 


Mechanics was a great contributor to the development of tensor analysis. It 
provided examples of manifolds that went beyond mere subspaces of $\mathbb{R}^n$.
The phase space of Hamiltonian mechanics is a paradigm of manifolds that 
are not “hypersurfaces” of some Euclidean space. We shall have more to 
say about such manifolds in Chap. 28. Here, we shall be content with the 
algebraic structure underlying classical mechanics. 


Definition 26.4.1 A 2-form $\omega \in \Lambda^2(V)$ is nondegenerate if $\omega(v_1, v_2) = 0$ for all $v_1 \in V$ implies $v_2 = 0$. A symplectic form on V is a nondegenerate 2-form $\omega \in \Lambda^2(V)$. The pair $(V, \omega)$ is called a symplectic vector space. If $(V, \omega)$ and $(W, \rho)$ are symplectic vector spaces, a linear transformation $T: V \to W$ is called a symplectic transformation or a symplectic map if $T^*\rho = \omega$.


Any 2-form (degenerate or nondegenerate) leads to other quantities that are also of interest. For instance, given any basis $\{v_i\}$ in V, one defines the matrix of the 2-form $\omega \in \Lambda^2(V)$ by $\omega_{ij} \equiv \omega(v_i, v_j)$. Similarly, one can define the useful linear map $\omega^\flat: V \to V^*$ by

$$[\omega^\flat(v)](v') \equiv \omega(v, v'). \tag{26.26}$$

The rank of $\omega^\flat$ is called the rank of $\omega$. The reader may check that


Box 26.4.2 A 2-form $\omega$ is nondegenerate if and only if the determinant of $(\omega_{ij})$ is nonzero, if and only if $\omega^\flat$ is an isomorphism, in which case the inverse of $\omega^\flat$ is denoted by $\omega^\sharp$.


Proposition 26.4.3 Let V be an N-dimensional vector space and $\omega \in \Lambda^2(V)$. If the rank of $\omega$ is r, then $r = 2n$ for some integer n and there exists a basis $\{e_i\}$ of V, called a canonical basis of V, and a dual basis $\{\epsilon^j\}$, such that $\omega = \sum_{j=1}^{n} \epsilon^j \wedge \epsilon^{j+n}$, or, equivalently, the N × N matrix of $\omega$ is given by

$$\begin{pmatrix} 0 & 1 & 0 \\ -1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},$$

where 1 is the n × n identity matrix and the 0 in the lower right corner is the (N − 2n) × (N − 2n) zero matrix.


Proof Since $\omega \neq 0$, there exist a pair of vectors $e_1, e'_1 \in V$ such that $\omega(e_1, e'_1) \neq 0$. Dividing $e_1$ by a constant, we can assume $\omega(e_1, e'_1) = 1$. Because of its antisymmetry, the matrix of $\omega$ in the plane $P_1$ spanned by $e_1$ and $e'_1$ is $\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$. Let $V_1$ be the $\omega$-orthogonal complement of $P_1$, i.e.,

$$V_1 = \{ v \in V \mid \omega(v, v_1) = 0\ \ \forall\, v_1 \in P_1 \}.$$

Then the reader may check that $P_1 \cap V_1 = 0$. Moreover, $V = P_1 + V_1$ because

$$v = \underbrace{\omega(v, e'_1)\, e_1 - \omega(v, e_1)\, e'_1}_{\in P_1} + \underbrace{v - \omega(v, e'_1)\, e_1 + \omega(v, e_1)\, e'_1}_{\in V_1 \text{ (Reader, verify!)}}$$

for any $v \in V$. Thus, $V = P_1 \oplus V_1$. If $\omega$ is zero on all pairs of vectors in $V_1$, then we are done, and the rank of $\omega$ is 2; otherwise, let $e_2, e'_2 \in V_1$ be such that $\omega(e_2, e'_2) \neq 0$. Proceeding as above, we obtain

$$V_1 = P_2 \oplus V_2 \quad\Rightarrow\quad V = P_1 \oplus P_2 \oplus V_2,$$

where $P_2$ is the plane spanned by $e_2$ and $e'_2$, and $V_2$ its $\omega$-orthogonal complement in $V_1$. Continuing this process yields

$$V = P_1 \oplus P_2 \oplus \cdots \oplus P_n \oplus V_n,$$

where $V_n$ is the subspace of V on which $\omega$ is zero. This shows that the rank of $\omega$ is 2n. By reordering the basis vectors such that $e'_k \equiv e_{n+k}$, we construct a new basis $\{e_i\}_{i=1}^N$ in which $\omega$ has the desired matrix.

To conclude the proposition, it is sufficient to show that $\sum_{j=1}^{n} \epsilon^j \wedge \epsilon^{j+n}$, in which $\{\epsilon^j\}_{j=1}^N$ is dual to $\{e_i\}_{i=1}^N$, has the same matrix as $\omega$. This is left as an exercise for the reader.


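We note that in the canonical basis,

$$\omega_{ij} = \begin{cases} 0 & \text{if } i, j \le n, \\ \delta_{ik} & \text{if } i \le n \text{ and } j = n + k,\ k \le n, \\ 0 & \text{if } i > 2n \text{ or } j > 2n, \end{cases}$$

with the remaining components fixed by the antisymmetry $\omega_{ji} = -\omega_{ij}$ and by $\omega_{ij} = 0$ for $n < i, j \le 2n$.

If we write $v \in V$ as $v = \sum_{i=1}^{n} (x_i e_i + y_i e_{n+i}) + \sum_{i=1}^{N-2n} z_i e_{2n+i}$ in the canonical basis of V, with a corresponding expansion for $v'$, then the reader may verify that

$$\omega(v, v') = \sum_{i=1}^{n} (x_i y'_i - x'_i y_i).$$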

The following proposition gives a useful criterion for nondegeneracy of $\omega$:


Proposition 26.4.4 Let $\omega$ be a 2-form in the finite-dimensional vector space V. Then $\omega$ is nondegenerate iff V has even dimension, say 2n, and $\omega^n \equiv \omega \wedge \cdots \wedge \omega$ is a volume element of V.


Proof Suppose $\omega$ is nondegenerate. Then, $\omega^\flat$ is an isomorphism. Therefore, the rank of $\omega$, an even number by Proposition 26.4.3, must equal dim V* = dim V. Moreover, by taking successive powers of $\omega$ and using mathematical induction, one can show that $\omega^n$ is proportional to $\epsilon^1 \wedge \cdots \wedge \epsilon^{2n}$.
Conversely, if $\omega^n \propto \epsilon^1 \wedge \cdots \wedge \epsilon^{2n}$ is a volume element, then by Proposition 26.3.8, the $\{\epsilon^j\}$ are linearly independent. Furthermore, dim V* must equal the number of linearly independent factors in the wedge product of a volume element. Thus, dim V* = 2n. But 2n is also the rank of $\omega$. It follows that $\omega^\flat$ is onto. Since V is finite-dimensional, the dimension theorem implies that $\omega^\flat$ is an isomorphism.


Example 26.4.5 Let V be a vector space and V* its dual. The direct sum $V \oplus V^*$ can be turned into a symplectic vector space if we define $\omega \in \Lambda^2(V \oplus V^*)$ by

$$\omega(v + \varphi, v' + \varphi') \equiv \varphi'(v) - \varphi(v'),$$

where $v, v' \in V$ and $\varphi, \varphi' \in V^*$. The reader may verify that $(V \oplus V^*, \omega)$ is a symplectic vector space. This construction of symplectic vector spaces is closely related to Hamiltonian dynamics, to which we shall return in Chap. 28.


Suppose $(V, \omega)$ and $(W, \rho)$ are 2n-dimensional symplectic vector spaces. Then, by Proposition 26.3.5, any symplectic map $T: (V, \omega) \to (W, \rho)$ is volume-preserving, i.e., $T^*(\rho^n) = (T^*\rho)^n = \omega^n$ is a volume element of V. It follows that the rank of $T^*$ is 2n, and by Proposition 2.5.5, so is the rank of T. The dimension theorem now implies that T is an isomorphism. Symplectic transformations on a single vector space have an interesting property:


Proposition 26.4.6 Let $(V, \omega)$ be a symplectic vector space. Then the set of symplectic mappings $T: (V, \omega) \to (V, \omega)$ forms a group under composition, called the symplectic group and denoted by $Sp(V, \omega)$.


Proof Clearly, $Sp(V, \omega)$ is a subset of GL(V). One need only show that the inverse of a symplectic transformation is also such a transformation and that the product of two symplectic transformations is a symplectic transformation. The details are left for the reader.
A matrix is called symplectic if it is the representation of a symplectic transformation in a canonical basis of the underlying symplectic vector space. The reader may check that the condition for a matrix A to be symplectic is $A^t J A = J$, where J is the representation of $\omega$ in the canonical basis:

$$J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix},$$

where 1 and 0 are the n × n identity and zero matrices, respectively.
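
The condition $A^t J A = J$ is easy to test on small matrices. The sketch below is one possible check in plain Python; the helper names (`matmul`, `transpose`, `canonical_J`, `is_symplectic`) and the two sample matrices are assumptions made for illustration only.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def canonical_J(n):
    """The 2n x 2n matrix [[0, 1], [-1, 0]] written in n x n blocks."""
    J = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        J[i][n + i] = 1
        J[n + i][i] = -1
    return J

def is_symplectic(A):
    """True when A^t J A = J for the canonical J of matching size."""
    J = canonical_J(len(A) // 2)
    return matmul(matmul(transpose(A), J), A) == J

# A block matrix [[1, S], [0, 1]] with symmetric S satisfies the condition.
shear = [[1, 0, 2, 0],
         [0, 1, 0, 3],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]
# Rescaling a single coordinate does not.
scaling = [[2, 0, 0, 0],
           [0, 1, 0, 0],
           [0, 0, 1, 0],
           [0, 0, 0, 1]]

print(is_symplectic(shear), is_symplectic(scaling))
```

Integer entries are used so that the comparison with J is exact; with floating-point matrices a tolerance would be needed instead of `==`.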


26.5 Inner Product Revisited 


The inner product was defined in Chap. 2 in terms of a metric function that 
took two vectors as input and manufactured a number. We now know what 
kind of machine this is in the language of tensors. 


Definition 26.5.1 A symmetric bilinear form b on V is a symmetric ten- 
sor of type (0, 2). 


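If $\{e_j\}_{j=1}^N$ is a basis of V and $\{\epsilon^i\}_{i=1}^N$ is its dual basis, then $b = \tfrac{1}{2} b_{ij}\, \epsilon^i \epsilon^j$ (recall Einstein’s summation convention), because $\epsilon^i \epsilon^j = \epsilon^i \otimes \epsilon^j + \epsilon^j \otimes \epsilon^i$ form a basis of $S_2(V)$. If v and u are any two vectors in V, then

$$\begin{aligned} b(v, u) &= \tfrac{1}{2} b_{ij} (\epsilon^i \otimes \epsilon^j + \epsilon^j \otimes \epsilon^i)(v^k e_k, u^m e_m) \\ &= \tfrac{1}{2} b_{ij} v^k u^m \big[ \epsilon^i(e_k)\, \epsilon^j(e_m) + \epsilon^j(e_k)\, \epsilon^i(e_m) \big] \\ &= \tfrac{1}{2} b_{ij} v^k u^m \big[ \delta^i_k \delta^j_m + \delta^j_k \delta^i_m \big] \\ &= \tfrac{1}{2} b_{ij} (v^i u^j + v^j u^i) = b_{ij} v^i u^j. \end{aligned} \tag{26.27}$$

For any vector $v \in V$, we can write

$$\begin{aligned} b(v) &= \tfrac{1}{2} b_{ij}\, \epsilon^i \epsilon^j (v) = \tfrac{1}{2} b_{ij} (\epsilon^i \otimes \epsilon^j + \epsilon^j \otimes \epsilon^i)(v^k e_k) \\ &= \tfrac{1}{2} b_{ij} v^k \big[ \epsilon^i \epsilon^j(e_k) + \epsilon^j \epsilon^i(e_k) \big] = \tfrac{1}{2} b_{ij} v^k \big[ \epsilon^i \delta^j_k + \epsilon^j \delta^i_k \big] \\ &= \tfrac{1}{2} b_{ij} \big[ v^j \epsilon^i + v^i \epsilon^j \big] = b_{ij} v^j \epsilon^i = b_{ij} v^i \epsilon^j. \end{aligned} \tag{26.28}$$

Thus, $b(v) \in V^*$. This shows that b can be thought of as a mapping from V to V*, which we denote by $b_*$ and write $b_*: V \to V^*$. For this mapping to make sense, it should not matter which factor in the symmetric product v contracts with. But this is a trivial consequence of the symmetries $b_{ij} = b_{ji}$ and $\epsilon^i \epsilon^j = \epsilon^j \epsilon^i$.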


Let v and u be any two vectors in V. Let $\{e_j\}_{j=1}^N$ be a basis of V and $\{\epsilon^i\}_{i=1}^N$ its dual basis. The natural pairing of v and $b_*(u)$ is given by

$$\langle b_*(u), v \rangle = \langle b_{ij} u^j \epsilon^i, v^k e_k \rangle = b_{ij} u^j v^k \langle \epsilon^i, e_k \rangle = b_{ij} u^j v^k \delta^i_k = b_{ij} u^j v^i = b(u, v) = b(v, u), \tag{26.29}$$

where we used (26.27) in the last step.
The components $b_{ij} v^j$ of $b_*(v)$ in the basis $\{\epsilon^i\}_{i=1}^N$ of V* are denoted by $v_i$, so

$$b_*(v) = v_i \epsilon^i, \quad \text{where } v_i \equiv b_{ij} v^j. \tag{26.30}$$

We have thus lowered the index of $v^j$ by the use of the symmetric bilinear form b. In applications $v_i$ is uniquely defined; furthermore, there is a one-to-one correspondence between $v_i$ and $v^i$. This can happen if and only if the mapping $b_*: V \to V^*$ is invertible, in which case b is usually denoted by g.


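If $g_*$ is invertible, there must exist a unique $(g_*)^{-1} \equiv (g^{-1})_*: V^* \to V$, or $g^{-1} \in S_2(V^*) = S^2(V)$, such that

$$\begin{aligned} v^j e_j = v &= (g_*)^{-1} g_*(v) = (g_*)^{-1}(v_i \epsilon^i) = v_i\, (g_*)^{-1}(\epsilon^i) \\ &= v_i \big[ (g^{-1})^{jk}\, e_j \otimes e_k \big](\epsilon^i) = v_i (g^{-1})^{jk}\, e_j \underbrace{e_k(\epsilon^i)}_{=\,\delta^i_k} = v_i (g^{-1})^{ji} e_j. \end{aligned}$$

Comparison of the LHS and the RHS yields $v^j = v_i (g^{-1})^{ji}$. It is customary to omit the −1 and simply write

$$v^j = g^{ji} v_i, \tag{26.31}$$

where it is understood that g with upper indices is the inverse of g (with lower indices).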


Definition 26.5.2 An invertible bilinear form is called nondegenerate. A symmetric bilinear form that is nondegenerate is called an inner product. When there is no danger of confusion, we write $\langle u, v \rangle$ instead of $g(u, v)$.


We therefore see that the presence of a nondegenerate symmetric bilinear form (or an inner product) naturally connects the vectors in V and V* in a unique way. For any vector $v \in V$ there is a unique linear functional $\varphi_v \in V^*$ given by $\varphi_v = g_*(v)$. One can therefore identify V and V*. An inner product makes a vector space self-dual. In particular, Proposition 5.5.12 shows that there exists a determinant function $\Delta_g$ such that⁷

$$\Delta_g(v_1, \dots, v_N)\, \Delta_g(u_1, \dots, u_N) = \alpha \det\big(g(v_i, u_j)\big). \tag{26.32}$$

This is called the Lagrange identity.

Going from a vector in V to its unique image in V* is done by simply lowering the index using Eq. (26.30), and going the other way involves using Eq. (26.31) to raise the index. This process can be generalized to all tensors. For instance, although in general, there is no connection among $T^2_0(V)$, $T^1_1(V)$, and $T^0_2(V)$, the introduction of an inner product connects all these spaces in a natural way and establishes a one-to-one correspondence among them. Thus, to a tensor in $T^2_0(V)$ with components $t^{ij}$ there corresponds a unique tensor in $T^1_1(V)$, given, in component form, by $t^i{}_j = g_{jk} t^{ik}$, and another unique tensor in $T^0_2(V)$, given by $t_{ij} = g_{il}\, t^l{}_j = g_{il}\, g_{jk}\, t^{lk}$.

⁷See also Problem 5.37.

Let us apply this technique to $g^{ij}$, which is also a tensor and for which the lowering process is defined. We have

$$g^i{}_j = g_{jk}\, g^{ik} = (g^{-1})^{ik} g_{kj} = \delta^i_j. \tag{26.33}$$

This relation holds, of course, in all bases.
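
The lowering and raising rules (26.30) and (26.31), together with (26.33), can be tried on a concrete metric. The sketch below assumes an illustrative, non-diagonal 2 × 2 matrix $g_{ij}$ and hypothetical helper names (`inverse_2x2`, `lower`, `raise_index`); exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction

def inverse_2x2(g):
    """Inverse of a 2 x 2 matrix; fails when the form is degenerate."""
    (a, b), (c, d) = g
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("degenerate form: g_ij is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]

def lower(g, v_up):
    """v_i = g_ij v^j, as in Eq. (26.30) with b = g."""
    n = len(v_up)
    return [sum(g[i][j] * v_up[j] for j in range(n)) for i in range(n)]

def raise_index(g_inv, v_down):
    """v^j = g^{ji} v_i, as in Eq. (26.31)."""
    n = len(v_down)
    return [sum(g_inv[j][i] * v_down[i] for i in range(n)) for j in range(n)]

g = [[2, 1], [1, 1]]        # illustrative components g_ij (symmetric, det = 1)
g_inv = inverse_2x2(g)      # components g^{ij}
v_up = [3, -1]              # contravariant components v^j

v_down = lower(g, v_up)
round_trip = raise_index(g_inv, v_down)

# Mixed components g^i_j = g_jk g^{ik}, expected to be the Kronecker delta (26.33)
mixed = [[sum(g[j][k] * g_inv[i][k] for k in range(2)) for j in range(2)]
         for i in range(2)]
print(v_down, round_trip, mixed)
```

Raising after lowering returns the original components, which is the one-to-one correspondence between $v_i$ and $v^i$ described above.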
The inner product has been defined as a nondegenerate symmetric bilin- 
ear form. The important criterion of nondegeneracy has equivalences: 


Proposition 26.5.3 A symmetric bilinear form g is nondegenerate if and only if


1. the matrix of components $g_{ij}$ has a nonvanishing determinant, or
2. for every nonzero $v \in V$, there exists $w \in V$ such that $g(v, w) \neq 0$.


Proof The first part is a direct consequence of the definition of nondegeneracy. The second part follows from the fact that $g_*: V \to V^*$ is invertible iff the nullity of $g_*$ is zero. It follows that if $v \in V$ is nonzero, then $g_*(v) \neq 0$, i.e., $g_*(v)$ is not the zero functional. Thus, there must exist a vector $w \in V$ such that $[g_*(v)](w) \neq 0$. The proposition is proved once we note that $[g_*(v)](w) = g(v, w)$.


Let (V, g) and (U, h) be inner product spaces. Recall that an isometry is a linear transformation $T: V \to U$ which preserves the inner product, i.e.,

$$g(v_1, v_2) = h(Tv_1, Tv_2).$$

It was shown in Theorem 2.3.12 that an isometry is injective. However, the proof relied on the positive definiteness of the inner product. That is not necessary. In fact, suppose that $Tv = 0$. Then, for any $x \in V$, we have

$$g(x, v) = h(Tx, Tv) = h(Tx, 0) = 0.$$

By Proposition 26.5.3, $v = 0$ since g is nondegenerate. It follows that ker T = {0}, and we have


Theorem 26.5.4 A linear isometry is injective for all inner products. 


Definition 26.5.5 The g-transpose of a linear endomorphism $T: V \to V$ is the endomorphism $T^t$ given by

$$g(T^t u, v) = g(u, Tv).$$

If T is an isometry, then

$$g(u, v) = g(Tu, Tv) = g(T^t T u, v).$$

Since this holds for arbitrary u and v, we must have $T^t T = 1$. Thus,


Proposition 26.5.6 An endomorphism $T: V \to V$ is an isometry if and only if $T^t = T^{-1}$.


Proof It is easy to show that if $T^t = T^{-1}$, then T is an isometry. We have also shown that if $g(u, v) = g(Tu, Tv)$, then $T^t T = 1$. This last relation by itself does not imply that T has an inverse. However, if V is finite dimensional, then it does. (See Problem 5.25.)


Definition 26.5.7 A symmetric bilinear form b can be categorized as follows:


1. positive (negative) definite: $b(v, v) > 0$ [$b(v, v) < 0$] for every nonzero vector v;
2. definite: b is either positive definite or negative definite;
3. positive (negative) semidefinite: $b(v, v) \ge 0$ [$b(v, v) \le 0$] for every v;
4. semidefinite: b is either positive semidefinite or negative semidefinite;
5. indefinite: b is not definite.


If b is a symmetric bilinear form on V, then the restriction $b|_W$ of b on a subspace W is also symmetric and bilinear, and if b is definite or semidefinite, then so is $b|_W$.


Definition 26.5.8 The index $\nu$ of a symmetric bilinear form b on V is the dimension of the largest subspace W of V on which $b|_W$ is negative definite. Sometimes $\nu$ is referred to as the index of V.


Example 26.5.9 Some of the categories of the definition above can be illustrated in $\mathbb{R}^2$ with $v_1 = (x_1, y_1)$, $v_2 = (x_2, y_2)$, and $v = (x, y)$.


(a) Positive definite: $b(v_1, v_2) = x_1 x_2 + y_1 y_2$ because if $v \neq 0$, then one of its components is nonzero, and $b(v, v) = x^2 + y^2 > 0$.
(b) Negative definite: $b(v_1, v_2) = \tfrac{1}{2}(x_1 y_2 + x_2 y_1) - x_1 x_2 - y_1 y_2$ because

$$b(v, v) = xy - x^2 - y^2 = -\tfrac{1}{2}(x - y)^2 - \tfrac{1}{2} x^2 - \tfrac{1}{2} y^2,$$

which is negative for nonzero v.

(c) Indefinite: $b(v_1, v_2) = x_1 x_2 - y_1 y_2$. For $v = (x, x)$, $b(v, v) = 0$. However, b is nondegenerate, because it has the invertible matrix $g = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$ in the standard basis of $\mathbb{R}^2$.

(d) Positive semidefinite: $b(v_1, v_2) = x_1 x_2 \Rightarrow b(v, v) = x^2$ and $b(v, v)$ is never negative. However, b is degenerate because its matrix in the standard basis of $\mathbb{R}^2$ is $b = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}$, which is not invertible.
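
The four cases of this example can be sorted mechanically from the matrix of b in the standard basis. The sketch below is restricted to forms on $\mathbb{R}^2$ and relies on the standard fact that the signs of the two eigenvalues of a symmetric 2 × 2 matrix are fixed by its determinant and trace; the function names are hypothetical. The label "indefinite" is used here in the narrow sense of mixed signs, whereas Definition 26.5.7 also counts case (d) as indefinite because it is not definite.

```python
from fractions import Fraction

def classify_2x2(b):
    """Classify a symmetric bilinear form on R^2 from det and trace of its matrix."""
    (a, h), (h2, c) = b
    if h != h2:
        raise ValueError("matrix must be symmetric")
    det = a * c - h * h   # product of the two eigenvalues
    tr = a + c            # sum of the two eigenvalues
    if det > 0:
        return "positive definite" if tr > 0 else "negative definite"
    if det < 0:
        return "indefinite"
    return "positive semidefinite" if tr >= 0 else "negative semidefinite"

def index_2x2(b):
    """Index nu of the form: the number of negative eigenvalues."""
    (a, h), (_, c) = b
    det = a * c - h * h
    tr = a + c
    if det > 0:
        return 2 if tr < 0 else 0
    if det < 0:
        return 1
    return 1 if tr < 0 else 0

half = Fraction(1, 2)
forms = {
    "a": [[1, 0], [0, 1]],
    "b": [[-1, half], [half, -1]],
    "c": [[1, 0], [0, -1]],
    "d": [[1, 0], [0, 0]],
}
for name, matrix in forms.items():
    print(name, classify_2x2(matrix), index_2x2(matrix))
```

For larger matrices one would count eigenvalue signs instead; the determinant-and-trace shortcut is specific to two dimensions.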
Let g be an inner product on V. Two vectors $u, v \in V$ are said to be g-orthogonal if $g(u, v) = 0$. A null or isotropic vector of g is a vector that is g-orthogonal to itself. If g is definite, then the only null vector is the zero vector. The converse is also true, as the following proposition shows.


Proposition 26.5.10 If g is not definite, then there exists a nonzero isotropic vector.


Proof That g is not positive definite implies that there exists a nonzero vector $v \in V$ such that $g(v, v) \le 0$. Similarly, that g is not negative definite implies that there exists a nonzero vector $w \in V$ such that $g(w, w) \ge 0$. Construct the vector $u = \alpha v + (1 - \alpha) w$ and note that $g(u, u)$ is a continuous function of $\alpha$. For $\alpha = 0$ this function has the value $g(w, w) \ge 0$, and for $\alpha = 1$ it has the value $g(v, v) \le 0$. Thus, there must be some $\alpha$ for which $g(u, u) = 0$.


Example 26.5.11 In the special theory of relativity, the inner product of two “position” four-vectors, $r_1 = (x_1, y_1, z_1, ct_1)$ and $r_2 = (x_2, y_2, z_2, ct_2)$, where c is the speed of light, is defined as

$$g(r_1, r_2) = -x_1 x_2 - y_1 y_2 - z_1 z_2 + c^2 t_1 t_2.$$

This is clearly an indefinite symmetric bilinear form. Proposition 26.5.10 tells us that there must exist a nonzero null vector. Such a vector r satisfies

$$g(r, r) = c^2 t^2 - x^2 - y^2 - z^2 = 0,$$

or

$$c^2 = \frac{x^2 + y^2 + z^2}{t^2} \quad\Rightarrow\quad c = \pm \frac{\sqrt{x^2 + y^2 + z^2}}{t} = \pm \frac{\text{distance}}{\text{time}}.$$

This corresponds to a particle moving with the speed of light. Thus, light rays are the null vectors in the special theory of relativity.

Considering the four-vectors as a generalization of three-vectors, it is more natural to define the inner product as $g(r_1, r_2) = x_1 x_2 + y_1 y_2 + z_1 z_2 - c^2 t_1 t_2$, so that the Euclidean part remains positive and only the added 4th dimension carries the negative sign. Both practices are common in physics, and we shall use both of them in the book.


As in Chap. 4, we define the component of a vector along another vector 
and the reflection of the former in a plane perpendicular to the latter. 


Definition 26.5.12 Let g be an inner product on V and y a non-null (non-isotropic) vector in V. The projection $x_y$ of x along y and the reflection $x_{r,y}$ of x in a plane perpendicular to y are given by

$$x_y = \frac{g(x, y)}{g(y, y)}\, y \qquad \text{and} \qquad x_{r,y} = x - 2\, \frac{g(x, y)}{g(y, y)}\, y.$$

We can also introduce operators, as we did in Chap. 4. The bra and ket notation was suitable for the projection operators. However, we still can construct projection and reflection operators. In fact, using $g_*$, and recalling that $\varphi_y \equiv g_*(y)$ is the linear functional such that $\varphi_y(x) = g(x, y)$, we can define

$$P_y = y\, \frac{1}{g(y, y)}\, \varphi_y \tag{26.34}$$

and verify that

$$\begin{aligned} P_y^2 &= \Big( y\, \frac{1}{g(y, y)}\, \varphi_y \Big) \Big( y\, \frac{1}{g(y, y)}\, \varphi_y \Big) = y\, \frac{1}{g(y, y)}\, \varphi_y(y)\, \frac{1}{g(y, y)}\, \varphi_y \\ &= y\, \frac{1}{g(y, y)}\, g(y, y)\, \frac{1}{g(y, y)}\, \varphi_y = y\, \frac{1}{g(y, y)}\, \varphi_y = P_y. \end{aligned}$$

From this we obtain the reflection operator

$$R_y = 1 - 2P_y = 1 - 2y\, \frac{1}{g(y, y)}\, \varphi_y. \tag{26.35}$$

The reflection operator has the property $R_y^2 = 1$, or $R_y^{-1} = R_y$, as expected. Furthermore, one can easily show that

$$g(x, P_y z) = g(z, P_y x) = \frac{g(x, y)\, g(z, y)}{g(y, y)},$$

indicating that $P_y$ is symmetric (i.e., $P_y^t = P_y$). It follows that $R_y$ is also symmetric and

$$1 = R_y^2 = R_y^t R_y, \tag{26.36}$$

i.e., that $R_y$ is an isometry by Proposition 26.5.6.
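
The properties $P_y^2 = P_y$, $R_y^2 = 1$, and the isometry property of $R_y$ can be checked on explicit vectors. The sketch below assumes the second sign convention of Example 26.5.11 with c = 1, so that the metric matrix is diag(1, 1, 1, −1); the function names `g`, `project`, and `reflect` and the sample vectors are illustrative choices, not part of the text.

```python
from fractions import Fraction

# Assumed metric: g(r1, r2) = x1 x2 + y1 y2 + z1 z2 - t1 t2 (units with c = 1)
G = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [0, 0, 0, -1]]

def g(x, y):
    """Inner product g(x, y) = G_ij x^i y^j."""
    return sum(G[i][j] * x[i] * y[j] for i in range(4) for j in range(4))

def project(x, y):
    """P_y x = y g(x, y) / g(y, y); y must be non-null."""
    gyy = g(y, y)
    if gyy == 0:
        raise ValueError("y is a null vector; projection is undefined")
    coeff = Fraction(g(x, y), gyy)
    return [coeff * yi for yi in y]

def reflect(x, y):
    """R_y x = x - 2 P_y x."""
    return [xi - 2 * pi for xi, pi in zip(x, project(x, y))]

x = [1, 1, 0, 1]
z = [0, 2, 1, 3]
y = [1, 0, 0, 2]   # g(y, y) = -3, so y is non-null

print(reflect(x, y))
print(reflect(reflect(x, y), y) == x)               # R_y squared acts as the identity
print(g(reflect(x, y), reflect(z, y)) == g(x, z))   # R_y preserves the inner product
```

Exact fractions are used so that the equalities hold exactly rather than up to rounding.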


26.5.1 Subspaces 


Let V be a vector space with inner product g. Let W be a subspace of V. Let $W^\perp$ be all vectors in V which are g-orthogonal to all vectors in W. Ordinarily, we would call $W^\perp$ the orthogonal complement of W, but if g is not definite, we can’t. Here is why: In Example 26.5.11, eliminate the y and z coordinates and consider two-dimensional vectors (x, ct). Now let W be the span of any null vector. Then clearly $W = W^\perp$, and $W^\perp$ does not complement W. Nevertheless, we have the following


Lemma 26.5.13 Let W be a subspace of a finite-dimensional inner product space V. Then


(1) $\dim W + \dim W^\perp = \dim V$.
(2) $(W^\perp)^\perp = W$.
Proof (1) Let $\{e_i\}_{i=1}^m$ be a basis of W with the dual basis $\{\epsilon^i\}_{i=1}^m$. Consider the linear operator $g_W: V \to W^*$ given by

$$g_W(v) = \sum_{i=1}^{m} g(v, e_i)\, \epsilon^i.$$

It is not hard to show that $g_W$ is onto. Using the dimension theorem (Theorem 2.3.13), we can write

$$\dim W^* + \dim \ker g_W = \dim V.$$

Since $\dim W^* = \dim W$, all that is left to show is that $\dim \ker g_W = \dim W^\perp$. We show more than that; we prove that $\ker g_W = W^\perp$. In fact,

$$v \in \ker g_W \iff \sum_{i=1}^{m} g(v, e_i)\, \epsilon^i = 0 \iff g(v, e_i) = 0, \quad i = 1, 2, \dots, m.$$

The last equality follows from the linear independence of $\{\epsilon^i\}_{i=1}^m$, and it holds if and only if $v \in W^\perp$.

(2) If $v \in W$, then v is orthogonal to all vectors in $W^\perp$, i.e., $v \in (W^\perp)^\perp$. Thus, $W \subset (W^\perp)^\perp$. Applying (1) to the subspace $W^\perp$, we get $\dim W^\perp + \dim (W^\perp)^\perp = \dim V$. Hence, $\dim W = \dim (W^\perp)^\perp$, and $W = (W^\perp)^\perp$.


A subspace W of an inner product space (V, g) is called nondegenerate if $g|_W$ is nondegenerate. When g is definite, any subspace of V inherits a definite inner product. Therefore, in this case every subspace is nondegenerate. However, when g is not definite, there will always be degenerate subspaces. For example, if v is null, then the span of v is clearly degenerate.


Proposition 26.5.14 A subspace W of an inner product space V is nondegenerate if and only if $W \oplus W^\perp = V$.


Proof Clearly, $g|_W$ is nondegenerate if and only if $W \cap W^\perp = 0$, because if there were $0 \neq w \in W \cap W^\perp$, it would have to be orthogonal to all vectors in W, making $g|_W$ degenerate. From Problem 2.8, we have

$$\dim(W + W^\perp) + \dim(W \cap W^\perp) = \dim W + \dim W^\perp = \dim V,$$

where in the last step, we used (1) of Lemma 26.5.13. Therefore, $\dim(W + W^\perp) = \dim V$ if and only if W is nondegenerate. Since $W + W^\perp$ is a subspace of V, we get $W + W^\perp = V$ if and only if W is nondegenerate. The last sum is actually a direct sum because of the first statement in the proof.


An immediate consequence of this proposition and (2) of Lemma 26.5.13 is


Corollary 26.5.15 A subspace W of an inner product space is nondegenerate if and only if $W^\perp$ is nondegenerate.
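
The statements about $W^\perp$ can be illustrated with the two-dimensional (x, ct) example mentioned above. The sketch below tests nondegeneracy of a subspace through the rank of its Gram matrix (Proposition 26.5.3 applied to $g|_W$) and computes $\dim W^\perp$ as the nullity of the map $v \mapsto (g(v, e_1), \dots, g(v, e_m))$ used in the proof of Lemma 26.5.13. The function names, the metric diag(1, −1), and the sample subspaces are assumptions for illustration; the basis vectors passed in are assumed linearly independent.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def inner(u, v, G):
    n = len(G)
    return sum(G[i][j] * u[i] * v[j] for i in range(n) for j in range(n))

def is_nondegenerate_subspace(basis, G):
    """g restricted to W is nondegenerate iff the Gram matrix of a basis has full rank."""
    gram = [[inner(u, v, G) for v in basis] for u in basis]
    return rank(gram) == len(basis)

def dim_perp(basis, G):
    """dim of W-perp = nullity of v -> (g(v, e_1), ..., g(v, e_m))."""
    n = len(G)
    rows = [[sum(e[i] * G[i][j] for i in range(n)) for j in range(n)] for e in basis]
    return n - rank(rows)

G = [[1, 0], [0, -1]]      # assumed metric on vectors (x, ct)
null_line = [[1, 1]]       # span of a null vector
space_line = [[1, 0]]      # span of a non-null vector

print(is_nondegenerate_subspace(null_line, G), dim_perp(null_line, G))
print(is_nondegenerate_subspace(space_line, G), dim_perp(space_line, G))
```

In both cases $\dim W + \dim W^\perp = 2$, as Lemma 26.5.13 requires, but only the non-null line is nondegenerate.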




Equation (26.36) tells us that a reflection is an isometry. Is an isometry 
also a reflection? This question is not without merit. In fact, Example 4.4.8 
established a connection between reflections and isometries in R*. Geomet- 
ric reasonings also make a connection between reflections and isometries. 
Take a vector in three dimensions and apply an isometry to it. The resulting 
vector will have the same length. Draw the two vectors from the same point. 
Find the difference between the two vectors (connect the tips of the two ar- 
rows). Construct the perpendicular bisector plane of this vector. Clearly the 
vector and its isometric image are reflections of one another in this plane. 
Although we have constructed a reflection from the isometry, its construc- 
tion depends on the vector on which the isometry acts (see Problem 26.28). 
Is it possible to find a general dependence between an isometry and reflec- 
tions not involving any vector? The theorem to follow makes the relation 
more explicit, but first we need a lemma: 


Lemma 26.5.16 Let x and y be two vectors in V such that g(x, x) = 
g(y, y) £0. Then there is a reflection R such that R(x) = +y. 


Proof Because of the relation

\[
g(x + y, x + y) + g(x - y, x - y) = 4g(x, x) \neq 0,
\]

at least one of the terms on the left is nonzero. Assume that
$g(x - y, x - y) \neq 0$, and let $z = x - y$. Then the reflection operator

\[
R_z = 1 - 2P_z, \qquad P_z(v) = z\,\frac{g(z, v)}{g(z, z)},
\]

is such that

\[
R_z(x) = x - 2z\,\frac{g(z, x)}{g(z, z)}
       = x - 2(x - y)\,\frac{g(x - y, x)}{g(x - y, x - y)}
       = x - 2(x - y)\,\frac{g(x, x) - g(x, y)}{2g(x, x) - 2g(x, y)}
       = y.
\]

If $g(x + y, x + y) \neq 0$, then let $z = x + y$. The reflection operator $R_z$,
when acting on $x$, yields

\[
R_z(x) = x - 2z\,\frac{g(z, x)}{g(z, z)}
       = x - 2(x + y)\,\frac{g(x + y, x)}{g(x + y, x + y)}
       = x - 2(x + y)\,\frac{g(x, x) + g(x, y)}{2g(x, x) + 2g(x, y)}
       = -y,
\]

and the proof of the lemma is complete.
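
The construction in the proof can be checked with exact arithmetic. The following is a minimal sketch, assuming a diagonal form $g$ given by a tuple of signs; the names `g`, `reflect`, `reflection_sending`, and the signature `ETA` are illustrative choices, not notation from the text. The sample vectors are picked so that $x - y$ is null, which forces the second branch of the proof ($z = x + y$).

```python
from fractions import Fraction

def g(u, v, eta):
    """Diagonal symmetric bilinear form g(u, v) = sum_i eta_i u_i v_i."""
    return sum(s * a * b for s, a, b in zip(eta, u, v))

def reflect(z, v, eta):
    """R_z(v) = v - 2 z g(z, v) / g(z, z); needs g(z, z) != 0."""
    c = Fraction(2 * g(z, v, eta), g(z, z, eta))
    return [b - c * a for a, b in zip(z, v)]

def reflection_sending(x, y, eta):
    """Pick z as in the proof; returns (z, sign) with R_z(x) = sign * y."""
    if g(x, x, eta) != g(y, y, eta) or g(x, x, eta) == 0:
        raise ValueError("need g(x, x) = g(y, y) != 0")
    z = [a - b for a, b in zip(x, y)]
    if g(z, z, eta) != 0:
        return z, 1
    # x - y is null, so x + y cannot be null as well
    return [a + b for a, b in zip(x, y)], -1

ETA = (-1, 1, 1)            # hypothetical signature chosen for the demo
x = [0, 1, 0]
y = [1, 1, 1]               # g(y, y) = g(x, x) = 1, but x - y is null
z, sign = reflection_sending(x, y, ETA)
image = reflect(z, x, ETA)  # equals sign * y
```

Using `Fraction` keeps the comparison exact, so the equality $R_z(x) = \pm y$ can be tested without a floating-point tolerance.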


Theorem 26.5.17 Let $V$ be an $N$-dimensional inner product space.
Then any isometry $T$ of $V$ is the product of at most $N + 1$ reflections.

Proof We prove the theorem by induction on the dimension of $V$. For $N = 1$
and $x \in V$, we have $Tx = \pm x$. Since $R = -1$ for any reflection operator in
one dimension, we see that $T = R$ for the negative sign, and $T = R^2$ for the
positive sign. Hence, $T$ is the product of at most two reflections.

Now let $T$ be an isometry of $V$. Choose a vector $x$ such that $g(x, x) \neq 0$,
and set $y = Tx$. Since $g(y, y) = g(x, x)$, by Lemma 26.5.16, there exists a
reflection $R$ such that $Rx = \pm y$. Set $T_1 = R \circ T$ and note that

\[
T_1 x = R \circ T x = R y = \pm R^2 x = \pm x.
\]

So $T_1$ leaves Span$\{x\}$ invariant. By Propositions 2.3.17 and 2.3.21, it leaves
$V_1$, the orthogonal complement of Span$\{x\}$, also invariant. Since Span$\{x\}$ is
nondegenerate, so is $V_1$ by Corollary 26.5.15. Furthermore, since $V_1$ has
dimension $N - 1$, the induction hypothesis applies to it, and we can write

\[
R \circ T = T_1 = R_1 R_2 \cdots R_N, \tag{26.37}
\]

where each $R_i$ is a reflection in $V_1$.

Now, each reflection in $V_1$ extends to a reflection in $V$. In fact, if $R_{z_1}$
is defined for $z_1 \in V_1$, and $v = v_1 + ax$ is any vector in $V$, define
$R_{z_1} v = R_{z_1} v_1 + ax$. Then

\[
R_{z_1} v = v_1 - 2z_1\,\frac{g(z_1, v_1)}{g(z_1, z_1)} + ax
          = v - 2z_1\,\frac{g(z_1, v)}{g(z_1, z_1)},
\]

because $g(z_1, x) = 0$. This shows that $R_{z_1}$ is a reflection in $V$. Multiplying
both sides of (26.37) by $R$ and noting that $R^2 = 1$, we get

\[
T = R^2 \circ T = R \circ T_1 = R R_1 R_2 \cdots R_N.
\]

Thus $T$ is the product of $R$ and $N$ other reflections.


26.5.2 Orthonormal Basis 


Whenever there is an inner product on a vector space, there is the possibility
of orthogonal basis vectors. Since, in general, $g(v, v)$ is allowed to be
negative or zero, we have to redefine what we mean by a vector of norm 1. If
$g(v, v) \neq 0$, we define the norm of $v$ as $\|v\| = \sqrt{|g(v, v)|}$. A unit vector, or
a vector of norm 1, obtained from $v$ is simply $v/\|v\|$.


Theorem 26.5.18 An inner product space V has an orthonormal basis. 


Proof Start with the polarization identity,

\[
g(u, v) = \frac{1}{4}\bigl[g(u + v, u + v) - g(u - v, u - v)\bigr],
\]

and use it to convince yourself that $g$ is identically zero unless there exists
a vector $v$ such that $g(v, v) \neq 0$. Let $e_1 = v/\|v\|$, and note that
$g(e_1, e_1) = \eta_{11} = \pm 1$. Now suppose that we have found a set $\{e_i\}_{i=1}^m$ of $m$
orthonormal vectors in $V$. We show that as long as $m < \dim V$, we can add one
more unit vector to the set. Let $W$ be the subspace spanned by $\{e_i\}_{i=1}^m$. Then $W$
is nondegenerate, and by Corollary 26.5.15, $W^\perp$ is also nondegenerate. Hence,
there exists a vector $u \in W^\perp$ with $g(u, u) \neq 0$, and $e_{m+1} = u/\|u\|$ is a unit
vector orthogonal to all vectors $\{e_i\}_{i=1}^m$.


Definition 26.5.19 Let $B = \{e_i\}_{i=1}^N$ be a basis of $V$ and $\eta_{ij} = g(e_i, e_j)$.
We say $B$ is $g$-orthonormal if $\eta_{ij} = 0$ for $i \neq j$, and $\eta_{ii} = \pm 1$. The $\eta_{ii}$
are called the diagonal components of $g$. We use $n_+$ and $n_-$ to denote
the number of vectors $e_i$ for which $\eta_{ii}$ is, respectively, $+1$ and $-1$. The
collection $(\eta_{11}, \eta_{22}, \dots, \eta_{NN})$ is called the signature of $g$.


Example 26.5.20 Let $V = \mathbb{R}^3$ and $v_1 = (x_1, y_1, z_1)$, $v_2 = (x_2, y_2, z_2)$, and
$v = (x, y, z)$. Define the symmetric bilinear form

\[
g(v_1, v_2) = \frac{1}{2}(x_1 y_2 + x_2 y_1 + y_1 z_2 + y_2 z_1 + x_1 z_2 + x_2 z_1)
\]

so that $g(v, v) = xy + yz + xz$. We wish to find a set of vectors in $\mathbb{R}^3$ that
are orthonormal with respect to this bilinear form. Clearly, $e_1 = (1, 1, 0)$ is
such that $g(e_1, e_1) = 1$. So $e_1$ is one of our vectors. Consider $v = (1, 0, 1)$
and note that the vector $r_2 = v - [g(v, e_1)/g(e_1, e_1)]e_1$, suggested by the
Gram–Schmidt process, is orthogonal to $e_1$. Furthermore, it is easily verified
that $g(r_2, r_2) = -\frac{5}{4}$. Therefore, our second vector is

\[
e_2 = \frac{r_2}{\sqrt{|g(r_2, r_2)|}}
    = \left(-\frac{1}{\sqrt{5}}, -\frac{3}{\sqrt{5}}, \frac{2}{\sqrt{5}}\right)
\]

with $g(e_2, e_2) = -1$. Finally, we take $w = (0, 1, 1)$. Then

\[
r_3 = w - \frac{g(w, e_1)}{g(e_1, e_1)}\,e_1 - \frac{g(w, e_2)}{g(e_2, e_2)}\,e_2
    = \frac{2}{5}(-3, 1, 1)
\]

will be orthogonal to both $e_1$ and $e_2$ with $g(r_3, r_3) = -\frac{4}{5}$. Thus, the third
vector can be chosen to be

\[
e_3 = \frac{r_3}{\sqrt{|g(r_3, r_3)|}}
    = \left(-\frac{3}{\sqrt{5}}, \frac{1}{\sqrt{5}}, \frac{1}{\sqrt{5}}\right)
\]

and we obtain $g(e_1, e_1) = 1$, $g(e_2, e_2) = -1$, $g(e_3, e_3) = -1$, $g(e_i, e_j) = 0$
for $i \neq j$. We also have $n_+ = 1$, $n_- = 2$. Although we have worked in a
particular basis, Theorem 26.5.21 below guarantees that $n_+$ and $n_-$ are
(orthonormal) basis-independent.
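
The Gram–Schmidt steps of Example 26.5.20 can be replayed in exact rational arithmetic. This is only a sketch: the helper name `remove_components` is hypothetical, and the unnormalized vectors $r_1 = e_1$, $r_2$, $r_3$ are used for the projections so that no square roots appear until the final normalization.

```python
import math
from fractions import Fraction

def g(v1, v2):
    """The symmetric bilinear form of Example 26.5.20 on R^3."""
    x1, y1, z1 = v1
    x2, y2, z2 = v2
    return Fraction(1, 2) * (x1*y2 + x2*y1 + y1*z2 + y2*z1 + x1*z2 + x2*z1)

def remove_components(v, basis):
    """Subtract from v its g-components along mutually g-orthogonal vectors."""
    r = [Fraction(c) for c in v]
    for e in basis:
        coeff = g(v, e) / g(e, e)          # requires g(e, e) != 0
        r = [ri - coeff * ei for ri, ei in zip(r, e)]
    return r

r1 = [Fraction(1), Fraction(1), Fraction(0)]        # e_1 = (1, 1, 0)
r2 = remove_components([1, 0, 1], [r1])             # from v = (1, 0, 1)
r3 = remove_components([0, 1, 1], [r1, r2])         # from w = (0, 1, 1)

norms = [g(r, r) for r in (r1, r2, r3)]
n_plus = sum(1 for n in norms if n > 0)
n_minus = sum(1 for n in norms if n < 0)

# Floating-point unit vectors r / sqrt(|g(r, r)|)
unit_vectors = [[float(c) / math.sqrt(abs(n)) for c in r]
                for r, n in zip((r1, r2, r3), norms)]
```

Only the signs of `norms` matter for counting $n_+$ and $n_-$; the normalization merely rescales each vector.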


The matrix of $g$ in an orthonormal basis is the diagonal matrix of $\eta_{ii}$. The
elements of the inverse of this matrix (which is equal to the matrix itself)
are denoted by $\eta^{ij}$. The seemingly unnecessary use of superscripts for the
inverse is not only consistent with the discussion leading to Eq. (26.31), but
also with index manipulations of tensors. For example, when superscripts
are used for the inverse, we have $\eta_{ij}\eta^{jk} = \delta_i^k$, with indices properly located.

Let $\{e_i\}_{i=1}^N$ be an orthonormal basis of $V$ and $v = v^i e_i$ an arbitrary vector
in $V$. Now take the inner product of $v$ with $e_j$ to obtain

\[
g(v, e_j) = g(v^i e_i, e_j) = v^i \eta_{ij}.
\]

Multiply both sides by $\eta^{jk}$ (with sum over repeated indices understood):

\[
\eta^{jk} g(v, e_j) = v^i \eta_{ij} \eta^{jk} = v^i \delta_i^k = v^k.
\]

This leads to the orthogonal expansion of an arbitrary vector $v$:

\[
v = \eta^{jk} g(v, e_j)\, e_k = \sum_{k=1}^N \eta^{kk} g(v, e_k)\, e_k
  = \sum_{k=1}^N \eta_{kk}\, g(v, e_k)\, e_k. \tag{26.38}
\]
k=1 k=1 


If $W$ is a nondegenerate subspace of an inner product space $V$, and if we
enlarge an orthonormal basis $\{e_j\}_{j=1}^m$ of $W$ to an orthonormal basis of $V$,
then the operator $P_W$ projecting onto $W$ is defined as

\[
P_W(v) = \sum_{j,k=1}^m \eta^{jk} g(v, e_j)\, e_k = \sum_{k=1}^m \eta^{kk} g(v, e_k)\, e_k
       = \sum_{k=1}^m \eta_{kk}\, g(v, e_k)\, e_k. \tag{26.39}
\]

Clearly, $P_W(v) = v$ if $v \in W$ and $P_W(v) = 0$ if $v \in W^\perp$.
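
As a small numerical illustration of the orthogonal expansion and of the projection $P_W$, the sketch below works in a two-dimensional space with an assumed metric of signature $(-1, +1)$ and a hand-picked orthonormal pair `e0`, `e1` with rational entries. The names `ETA`, `g`, and `project` are hypothetical.

```python
from fractions import Fraction as F

ETA = (-1, 1)  # assumed metric of the ambient two-dimensional space

def g(u, v):
    """g(u, v) = -u_0 v_0 + u_1 v_1."""
    return sum(s * a * b for s, a, b in zip(ETA, u, v))

def project(v, basis):
    """P_W(v) = sum_k eta_kk g(v, e_k) e_k for a g-orthonormal basis of W."""
    out = [F(0)] * len(v)
    for e in basis:
        coeff = g(e, e) * g(v, e)   # eta_kk = g(e_k, e_k) = +1 or -1
        out = [o + coeff * c for o, c in zip(out, e)]
    return out

# A g-orthonormal basis: g(e0, e0) = -1, g(e1, e1) = +1, g(e0, e1) = 0
e0 = [F(5, 4), F(3, 4)]
e1 = [F(3, 4), F(5, 4)]

v = [F(2), F(7)]
expanded = project(v, [e0, e1])     # full expansion reproduces v
onto_e0 = project(v, [e0])          # projection onto W = Span{e0}
```

The factor `g(e, e)` is what distinguishes this from the Euclidean expansion: dropping it would give the wrong sign for the component along the negative-norm vector `e0`.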
Theorem 26.5.21 (Sylvester's theorem) The number $n_-$ of negative signs in
$(\eta_{11}, \eta_{22}, \dots, \eta_{NN})$, the signature of any orthonormal basis $\{e_i\}_{i=1}^N$ of an
inner product space $V$, is equal to $\nu$, the index of $V$.


Proof Assume that the first $n_-$ signs are negative. If $n_- = 0$ or $n_- = N$,
the proof is trivial. Let $U$ be the span of $\{e_i\}_{i=1}^{n_-}$. It is obvious that $g|_U$ is
negative definite. By Definition 26.5.8, $\nu \geq n_-$.

Now let $W$ be an arbitrary subspace of $V$ on which $g$ is negative definite,
and define the linear map $\pi : W \to U$ by

\[
\pi(w) = \sum_{k=1}^{n_-} \eta_{kk}\, g(w, e_k)\, e_k = -\sum_{k=1}^{n_-} g(w, e_k)\, e_k.
\]

We claim that $\pi$ is injective. To prove our claim, we show $\ker \pi = 0$. If
$\pi(w) = 0$, then by Eq. (26.38), $w = \sum_{k=n_-+1}^N g(w, e_k)\, e_k$, and

\[
g(w, w) = g\Bigl(\sum_{k=n_-+1}^N g(w, e_k)\, e_k,\ \sum_{j=n_-+1}^N g(w, e_j)\, e_j\Bigr)
        = \sum_{k=n_-+1}^N \sum_{j=n_-+1}^N g(w, e_k)\, g(w, e_j)\, \underbrace{g(e_k, e_j)}_{=\eta_{kj}}
        = \sum_{k=n_-+1}^N \bigl[g(w, e_k)\bigr]^2 \eta_{kk}
        = \sum_{k=n_-+1}^N \bigl[g(w, e_k)\bigr]^2.
\]

The left-hand side is negative (unless $w = 0$ in which case it is zero), the
right-hand side is positive (or zero). The only way that the equality can hold
is for $w$ to be the zero vector. Hence, $\ker \pi = 0$, and $\pi$ is injective. This
implies that $\dim W \leq n_-$. In particular, if $W$ has maximal dimension $\nu$, we
have $\nu \leq n_-$. This, along with the conclusion of the first paragraph of the
proof, yields $\nu = n_-$.
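
The basis independence of $n_-$ can be observed numerically. The sketch below is an assumption-laden illustration, not the method of the proof: it builds a $g$-orthogonal basis by the enlargement argument of Theorem 26.5.18 (trying the standard basis vectors and their pairwise sums as candidates), counts the negative norms, and repeats the count after a change of basis $G \to R^T G R$. The function names and the matrix `R` are hypothetical.

```python
from fractions import Fraction as F

def form(G, u, v):
    """g(u, v) = u^T G v for a symmetric matrix G."""
    n = len(G)
    return sum(u[i] * G[i][j] * v[j] for i in range(n) for j in range(n))

def orthogonal_basis(G):
    """Mutually g-orthogonal vectors with nonzero norm (G nondegenerate)."""
    n = len(G)
    std = [[F(int(i == j)) for j in range(n)] for i in range(n)]
    candidates = std + [[a + b for a, b in zip(std[i], std[j])]
                        for i in range(n) for j in range(i + 1, n)]
    basis = []
    while len(basis) < n:
        for c in candidates:
            r = c
            for e in basis:
                coeff = F(form(G, c, e)) / form(G, e, e)
                r = [ri - coeff * ei for ri, ei in zip(r, e)]
            if form(G, r, r) != 0:      # non-null vector orthogonal to basis
                basis.append(r)
                break
        else:
            raise ValueError("form is degenerate")
    return basis

def index_of(G):
    """Number of negative norms in a g-orthogonal basis, i.e. n_-."""
    return sum(1 for r in orthogonal_basis(G) if form(G, r, r) < 0)

def congruent(G, R):
    """Matrix R^T G R of the same form in the basis v_j = sum_i R[i][j] e_i."""
    n = len(G)
    return [[sum(R[a][i] * G[a][b] * R[b][j]
                 for a in range(n) for b in range(n))
             for j in range(n)] for i in range(n)]

h = F(1, 2)
G = [[0, h, h], [h, 0, h], [h, h, 0]]   # the form of Example 26.5.20
R = [[1, 2, 0], [0, 1, 3], [1, 0, 1]]   # an arbitrary invertible matrix
```

Calling `index_of(G)` and `index_of(congruent(G, R))` should return the same number, in line with the theorem.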


Corollary 26.5.22 Let $W^-$ denote the largest subspace of the inner product
space $V$ on which $g$ is negative definite. Then $V = W^- \oplus W^+$, where $W^+$ is
the orthogonal complement of $W^-$ and $g$ is positive definite on $W^+$.


Take the Euclidean $n$-space $\mathbb{R}^n$ and for some integer $0 \leq \nu \leq n$, change
the signs of the first $\nu$ terms in the usual inner product of $\mathbb{R}^n$:

\[
\langle u, v \rangle = g(u, v) = \eta_{ij} u^i v^j = \sum_{i=1}^n \eta_{ii} u^i v^i
  = -\sum_{i=1}^{\nu} u^i v^i + \sum_{i=\nu+1}^n u^i v^i. \tag{26.40}
\]

The resulting inner product space, denoted by $\mathbb{R}^n_\nu$, is called the
semi-Euclidean space. For $n \geq 2$, $\mathbb{R}^n_1$ is called the Minkowski $n$-space. $\mathbb{R}^4_1$ is
the space of the special theory of relativity.


Proposition 26.5.23 Let $\mathbb{R}^n_\nu$ and $\mathbb{R}^m_\mu$ be semi-Euclidean spaces. Then

\[
\mathbb{R}^n_\nu \oplus \mathbb{R}^m_\mu = \mathbb{R}^{n+m}_{\nu+\mu}.
\]


Proof Apply Eq. (2.12). 


For $\mathbb{R}^n_\nu$, substitute the vectors of an orthonormal basis $\{e_i\}_{i=1}^N$ for both $v_i$
and $u_i$ in Eq. (26.32) to get

\[
\Delta_0(e_1, \dots, e_N)^2 = \alpha \det\bigl(g(e_i, e_j)\bigr) = \alpha \det(\eta_{ij}) = \alpha(-1)^\nu.
\]

This shows that $\alpha(-1)^\nu > 0$. Hence, we can define a new determinant
function by

\[
\Delta = \pm \frac{\Delta_0}{\sqrt{\alpha(-1)^\nu}}, \tag{26.41}
\]

for which (26.32) takes the form

\[
\Delta(v_1, \dots, v_N)\,\Delta(u_1, \dots, u_N) = (-1)^\nu \det\bigl(g(v_i, u_j)\bigr). \tag{26.42}
\]

A determinant function satisfying this equation is called a normed
determinant function. Equation (26.41) shows that there are exactly two normed
determinant functions $\Delta$ and $-\Delta$ in $V$.

Orthonormal bases allow us to speak of the oriented volume element.
Suppose $\{\boldsymbol{\epsilon}^j\}_{j=1}^N$ is an oriented orthonormal basis of $V^*$. If $\{\boldsymbol{\varphi}^k\}_{k=1}^N$ is
another orthonormal basis in the same orientation and related to $\{\boldsymbol{\epsilon}^j\}$ by a
matrix $R$, then

\[
\boldsymbol{\varphi}^1 \wedge \boldsymbol{\varphi}^2 \wedge \dots \wedge \boldsymbol{\varphi}^N
  = (\det R)\, \boldsymbol{\epsilon}^1 \wedge \boldsymbol{\epsilon}^2 \wedge \dots \wedge \boldsymbol{\epsilon}^N.
\]

Since $\{\boldsymbol{\varphi}^k\}$ and $\{\boldsymbol{\epsilon}^j\}$ are orthonormal, the determinant of $g$, which is $\det(\eta_{ij})$,
is $(-1)^\nu$ in both of them. Problem 26.26 then implies that $(\det R)^2 = 1$ or
$\det R = \pm 1$. However, $\{\boldsymbol{\varphi}^k\}$ and $\{\boldsymbol{\epsilon}^j\}$ belong to the same orientation. Thus,
$\det R = +1$, and $\{\boldsymbol{\varphi}^k\}$ and $\{\boldsymbol{\epsilon}^j\}$ give the same volume element.


Definition 26.5.24 The volume element of an inner product space (V, g) 
relative to g is a volume element obtained from any orthonormal basis 
of V*. 


We should emphasize that the invariance of $\nu$ is true for $g$-orthonormal
bases. As a counterexample, consider $g$ of Example 26.5.20 applied to the
standard basis of $\mathbb{R}^3$, which we designate with a prime. It is readily verified
that

\[
g(e'_i, e'_i) = 0 \quad \text{for } i = 1, 2, 3.
\]

So it might appear that $\nu = 0$ for this basis. However, the standard basis is
not $g$-orthonormal. In fact,

\[
g(e'_1, e'_2) = \frac{1}{2} = g(e'_1, e'_3) = g(e'_2, e'_3).
\]

That is why the nonstandard vectors $e_1$, $v$, and $w$ were chosen in
Example 26.5.20.


Example 26.5.25 Let $\{e_i\}_{i=1}^N$ be a basis of $V$ and $\{\boldsymbol{\epsilon}^i\}_{i=1}^N$ its dual. We can
define the permutation tensor

\[
\delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N}
  = \boldsymbol{\epsilon}^{i_1} \wedge \boldsymbol{\epsilon}^{i_2} \wedge \dots \wedge \boldsymbol{\epsilon}^{i_N}(e_{j_1}, e_{j_2}, \dots, e_{j_N}). \tag{26.43}
\]

It is clear from this definition that $\delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N}$ is completely
skew-symmetric in all upper indices. That it is also skew-symmetric in the lower
indices can be seen as follows. Assume that two of the lower indices are equal.
This means having two $e_j$'s equal in (26.43). These two $e_j$'s will contract with
two $\boldsymbol{\epsilon}^i$'s, say $\boldsymbol{\epsilon}^k$ and $\boldsymbol{\epsilon}^l$. Thus, in the expansion there will be a term
$C\boldsymbol{\epsilon}^k(e_j)\boldsymbol{\epsilon}^l(e_j)$, where $C$ is the product of all the other factors. Since the
product is completely skew-symmetric in the upper indices, there must also exist
another term, with a minus sign and in which the upper indices $k$ and $l$ are
interchanged: $-C\boldsymbol{\epsilon}^l(e_j)\boldsymbol{\epsilon}^k(e_j)$. This makes the sum zero, and by Theorem 2.6.3,
(26.43) is antisymmetric in the lower indices as well.

This suggests that $\delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N} \propto \epsilon^{i_1 i_2 \dots i_N} \epsilon_{j_1 j_2 \dots j_N}$. To find the
proportionality constant, we note that (see Problem 26.14)

\[
\delta^{12 \dots N}_{12 \dots N} = \sum_\pi \epsilon_\pi\, \delta^1_{\pi(1)} \delta^2_{\pi(2)} \cdots \delta^N_{\pi(N)}.
\]

The only contribution to the sum comes from the permutation with the
property $\pi(i) = i$. This is the identity permutation for which $\epsilon_\pi = 1$. Thus, we
have $\delta^{12 \dots N}_{12 \dots N} = 1$. On the other hand, by Problem 26.25,

\[
\epsilon^{12 \dots N} \epsilon_{12 \dots N} = (-1)^{n_-}.
\]

Therefore, the proportionality constant is $(-1)^{n_-}$. Thus

\[
\epsilon^{i_1 i_2 \dots i_N} \epsilon_{j_1 j_2 \dots j_N} = (-1)^{n_-}\, \delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N}. \tag{26.44}
\]

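We can find an explicit expression for the permutation tensor of
Example 26.5.25. Expanding the RHS of Eq. (26.43) using Eq. (26.18), we obtain

\[
\begin{aligned}
\delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N}
  &= \sum_\pi \epsilon_\pi\, \delta^{i_1}_{\pi(j_1)} \delta^{i_2}_{\pi(j_2)} \cdots \delta^{i_N}_{\pi(j_N)}, \\
\epsilon^{i_1 i_2 \dots i_N} \epsilon_{j_1 j_2 \dots j_N}
  &= (-1)^{n_-} \sum_\pi \epsilon_\pi\, \delta^{i_1}_{\pi(j_1)} \delta^{i_2}_{\pi(j_2)} \cdots \delta^{i_N}_{\pi(j_N)}.
\end{aligned} \tag{26.45}
\]

Furthermore, the first equation of (26.45) can be written concisely as a
determinant, because

\[
\delta^{i_1 i_2 \dots i_N}_{12 \dots N} = \sum_\pi \epsilon_\pi\, \delta^{i_1}_{\pi(1)} \delta^{i_2}_{\pi(2)} \cdots \delta^{i_N}_{\pi(N)}.
\]

The RHS is clearly the determinant of a matrix (expanded with respect to
the $i$th row) whose elements are $\delta^i_j$. The same holds true if $1, 2, \dots, N$ is
replaced by $j_1, j_2, \dots, j_N$; thus,

\[
\delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N} = \det
\begin{pmatrix}
\delta^{i_1}_{j_1} & \delta^{i_1}_{j_2} & \cdots & \delta^{i_1}_{j_N} \\
\delta^{i_2}_{j_1} & \delta^{i_2}_{j_2} & \cdots & \delta^{i_2}_{j_N} \\
\vdots & \vdots & & \vdots \\
\delta^{i_N}_{j_1} & \delta^{i_N}_{j_2} & \cdots & \delta^{i_N}_{j_N}
\end{pmatrix}. \tag{26.46}
\]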

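Example 26.5.26 Let us apply the second equation of (26.45) to Euclidean
$\mathbb{R}^3$:

\[
\epsilon^{ijk} \epsilon_{lmn} = \delta^i_l \delta^j_m \delta^k_n - \delta^i_l \delta^j_n \delta^k_m - \delta^i_m \delta^j_l \delta^k_n
  + \delta^i_m \delta^j_n \delta^k_l + \delta^i_n \delta^j_l \delta^k_m - \delta^i_n \delta^j_m \delta^k_l.
\]

From this fundamental relation, we can obtain other useful formulas. For
example, setting $n = k$ and summing over $k$, we get

\[
\epsilon^{ijk} \epsilon_{lmk} = 3\delta^i_l \delta^j_m - \delta^i_l \delta^j_m - 3\delta^i_m \delta^j_l + \delta^i_m \delta^j_l + \delta^i_m \delta^j_l - \delta^i_l \delta^j_m
  = \delta^i_l \delta^j_m - \delta^i_m \delta^j_l.
\]

Now set $m = j$ in this equation and sum over $j$:

\[
\epsilon^{ijk} \epsilon_{ljk} = 3\delta^i_l - \delta^i_l = 2\delta^i_l.
\]

Finally, let $l = i$ and sum over $i$:

\[
\epsilon^{ijk} \epsilon_{ijk} = 2\delta^i_i = 6.
\]

The same contractions in $N$ dimensions give

\[
\delta^{i_1 \dots i_N}_{i_1 \dots i_N} = N!, \quad \text{or} \quad
\epsilon^{i_1 \dots i_N} \epsilon_{i_1 \dots i_N} = (-1)^{n_-} N!,
\]

\[
\delta^{i_1 \dots i_{N-1} i_N}_{i_1 \dots i_{N-1} j_N} = (N - 1)!\, \delta^{i_N}_{j_N}, \quad \text{or} \quad
\epsilon^{i_1 \dots i_{N-1} i_N} \epsilon_{i_1 \dots i_{N-1} j_N} = (-1)^{n_-} (N - 1)!\, \delta^{i_N}_{j_N},
\]

and

\[
\delta^{i_1 \dots i_{N-2} i_{N-1} i_N}_{i_1 \dots i_{N-2} j_{N-1} j_N}
  = (N - 2)!\bigl(\delta^{i_{N-1}}_{j_{N-1}} \delta^{i_N}_{j_N} - \delta^{i_{N-1}}_{j_N} \delta^{i_N}_{j_{N-1}}\bigr)
  = (N - 2)!\, \delta^{i_{N-1} i_N}_{j_{N-1} j_N}, \quad \text{or}
\]

\[
\epsilon^{i_1 \dots i_{N-2} i_{N-1} i_N} \epsilon_{i_1 \dots i_{N-2} j_{N-1} j_N}
  = (-1)^{n_-} (N - 2)!\bigl(\delta^{i_{N-1}}_{j_{N-1}} \delta^{i_N}_{j_N} - \delta^{i_{N-1}}_{j_N} \delta^{i_N}_{j_{N-1}}\bigr).
\]

More generally,

\[
\delta^{i_1 \dots i_p i_{p+1} \dots i_N}_{i_1 \dots i_p j_{p+1} \dots j_N} = p!\, \delta^{i_{p+1} \dots i_N}_{j_{p+1} \dots j_N}. \tag{26.47}
\]

Equation (26.47) can be generalized even further:

\[
\delta^{i_{k+1} \dots i_{k+p} i_{k+p+1} \dots i_N}_{i_{k+1} \dots i_{k+p} j_{k+p+1} \dots j_N}
  = \frac{(p + k)!}{k!}\, \delta^{i_{k+p+1} \dots i_N}_{j_{k+p+1} \dots j_N}. \tag{26.48}
\]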


If you set the j’s equal to i’s in Eq. (26.47), you get N! on the left-hand 
side. Equation (26.48) then yields N!/p! on the right-hand side, making the 
two sides equal. 
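
The contraction identities above lend themselves to a brute-force check. The sketch below assumes a Euclidean metric ($n_- = 0$), so that raising indices on $\epsilon$ changes nothing; `eps_eps` sums over the leading shared indices of two Levi-Civita symbols and `generalized_delta` evaluates the permutation sum of Eq. (26.45). Both function names are illustrative.

```python
from functools import lru_cache
from itertools import permutations, product
from math import factorial

def sign(perm):
    """Sign of a permutation given as a tuple of distinct integers."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

@lru_cache(maxsize=None)
def levi_civita(n):
    """Nonzero components of epsilon in n dimensions, keyed by index tuple."""
    return {p: sign(p) for p in permutations(range(n))}

def generalized_delta(upper, lower):
    """delta^{upper}_{lower} = sum_pi eps_pi prod_a delta^{u_a}_{l_pi(a)}."""
    r = len(upper)
    return sum(sign(p) * all(upper[a] == lower[p[a]] for a in range(r))
               for p in permutations(range(r)))

def eps_eps(n, upper_tail, lower_tail):
    """Sum over s of eps^{s, upper_tail} eps_{s, lower_tail} (Euclidean)."""
    eps = levi_civita(n)
    p = n - len(upper_tail)          # number of contracted indices
    return sum(eps.get(s + tuple(upper_tail), 0) *
               eps.get(s + tuple(lower_tail), 0)
               for s in product(range(n), repeat=p))
```

With $p$ contracted indices the expected value is $p!$ times the generalized delta of the remaining indices, which is what Eq. (26.47) states.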

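Another useful property of the permutation tensor is (Problem 26.15)

\[
\delta^{i_1 i_2 \dots i_r}_{j_1 j_2 \dots j_r} A_{i_1 i_2 \dots i_r} = \sum_\pi \epsilon_\pi\, A_{\pi(j_1) \pi(j_2) \dots \pi(j_r)} \tag{26.49}
\]

for any tensor (the tensor could have more indices, and some of the $r$ indices
could be mixed). In particular if $A$ is antisymmetric in $i_1 i_2 \dots i_r$, then

\[
\delta^{i_1 i_2 \dots i_r}_{j_1 j_2 \dots j_r} A_{i_1 i_2 \dots i_r} = r!\, A_{j_1 j_2 \dots j_r}. \tag{26.50}
\]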

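Example 26.5.27 As an application of the foregoing formalism, we can
express the determinant of a $2 \times 2$ matrix in terms of traces. Let $A$ be such a
matrix with elements $A^i_j$. Then

\[
\begin{aligned}
\det A &= \epsilon_{ij} A^i_1 A^j_2 = \frac{1}{2}\bigl(\epsilon_{ij} A^i_1 A^j_2 - \epsilon_{ij} A^i_2 A^j_1\bigr)
        = \frac{1}{2}\, \epsilon_{ij} \epsilon^{kl} A^i_k A^j_l \\
       &= \frac{1}{2}\, A^i_k A^j_l \bigl(\delta^k_i \delta^l_j - \delta^k_j \delta^l_i\bigr)
        = \frac{1}{2}\bigl(A^i_i A^j_j - A^i_j A^j_i\bigr) \\
       &= \frac{1}{2}\bigl[(\operatorname{tr} A)(\operatorname{tr} A) - (A^2)^i_i\bigr]
        = \frac{1}{2}\bigl[(\operatorname{tr} A)^2 - \operatorname{tr}(A^2)\bigr].
\end{aligned}
\]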
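We can generalize the result of the example above and express the
determinant of an $N \times N$ matrix as

\[
\begin{aligned}
\det A &= \frac{1}{N!}\, \epsilon^{i_1 i_2 \dots i_N} \epsilon_{j_1 j_2 \dots j_N} A^{j_1}_{i_1} A^{j_2}_{i_2} \cdots A^{j_N}_{i_N} \\
       &= \frac{1}{N!} \sum_\pi \epsilon_\pi\, \delta^{i_1}_{\pi(j_1)} \delta^{i_2}_{\pi(j_2)} \cdots \delta^{i_N}_{\pi(j_N)} A^{j_1}_{i_1} A^{j_2}_{i_2} \cdots A^{j_N}_{i_N}.
\end{aligned} \tag{26.51}
\]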
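
A direct numerical comparison makes the determinant formulas concrete. The sketch below assumes Euclidean $\epsilon$ symbols ($n_- = 0$) and evaluates the double-$\epsilon$ sum of Eq. (26.51) by looping over pairs of permutations; `det_leibniz` is the ordinary permutation expansion used as a reference, and `det_from_traces` evaluates the trace expressions for $2 \times 2$ and $3 \times 3$ matrices (the latter is the identity stated in Problem 26.29). All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial, prod

def sign(perm):
    """Sign of a permutation given as a tuple of distinct integers."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def det_leibniz(A):
    """Reference determinant: sum over permutations of signed products."""
    n = len(A)
    return sum(sign(p) * prod(A[k][p[k]] for k in range(n))
               for p in permutations(range(n)))

def det_eps_eps(A):
    """(1/N!) eps^{i1..iN} eps_{j1..jN} A^{j1}_{i1} ... A^{jN}_{iN}."""
    n = len(A)
    perms = list(permutations(range(n)))
    total = sum(sign(i) * sign(j) * prod(A[j[k]][i[k]] for k in range(n))
                for i in perms for j in perms)
    return Fraction(total, factorial(n))

def matmul(A, B):
    n = len(A)
    return [[sum(A[r][k] * B[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def trace(A):
    return sum(A[k][k] for k in range(len(A)))

def det_from_traces(A):
    """Trace formulas for 2x2 and 3x3 matrices only."""
    n, t1 = len(A), trace(A)
    A2 = matmul(A, A)
    t2 = trace(A2)
    if n == 2:
        return Fraction(t1 * t1 - t2, 2)
    if n == 3:
        t3 = trace(matmul(A2, A))
        return Fraction(t1**3 - 3 * t1 * t2 + 2 * t3, 6)
    raise ValueError("only 2x2 and 3x3 matrices are handled here")
```

The double sum in `det_eps_eps` has $(N!)^2$ terms, so it is meant as a check of the formula for small $N$ rather than as a practical way of computing determinants.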
26.5.3 Inner Product on $\Lambda^p(V, U)$


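If $(V, g)$ is an inner product space, then $g$ induces an inner product $\tilde{g}$ on
$\Lambda^p(V)$ as follows. Extend the mapping $g_*^{-1} : V^* \to V$ to the space of
$p$-forms by applying it to each factor (e.g., in the expansion of the $p$-form in a
basis of $\Lambda^p(V)$). This extension makes $g_*^{-1}$ a map $g_*^{-1} : \Lambda^p(V) \to \Lambda^p(V^*)$
which takes a $p$-form $\beta$ and turns it into a $p$-vector $g_*^{-1}(\beta)$. Then the pairing
$\langle \alpha, g_*^{-1}(\beta) \rangle$, with $\alpha, \beta \in \Lambda^p(V)$, is the desired induced inner product. More
specifically, let $\{e_i\}_{i=1}^N$ be a basis of $V$ and $\{\boldsymbol{\epsilon}^i\}_{i=1}^N$ its dual basis. Then,

\[
\begin{aligned}
g_*^{-1}(\beta) &= g_*^{-1}\Bigl(\frac{1}{p!}\, \beta_{i_1 \dots i_p}\, \boldsymbol{\epsilon}^{i_1} \wedge \dots \wedge \boldsymbol{\epsilon}^{i_p}\Bigr) \\
  &= \frac{1}{p!}\, \beta_{i_1 \dots i_p} \bigl(g_*^{-1}(\boldsymbol{\epsilon}^{i_1}) \wedge g_*^{-1}(\boldsymbol{\epsilon}^{i_2}) \wedge \dots \wedge g_*^{-1}(\boldsymbol{\epsilon}^{i_p})\bigr) \\
  &= \frac{1}{p!}\, \beta_{i_1 \dots i_p} \bigl(g^{i_1 j_1} e_{j_1} \wedge g^{i_2 j_2} e_{j_2} \wedge \dots \wedge g^{i_p j_p} e_{j_p}\bigr) \\
  &= \frac{1}{p!}\, g^{i_1 j_1} g^{i_2 j_2} \cdots g^{i_p j_p} \beta_{i_1 i_2 \dots i_p}\, e_{j_1} \wedge e_{j_2} \wedge \dots \wedge e_{j_p} \\
  &= \frac{1}{p!}\, \beta^{j_1 j_2 \dots j_p}\, e_{j_1} \wedge e_{j_2} \wedge \dots \wedge e_{j_p}.
\end{aligned}
\]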

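Note how the indices of the components of $\beta$ have been raised by the
components of $g^{-1}$. Pairing this last expression with $\alpha$, we get

\[
\begin{aligned}
\tilde{g}(\alpha, \beta) &= \langle \alpha, g_*^{-1}(\beta) \rangle \\
  &= \frac{1}{p!}\, \alpha_{i_1 \dots i_p}\, \frac{1}{p!}\, \beta^{j_1 \dots j_p}
     \langle \boldsymbol{\epsilon}^{i_1} \wedge \dots \wedge \boldsymbol{\epsilon}^{i_p},\ e_{j_1} \wedge \dots \wedge e_{j_p} \rangle \\
  &= \frac{1}{(p!)^2}\, \alpha_{i_1 \dots i_p} \beta^{j_1 \dots j_p}\,
     \boldsymbol{\epsilon}^{i_1} \wedge \boldsymbol{\epsilon}^{i_2} \wedge \dots \wedge \boldsymbol{\epsilon}^{i_p}(e_{j_1}, e_{j_2}, \dots, e_{j_p}) \\
  &= \frac{1}{(p!)^2} \sum_\pi \epsilon_\pi\, \delta^{i_1}_{\pi(j_1)} \delta^{i_2}_{\pi(j_2)} \cdots \delta^{i_p}_{\pi(j_p)}\,
     \alpha_{i_1 i_2 \dots i_p} \beta^{j_1 j_2 \dots j_p},
\end{aligned}
\]

where in the last step we used Eq. (26.18). Therefore,

\[
\tilde{g}(\alpha, \beta) = \frac{1}{(p!)^2} \sum_\pi \epsilon_\pi\, \alpha_{\pi(j_1) \pi(j_2) \dots \pi(j_p)} \beta^{j_1 j_2 \dots j_p}
  = \frac{1}{p!}\, \alpha_{j_1 j_2 \dots j_p} \beta^{j_1 j_2 \dots j_p}, \tag{26.52}
\]

because $\epsilon_\pi\, \alpha_{\pi(j_1) \pi(j_2) \dots \pi(j_p)} = \alpha_{j_1 j_2 \dots j_p}$ due to the antisymmetry of the
components of $p$-forms.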

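Having found $\tilde{g}$, we can extend it further to $\Lambda^p(V, U)$ if $U$ has an inner
product $h$. Let $\{f_a\}_{a=1}^{\dim U}$ be a basis of $U$, and note that any $\alpha \in \Lambda^p(V, U)$
can be written as $\alpha = \sum_{a=1}^{\dim U} \alpha^a f_a$, where $\alpha^a \in \Lambda^p(V)$. Denote the inner
product of $\Lambda^p(V, U)$ as $\widetilde{gh}$ and, for

\[
\alpha = \sum_{a=1}^{\dim U} \alpha^a f_a, \qquad \beta = \sum_{b=1}^{\dim U} \beta^b f_b,
\]

define it as

\[
\widetilde{gh}(\alpha, \beta) = \sum_{a,b=1}^{\dim U} \tilde{g}(\alpha^a, \beta^b)\, h(f_a, f_b)
  = \sum_{a,b=1}^{\dim U} h_{ab}\, \tilde{g}(\alpha^a, \beta^b). \tag{26.53}
\]

It is routine to show that $\widetilde{gh}$ is basis-independent.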


26.6 The Hodge Star Operator 


It was established in Chap. 4 that all vector spaces of the same dimension are
isomorphic. Therefore, the two vector spaces $\Lambda^p(V)$ and $\Lambda^{N-p}(V)$ having
the same dimension, $\binom{N}{p} = \binom{N}{N-p}$, must be isomorphic. In fact, there is a
natural isomorphism between the two spaces:


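Definition 26.6.1 Let $g$ be an inner product and $\{\boldsymbol{\epsilon}^i\}_{i=1}^N$ a $g$-orthonormal
ordered basis of $V^*$. The Hodge star operator is a linear mapping,
$* : \Lambda^p(V) \to \Lambda^{N-p}(V)$, given by (remember Einstein's summation convention!)

\[
*(\boldsymbol{\epsilon}^{i_1} \wedge \dots \wedge \boldsymbol{\epsilon}^{i_p})
  = \frac{1}{(N - p)!}\, \epsilon^{i_1 \dots i_p}{}_{j_{p+1} \dots j_N}\, \boldsymbol{\epsilon}^{j_{p+1}} \wedge \dots \wedge \boldsymbol{\epsilon}^{j_N}. \tag{26.54}
\]

A similar star operator can be defined on $p$-vectors. Here $\epsilon^{i_1 \dots i_p}{}_{j_{p+1} \dots j_N}$ is
obtained from $\epsilon_{j_1 \dots j_N}$ by raising its first $p$ subscripts.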

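Although this definition is based on a choice of basis, it can be shown that
the operator is basis-independent. In fact, if
$\omega = \frac{1}{p!}\, \omega_{i_1 \dots i_p}\, \boldsymbol{\epsilon}^{i_1} \wedge \dots \wedge \boldsymbol{\epsilon}^{i_p}$, then Eq. (26.54) gives

\[
*\omega = \frac{1}{p!(N - p)!}\, \omega^{j_1 \dots j_p}\, \epsilon_{j_1 j_2 \dots j_N}\, \boldsymbol{\epsilon}^{j_{p+1}} \wedge \dots \wedge \boldsymbol{\epsilon}^{j_N}. \tag{26.55}
\]

Now let $\{v_i\}_{i=1}^N$ be any other basis of $V$ positively oriented relative to
$\{e_i\}_{i=1}^N$. Let $\{\boldsymbol{\theta}^i\}_{i=1}^N$ be dual to $\{v_i\}_{i=1}^N$. Write $v_i = R^j_i e_j$ and (therefore)
$\boldsymbol{\theta}^i = (R^{-1})^i_j \boldsymbol{\epsilon}^j$. Then, denoting $g_*^{-1}(\omega)$ by $\tilde{\omega}$, we have

\[
\omega^{j_1 \dots j_p} \equiv \tilde{\omega}(\boldsymbol{\epsilon}^{j_1}, \dots, \boldsymbol{\epsilon}^{j_p})
  = \tilde{\omega}(R^{j_1}_{i_1} \boldsymbol{\theta}^{i_1}, \dots, R^{j_p}_{i_p} \boldsymbol{\theta}^{i_p})
  = R^{j_1}_{i_1} \cdots R^{j_p}_{i_p}\, \bar{\omega}^{i_1 \dots i_p},
\]

where $\bar{\omega}^{i_1 \dots i_p}$ are the components of $\tilde{\omega}$ in the general basis $\{v_i\}_{i=1}^N$.
Substituting the last expression above and $\boldsymbol{\epsilon}^j$'s in terms of $\boldsymbol{\theta}^i$'s in Eq. (26.55), we
get

\[
\begin{aligned}
*\omega &= \frac{1}{p!(N - p)!}\, \bar{\omega}^{i_1 \dots i_p} R^{j_1}_{i_1} \cdots R^{j_p}_{i_p}\, \epsilon_{j_1 j_2 \dots j_N}
   \bigl(R^{j_{p+1}}_{i_{p+1}} \boldsymbol{\theta}^{i_{p+1}}\bigr) \wedge \dots \wedge \bigl(R^{j_N}_{i_N} \boldsymbol{\theta}^{i_N}\bigr) \\
  &= \frac{1}{p!(N - p)!}\, \bar{\omega}^{i_1 \dots i_p}
   \underbrace{\bigl(\epsilon_{j_1 j_2 \dots j_N} R^{j_1}_{i_1} \cdots R^{j_N}_{i_N}\bigr)}_{=\epsilon_{i_1 i_2 \dots i_N} \det R}\,
   \boldsymbol{\theta}^{i_{p+1}} \wedge \dots \wedge \boldsymbol{\theta}^{i_N}.
\end{aligned}
\]

Using the result $\det R = |\det G|^{1/2} \equiv |G|^{1/2}$ (Problem 26.26), where $G$
denotes the matrix of $g$ in $\{v_i\}_{i=1}^N$, we finally obtain

\[
\begin{aligned}
*\omega &= \frac{|G|^{1/2}}{p!(N - p)!}\, \bar{\omega}^{i_1 \dots i_p}\, \epsilon_{i_1 i_2 \dots i_N}\, \boldsymbol{\theta}^{i_{p+1}} \wedge \dots \wedge \boldsymbol{\theta}^{i_N} \\
  &= \frac{|G|^{1/2}}{p!}\, \bar{\omega}^{i_1 \dots i_p}\, \epsilon_{i_1 i_2 \dots i_N}\, \boldsymbol{\theta}^{i_{p+1}} \otimes \dots \otimes \boldsymbol{\theta}^{i_N},
\end{aligned} \tag{26.56}
\]

where the last equality follows because $\boldsymbol{\theta}^{i_{p+1}} \otimes \dots \otimes \boldsymbol{\theta}^{i_N}$ does not have a
symmetry. Note that this last expression reduces to (26.55), because $|G| = 1$
in an orthonormal basis, and $*\omega$ as given by Eq. (26.56) is indeed
basis-independent.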


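Example 26.6.2 Let us apply Definition 26.6.1 to $\Lambda^p(\mathbb{R}^{3*})$ for $p = 0, 1,
2, 3$. Let $\{e_1, e_2, e_3\}$ be an oriented orthonormal basis of $\mathbb{R}^3$.

(a) For $\Lambda^0(\mathbb{R}^{3*}) = \mathbb{R}$ a basis is 1, and (26.54) gives
    $*1 = \frac{1}{3!}\, \epsilon^{ijk} e_i \wedge e_j \wedge e_k = e_1 \wedge e_2 \wedge e_3$.
(b) For $\Lambda^1(\mathbb{R}^{3*}) = \mathbb{R}^3$ a basis is $\{e_1, e_2, e_3\}$, and (26.54) gives
    $*e_i = \frac{1}{2!}\, \epsilon_i{}^{jk} e_j \wedge e_k$, or $*e_1 = e_2 \wedge e_3$, $*e_2 = e_3 \wedge e_1$, $*e_3 = e_1 \wedge e_2$.
(c) For $\Lambda^2(\mathbb{R}^{3*})$ a basis is $\{e_1 \wedge e_2, e_1 \wedge e_3, e_2 \wedge e_3\}$, and (26.54) gives
    $*(e_i \wedge e_j) = \epsilon_{ij}{}^{k} e_k$, or $*(e_1 \wedge e_2) = e_3$, $*(e_1 \wedge e_3) = -e_2$, $*(e_2 \wedge e_3) = e_1$.
(d) For $\Lambda^3(\mathbb{R}^{3*})$ a basis is $\{e_1 \wedge e_2 \wedge e_3\}$, and (26.54) yields
    $*(e_1 \wedge e_2 \wedge e_3) = \epsilon_{123} = 1$.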


The preceding example may suggest that applying the Hodge star operator
twice (composition of $*$ with itself, or $* \circ *$) is equivalent to applying the
identity operator. This is partially true. The following theorem is a precise
statement of this conjecture. (For a proof, see [Bish 80, p. 111].)
Theorem 26.6.3 Let $V$ be an oriented space with an inner product $g$.
For $A \in \Lambda^p(V)$, we have

\[
* \circ * A \equiv * * A = (-1)^\nu (-1)^{p(N-p)} A, \tag{26.57}
\]

where $\nu$ is the index of $g$ and $N = \dim V$.

In particular, for Euclidean spaces with an odd number of dimensions
(such as $\mathbb{R}^3$), $* * A = A$.
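
The sign rule of Theorem 26.6.3 can be tested on components. The sketch below is one possible encoding, under stated assumptions: the basis is orthonormal with diagonal metric `eta`, a $p$-form is stored as a dictionary of its fully antisymmetric components $\omega_{i_1 \dots i_p}$ (normalized as $\omega = \frac{1}{p!}\omega_{i_1 \dots i_p}\boldsymbol{\epsilon}^{i_1} \wedge \dots \wedge \boldsymbol{\epsilon}^{i_p}$), and the star acts by $(*\omega)_{j_{p+1} \dots j_N} = \frac{1}{p!}\,\omega^{i_1 \dots i_p}\epsilon_{i_1 \dots i_p j_{p+1} \dots j_N}$. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def sign(perm):
    """Sign of a permutation given as a tuple of distinct integers."""
    inversions = sum(1 for a in range(len(perm))
                     for b in range(a + 1, len(perm)) if perm[a] > perm[b])
    return -1 if inversions % 2 else 1

def antisymmetrize(independent):
    """All index orderings of a p-form from its i1 < ... < ip components."""
    full = {}
    for idx, value in independent.items():
        for perm in permutations(range(len(idx))):
            full[tuple(idx[k] for k in perm)] = sign(perm) * value
    return full

def hodge(omega, p, eta):
    """Components of *omega for a p-form omega; eta holds the signs eta_ii."""
    n = len(eta)
    out = {}
    for perm in permutations(range(n)):      # epsilon vanishes elsewhere
        head, tail = perm[:p], perm[p:]
        raised = omega.get(head, 0)
        for i in head:
            raised *= eta[i]                 # raise each index with eta^{ii}
        out[tail] = out.get(tail, 0) + Fraction(sign(perm) * raised,
                                                factorial(p))
    return out

def same_form(a, b):
    """Compare two component dictionaries, treating missing keys as zero."""
    return all(a.get(k, 0) == b.get(k, 0) for k in set(a) | set(b))
```

For example, with `eta = (1, 1, 1)` the call `hodge(antisymmetrize({(0, 1): 1}), 2, eta)` returns a dictionary whose only nonzero entry sits at index `(2,)`, matching $*(e_1 \wedge e_2) = e_3$ in zero-based indexing.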

One can extend the star operation to any $A \in \Lambda^p(V)$ by writing $A$ as a
linear combination of basis vectors of $\Lambda^p(V)$ constructed out of $\{e_i\}_{i=1}^N$, and
using the linearity of $*$.

The star operator creates an (N — p)-form out of a p-form. If we take 
the exterior product of a p-form and the star of another p-form, we get an 
N-form, which is proportional to a volume element. In fact, one can prove 
(Problem 26.33) 


Theorem 26.6.4 Let $(V, g)$ be an inner product space and $\mu$ a volume
element relative to $g$. Let $\tilde{g}$ be the inner product induced by $g$ on
$\Lambda^p(V)$ and given explicitly in Eq. (26.52). Then for $\alpha, \beta \in \Lambda^p(V)$, we
have $\alpha \wedge *\beta = \tilde{g}(\alpha, \beta)\mu$.


In the discussion of exterior algebra one encounters sums of the form

\[
A^{i_1 \dots i_p}\, v_{i_1} \wedge \dots \wedge v_{i_p}.
\]

It is important to note that $A^{i_1 \dots i_p}$ is assumed skew-symmetric. For example,
if $A = e_1 \wedge e_2$, then in the sum $A = A^{ij} e_i \wedge e_j$, the nonzero components
consist of $A^{12} = \frac{1}{2}$ and $A^{21} = -\frac{1}{2}$. Similarly, when $B = e_1 \wedge e_2 \wedge e_3$ is
written in the form $B = B^{ijk} e_i \wedge e_j \wedge e_k$, it is understood that the nonzero
components of $B$ are not restricted to $B^{123}$. Other components, such as $B^{132}$,
$B^{231}$, and so on, are also nonzero. In fact, we have

\[
B^{123} = -B^{132} = -B^{213} = B^{231} = B^{312} = -B^{321} = \frac{1}{6}.
\]

This should be kept in mind when sums over exterior products with numerical
coefficients are encountered.


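Example 26.6.5 Let $a, b \in \mathbb{R}^3$ and $\{e_1, e_2, e_3\}$ an oriented orthonormal
basis of $\mathbb{R}^3$. Then $a = a^i e_i$ and $b = b^j e_j$. Let us calculate $a \wedge b$ and
$*(a \wedge b)$. We assume a Euclidean $g$ on $\mathbb{R}^3$. Then
$a \wedge b = (a^i e_i) \wedge (b^j e_j) = a^i b^j e_i \wedge e_j$, and

\[
*(a \wedge b) = *\bigl[(a^i e_i) \wedge (b^j e_j)\bigr] = a^i b^j *(e_i \wedge e_j)
  = a^i b^j \bigl(\epsilon^k{}_{ij}\, e_k\bigr) = \bigl(\epsilon^k{}_{ij}\, a^i b^j\bigr) e_k.
\]

We see that $*(a \wedge b)$ is a vector with components $[*(a \wedge b)]^k = \epsilon^k{}_{ij}\, a^i b^j$,
which are precisely the components of $a \times b$.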




The correspondence between $a \wedge b$ and $a \times b$ holds only in three
dimensions, because $\dim \Lambda^1(V) = \dim \Lambda^2(V)$ only if $\dim V = 3$.
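
Both statements can be illustrated in a few lines. The sketch below assumes Euclidean $\mathbb{R}^3$ with zero-based indices; `wedge` stores the antisymmetric components $W^{ij} = a^i b^j - a^j b^i$ of $a \wedge b = \frac{1}{2!} W^{ij} e_i \wedge e_j$, and `star_2vector` applies $(*W)^k = \frac{1}{2}\epsilon_{ijk} W^{ij}$. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def eps(i, j, k):
    """Levi-Civita symbol on the indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

def wedge(a, b):
    """Antisymmetric components W^{ij} = a^i b^j - a^j b^i of a ^ b."""
    return [[a[i] * b[j] - a[j] * b[i] for j in range(3)] for i in range(3)]

def star_2vector(W):
    """Components (*W)^k = (1/2) eps_{ijk} W^{ij} in Euclidean R^3."""
    return [Fraction(sum(eps(i, j, k) * W[i][j]
                         for i in range(3) for j in range(3)), 2)
            for k in range(3)]

def cross(a, b):
    """Ordinary cross product, written out for comparison."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

# Dimensions (from 1 to 9) in which 1-vectors and 2-vectors have equal count
matching_dims = [n for n in range(1, 10) if comb(n, 2) == n]
```

The list `matching_dims` reflects the dimension count $\binom{n}{2} = n$, which singles out $n = 3$ among positive dimensions.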


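Example 26.6.6 We can use the results of Examples 26.5.26 and 26.6.5 to
establish a sample of familiar vector identities componentwise.

(a) For the triple cross product, we have

\[
\begin{aligned}
[a \times (b \times c)]^k &= \epsilon^k{}_{ij}\, a^i (b \times c)^j = \epsilon^k{}_{ij}\, a^i \bigl(\epsilon^j{}_{lm}\, b^l c^m\bigr)
   = a^i b^l c^m\, \epsilon^k{}_{ij}\, \epsilon^j{}_{lm} \\
  &= a_i b^l c^m\, \epsilon^{jki} \epsilon_{jlm} = a_i b^l c^m \bigl(\delta^k_l \delta^i_m - \delta^k_m \delta^i_l\bigr) \\
  &= a_i b^k c^i - a_i b^i c^k = (a \cdot c)\, b^k - (a \cdot b)\, c^k,
\end{aligned}
\]

    which is the $k$th component of $b(a \cdot c) - c(a \cdot b)$. In deriving the above
    "bac cab" rule, we used the fact that one can swap an upper index with
    the same lower index: $a^i b_i = a_i b^i$.

(b) Next we show the familiar statement that the divergence of curl is zero.
    Let $\partial_i$ denote differentiation with respect to $x^i$. Then

\[
\begin{aligned}
\nabla \cdot (\nabla \times a) &= \partial_i (\nabla \times a)^i = \partial_i\, \epsilon^i{}_{jk}\, \partial^j a^k = \epsilon^{ijk} \partial_i \partial_j a_k \\
  &= -\epsilon^{jik} \partial_i \partial_j a_k = -\epsilon^{jik} \partial_j \partial_i a_k = -\partial_j \bigl(\epsilon^{jik} \partial_i a_k\bigr) \\
  &= -\partial_j (\nabla \times a)^j = -\nabla \cdot (\nabla \times a) \quad \Rightarrow \quad \nabla \cdot (\nabla \times a) = 0.
\end{aligned}
\]

(c) Finally, we show that curl of gradient is zero:

\[
[\nabla \times (\nabla f)]^i = \epsilon^i{}_{jk}\, \partial^j \partial^k f = \epsilon^{ijk} \partial_j \partial_k f = 0,
\]

    because $\epsilon^{ijk}$ is antisymmetric in $jk$, while $\partial_j \partial_k f$ is symmetric in $jk$.


The example above shows in general that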


Box 26.6.7 When the product of two tensors is summed over a pair 
of indices in which one of the tensors is symmetric and the other anti- 
symmetric, the result is zero. 
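
Both the "bac cab" rule and the statement of Box 26.6.7 are easy to spot-check with integers. In the sketch below, assuming Euclidean $\mathbb{R}^3$ and zero-based indices, `cross` is built from the Levi-Civita symbol exactly as in the componentwise derivations above, and `contract` sums a product of two rank-two arrays over both index pairs. The sample vectors and matrices are arbitrary.

```python
def eps(i, j, k):
    """Levi-Civita symbol on the indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

def cross(a, b):
    """(a x b)^k = eps_{kij} a^i b^j."""
    return [sum(eps(k, i, j) * a[i] * b[j]
                for i in range(3) for j in range(3)) for k in range(3)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def bac_cab(a, b, c):
    """Right-hand side b (a . c) - c (a . b) of the triple product rule."""
    return [bk * dot(a, c) - ck * dot(a, b) for bk, ck in zip(b, c)]

def contract(S, A):
    """Full contraction S_{ij} A^{ij}."""
    return sum(S[i][j] * A[i][j] for i in range(3) for j in range(3))

a, b, c = [1, 2, 3], [4, 5, 6], [-1, 0, 2]
S = [[1, 2, 3], [2, 5, -4], [3, -4, 0]]     # symmetric: S[i][j] == S[j][i]
A = [[0, 7, -2], [-7, 0, 9], [2, -9, 0]]    # antisymmetric: A[i][j] == -A[j][i]
```

Here `cross(a, cross(b, c))` should agree with `bac_cab(a, b, c)`, and `contract(S, A)` should vanish, since each term $S_{ij}A^{ij}$ cancels against $S_{ji}A^{ji}$.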


26.7 Problems 
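26.1 Show that the mapping $v : V^* \to \mathbb{R}$ given by $v(\tau) = \tau(v)$ is linear.


26.2 Show that the components of a tensor product are the products of the
components of the factors:

\[
(U \otimes T)^{i_1 \dots i_{r+k}}_{j_1 \dots j_{s+l}} = U^{i_1 \dots i_r}_{j_1 \dots j_s}\, T^{i_{r+1} \dots i_{r+k}}_{j_{s+1} \dots j_{s+l}}.
\]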
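
26.3 Show that $e_{i_1} \otimes \dots \otimes e_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \dots \otimes \boldsymbol{\epsilon}^{j_s}$ are linearly independent.
Hint: Consider $A^{i_1 \dots i_r}_{j_1 \dots j_s}\, e_{i_1} \otimes \dots \otimes e_{i_r} \otimes \boldsymbol{\epsilon}^{j_1} \otimes \dots \otimes \boldsymbol{\epsilon}^{j_s} = 0$ and evaluate the
LHS on appropriate tensors to show that all coefficients are zero.


26.4 What is the tensor product of $A = 2e_x - e_y + 3e_z$ with itself?

26.5 If $A \in \mathcal{L}(V)$ is represented by $A^i_j$ in the basis $\{e_i\}$ and by $A'^k_l$ in $\{e'_k\}$,
then show that

\[
A'^k_l\, e'_k \otimes \boldsymbol{\epsilon}'^l = A^i_j\, e_i \otimes \boldsymbol{\epsilon}^j,
\]

where $\{\boldsymbol{\epsilon}^j\}$ and $\{\boldsymbol{\epsilon}'^l\}$ are dual to $\{e_i\}$ and $\{e'_k\}$, respectively.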


26.6 Prove that the linear functional F : V — R is a linear invariant, i.e., 
basis-independent, function. 


26.7 Show that $\operatorname{tr} : \mathcal{T}^1_1(V) \to \mathbb{R}$ is an invariant linear function.
26.8 If A is skew-symmetric in some pair of variables, show that S(A) = 0. 


26.9 Using the exterior product show whether the following three vectors
are linearly dependent or independent:

\[
\begin{aligned}
v_1 &= 2e_1 - e_2 + 3e_3 - e_4, \\
v_2 &= -e_1 + 3e_2 - 2e_4, \\
v_3 &= 3e_1 + 2e_2 - 4e_3 + e_4.
\end{aligned}
\]

26.10 Show that $\{e_k \wedge e_i\}$ with $k < i$ are linearly independent.


26.11 Let $v \in V$ be nonzero, and let $A \in \Lambda^p(V^*)$. Show that $v \wedge A = 0$ if
and only if there exists $B \in \Lambda^{p-1}(V^*)$ such that $A = v \wedge B$. Hint: Let $v$ be
the first vector of a basis; separate out $v$ in the expansion of $A$ in terms of
the $p$-fold wedge products of basis vectors, and multiply the result by $v$.


26.12 Let $A \in \Lambda^2(V)$ with components $A^{ij}$. Show that $A \wedge A = 0$ if and
only if $A^{ij} A^{kl} - A^{ik} A^{jl} + A^{il} A^{jk} = 0$ for all $i, j, k, l$ in any basis.


26.13 Let $\{e_1, e_2, e_3\}$ be any basis in $\mathbb{R}^3$. Define an operator $E : \mathbb{R}^3 \to \mathbb{R}^3$
that permutes any set of three vectors $\{v_1, v_2, v_3\}$ to $\{v_i, v_j, v_k\}$. Find the
matrix representation of this operator and show that $\det E = \epsilon_{ijk}$.

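26.14 Starting with the definition of the permutation tensor $\delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N}$ and
writing the wedge product in terms of the antisymmetrized tensor product,
show that

\[
\delta^{i_1 i_2 \dots i_N}_{j_1 j_2 \dots j_N} = \sum_\pi \epsilon_\pi\, \delta^{i_1}_{\pi(j_1)} \delta^{i_2}_{\pi(j_2)} \cdots \delta^{i_N}_{\pi(j_N)}.
\]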


26.15 Derive Eqs. (26.49) and (26.50). 




26.16 Show that a 2-form $\omega$ is nondegenerate if and only if the determinant
of $(\omega_{ij})$ is nonzero if and only if $\omega^\flat$ is an isomorphism.


26.17 Let $V$ be a finite-dimensional vector space and $\omega \in \Lambda^2(V)$. Suppose
there exist a pair of vectors $e_1, e'_1 \in V$ such that $\omega(e_1, e'_1) \neq 0$. Let $P_1$ be the
plane spanned by $e_1$ and $e'_1$, and $V_1$ the $\omega$-orthogonal complement of $P_1$.
Show that $V_1 \cap P_1 = 0$, and that $v - \omega(v, e'_1)e_1 + \omega(v, e_1)e'_1$ is in $V_1$.


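26.18 Show that $\sum_{j=1}^n \boldsymbol{\epsilon}^j \wedge \boldsymbol{\epsilon}^{j+n}$, in which $\{\boldsymbol{\epsilon}^j\}_{j=1}^{2n}$ is dual to $\{e_i\}_{i=1}^{2n}$, the
canonical basis of a symplectic vector space $V$, has the same matrix as $\omega$.


26.19 Suppose that $V$ is a symplectic vector space and $v, v' \in V$ are
expressed in a canonical basis of $V$ with coefficients $\{x_i, y_i, z_i\}$ and $\{x'_i, y'_i, z'_i\}$.
Show that

\[
\omega(v, v') = \sum_{i=1}^n (x_i y'_i - x'_i y_i).
\]

26.20 Let $V$ be a vector space and $V^*$ its dual. Define $\omega \in \Lambda^2(V \oplus V^*)$ by

\[
\omega(v + \varphi, v' + \varphi') = \varphi'(v) - \varphi(v')
\]

where $v, v' \in V$ and $\varphi, \varphi' \in V^*$. Show that $(V \oplus V^*, \omega)$ is a symplectic vector
space.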


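26.21 By taking successive powers of $\omega$ show that

\[
\omega^k = \sum_{j_1, \dots, j_k = 1}^n \boldsymbol{\epsilon}^{j_1} \wedge \boldsymbol{\epsilon}^{j_1 + n} \wedge \dots \wedge \boldsymbol{\epsilon}^{j_k} \wedge \boldsymbol{\epsilon}^{j_k + n}.
\]

Conclude that

\[
\omega^n = n!\, (-1)^{[n/2]}\, \boldsymbol{\epsilon}^1 \wedge \dots \wedge \boldsymbol{\epsilon}^{2n},
\]

where $[n/2]$ is the largest integer less than or equal to $n/2$.


26.22 Show that the condition for a matrix $A$ to be symplectic is $A^t J A = J$
where $J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ is the representation of $\omega$ in the canonical basis.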


26.23 Show that $Sp(V, \omega)$ is a subgroup of $GL(V)$.
26.24 Show that the linear operator gy defined in Lemma 26.5.13 is onto. 


26.25 (a) Show that the inverse of the (diagonal) matrix of $g$ in an
orthonormal basis is the same as the matrix of $g$.
(b) Now show that $\epsilon^{12 \dots N} = (-1)^{n_-} \epsilon_{12 \dots N} = (-1)^{n_-}$.


26.26 Let $\{e_i\}_{i=1}^N$ be a $g$-orthonormal basis of $V$. Let $\eta$ be the matrix with
elements $\eta_{ij}$, which is the matrix of $g$ in this orthonormal basis. Let $\{v_j\}_{j=1}^N$
be another (not necessarily orthonormal) basis of $V$ with a transformation
matrix $R$, i.e., $v_j = R^i_j e_i$.

(a) Using $G$ to denote the matrix of $g$ in $\{v_j\}_{j=1}^N$, show that

\[
\det G = \det \eta\, (\det R)^2 = (-1)^\nu (\det R)^2.
\]

    In particular, the sign of this determinant is invariant. Why is $\det G$
    not equal to $\det \eta$? Is there any conflict with the statement that the
    determinant is basis-independent?

(b) Let $\mu$ be the volume element related to $g$, and let $|G| = |\det G|$. Show
    that if $\{v_j\}_{j=1}^N$ is positively oriented relative to $\mu$, then

\[
\mu = |G|^{1/2}\, v_1 \wedge v_2 \wedge \dots \wedge v_N.
\]


26.27 Let $b$ be a symmetric bilinear form. Show that the kernel of $b_* : V \to
V^*$ consists of all vectors $u \in V$ such that $b(u, v) = 0$ for all $v \in V$. Show
also that in the $b$-orthonormal basis $\{e_j\}$, the set $\{e_i \mid b(e_i, e_i) = 0\}$ is a basis
of $\ker b_*$, and therefore the number of linearly independent isotropic vectors
is the nullity of $b$.


26.28 For this problem, we return to the Dirac bra and ket notation. Let $T$ be
an isometry in the real vector space $V$. Then $|y\rangle = (T - 1)|x\rangle$ is the vector,
which, in three dimensions, connects the tip of $|x\rangle$ to its isometric image.

(a) Show that $\langle y|y\rangle = 2\langle x|(1 - T)|x\rangle$.
(b) Show that

\[
P_y = \frac{(T - 1)|x\rangle\langle x|(T^t - 1)}{2\langle x|(1 - T)|x\rangle}
\]

    and

\[
R_y = 1 - (T - 1)\,\frac{|x\rangle\langle x|}{\langle x|(1 - T)|x\rangle}\,(T^t - 1).
\]

(c) Verify that $R_y|x\rangle = T|x\rangle$, as we expect.


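26.29 Use Eq. (26.51) to show that for a $3 \times 3$ matrix $A$,

\[
\det A = \frac{1}{3!}\bigl[(\operatorname{tr} A)^3 - 3 \operatorname{tr} A \operatorname{tr}(A^2) + 2 \operatorname{tr}(A^3)\bigr].
\]

26.30 Find the index and the signature for the bilinear form $g$ on $\mathbb{R}^3$ given
by $g(v_1, v_2) = x_1 y_2 + x_2 y_1 - y_1 z_2 - y_2 z_1$.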


26.31 In relativistic electromagnetic theory the current $J$ and the
electromagnetic field tensor $F$ are, respectively, a four-vector^8 and an
antisymmetric tensor of rank 2. That is, $J = J^k e_k$ and $F = F^{ij} e_i \wedge e_j$. Find the
components of $*J$ and $*F$. Recall that the space of relativity is a 4D Minkowski
space.


^8 It turns out to be more natural to consider $J$ as a 3-form. However, such a fine distinction
is not of any consequence for the present discussion.




26.32 Show that $\epsilon_{j_1 j_2 \dots j_N} R^{j_1}_{i_1} \cdots R^{j_N}_{i_N} = \epsilon_{i_1 i_2 \dots i_N} \det R$.

26.33 Prove Theorem 26.6.4. 


26.34 Show that where there is a sum over an upper index and a lower
index, swapping the upper index to a lower index, and vice versa, does not
change the sum. In other words, $A^i B_i = A_i B^i$.


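26.35 Show the following vector identities, using the definition of cross
products in terms of $\epsilon_{ijk}$.

(a) $A \times A = 0$.
(b) $\nabla \cdot (A \times B) = (\nabla \times A) \cdot B - (\nabla \times B) \cdot A$.
(c) $\nabla \times (A \times B) = (B \cdot \nabla)A + A(\nabla \cdot B) - (A \cdot \nabla)B - B(\nabla \cdot A)$.
(d) $\nabla \times (\nabla \times A) = \nabla(\nabla \cdot A) - \nabla^2 A$.


26.36 A vector operator $V$ is defined as a set of three operators, $\{V^1, V^2, V^3\}$,
satisfying the following commutation relations with angular momentum:
$[V^i, J^j] = i\epsilon^{ijk} V_k$. Show that $V^i V_i$ commutes with all components of
angular momentum.


26.37 The Pauli spin matrices

\[
\sigma^1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad
\sigma^2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad
\sigma^3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}
\]

describe a particle with spin $\frac{1}{2}$ in nonrelativistic quantum mechanics. Verify
that these matrices satisfy

\[
[\sigma^i, \sigma^j] \equiv \sigma^i \sigma^j - \sigma^j \sigma^i = 2i\epsilon^{ij}{}_k\, \sigma^k, \qquad
\{\sigma^i, \sigma^j\} \equiv \sigma^i \sigma^j + \sigma^j \sigma^i = 2\delta^{ij} 1_2,
\]

where $1_2$ is the unit $2 \times 2$ matrix. Show also that $\sigma^i \sigma^j = i\epsilon^{ij}{}_k\, \sigma^k + \delta^{ij} 1_2$, and
for any two vectors $a$ and $b$, $(\sigma \cdot a)(\sigma \cdot b) = a \cdot b\, 1_2 + i\sigma \cdot (a \times b)$.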


26.38 Show that any contravariant tensor of rank two can be written as the 
sum of a symmetric tensor and an antisymmetric tensor. Can this be gener- 
alized to tensors of arbitrary rank? 
The last chapter introduced the exterior product, which multiplied a
$p$-vector and a $q$-vector to yield a $(p + q)$-vector. By directly summing the
spaces of all such vectors, we obtained a vector space which was closed under
multiplication. This led to a $2^N$-dimensional algebra, which we called
the exterior algebra (see Theorem 26.3.6).

In the meantime we revisited inner product and considered non-Euclidean 
inner products, which are of physical significance. In this chapter, we shall 
combine the exterior product with the inner product to create a new type of 
algebra, the Clifford algebra, which happens to have important applications 
in physics. 

In our definition of exterior product in the previous chapter, we assumed 
that the number of vectors was equal to the number of linear functionals 
taken from the dual space [see Eq. (26.14)]. As a result of this complete 
pairing, we always ended up with a number. It is useful, however, to define 
an “incomplete” pairing in which the number of vectors and dual vectors 
are not the same. In particular, if we have a p-vector and a single 1-form, 
then we can pair the 1-form with one of the factors of the p-vector to get a 
(p — 1)-vector. This process is important enough to warrant the following: 


Definition 27.0.1 Let $A$ be a $p$-vector and $\theta$ a 1-form in a vector space $V$.
Then define $i_\theta : \Lambda^p(V^*) \to \Lambda^{p-1}(V^*)$ by

\[
i_\theta A(\theta_1, \dots, \theta_{p-1}) = A(\theta, \theta_1, \dots, \theta_{p-1}).
\]

$i_\theta A$ is called the interior product or contraction of $\theta$ and $A$.


Note that if $A$ is a 1-vector $v$, then $i_\theta v = \langle \theta, v \rangle$, and if it is a real
number $\alpha$, then (by definition) $i_\theta \alpha = 0$.
An immediate consequence of Definition 27.0.1 is the following:


Theorem 27.0.2 Let A be a p-vector and B be a q-vector on a vector
space V. Then, $i_\theta$ is an antiderivation with respect to the wedge product:

$$i_\theta(A \wedge B) = (i_\theta A) \wedge B + (-1)^p A \wedge (i_\theta B).$$
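
As a quick illustration of the antiderivation rule, take the special case p = q = 1, so that A = u and B = v are vectors. The interior product of a 1-form with a vector is a number, so the wedge with it reduces to scalar multiplication. Under these assumptions the rule reads:

```latex
\begin{align*}
i_\theta(u \wedge v)
  &= (i_\theta u)\, v + (-1)^{1}\, u\, (i_\theta v) \\
  &= \langle\theta, u\rangle\, v - \langle\theta, v\rangle\, u .
\end{align*}
```

The result is a 1-vector, as expected of a contraction that lowers the degree by one.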

If (V, g) is an inner product space, we can define the interior product
of a 1-vector v and a p-vector A. In fact, if $g_* : V \to V^*$ is as defined in
Eq. (26.30), then let

$$i_v A \equiv i_{g_*(v)} A. \tag{27.1}$$

In particular, if A is a vector u, then $i_v u = g(u, v)$.


27.1 Construction of Clifford Algebras 


Let V be a real vector space with inner product g. Let $v \in V$ and
$A \in \Lambda^p(V^*)$. Define the product
$\vee : V \times \Lambda^p(V^*) \to \Lambda^{p+1}(V^*) \oplus \Lambda^{p-1}(V^*)$ by

$$v \vee A = v \wedge A + i_v A \tag{27.2}$$

where $i_v A$ is as defined in Eq. (27.1). This product is called the Clifford
product.
The special case of p = 1 is of importance. For such a case, we obtain

$$v \vee u = v \wedge u + i_v u = v \wedge u + g(u, v) \tag{27.3}$$

which can also be written as

$$v \vee u + u \vee v = 2g(u, v). \tag{27.4}$$


This equation is sometimes taken as the definition of the Clifford product 
and the starting point of the Clifford algebra (to be discussed below). 
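
The passage from the product of two vectors to the symmetric relation uses only the antisymmetry of the wedge product and the symmetry of g. Spelled out:

```latex
\begin{align*}
v \vee u + u \vee v
  &= \bigl(v \wedge u + g(u, v)\bigr) + \bigl(u \wedge v + g(v, u)\bigr) \\
  &= \underbrace{v \wedge u + u \wedge v}_{=\,0} + 2\, g(u, v) \\
  &= 2\, g(u, v).
\end{align*}
```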

We see that the Clifford product has been defined on the vector space
which underlies the exterior algebra. However, the left factor in the Clifford
product is just a vector, not a general member of the exterior algebra. Is it
possible to define the Clifford product of a q-vector and a p-vector? It is indeed
possible if we assume that $\vee$ is associative and distributes over addition. To
show this, pick a basis and write a q-vector in terms of that basis. Thus, let
A be as before and let $B \in \Lambda^q(V^*)$, and write

$$B = \frac{1}{q!}\, b^{j_1 \dots j_q}\, e_{j_1} \wedge \dots \wedge e_{j_q}.$$

Then

$$\begin{aligned}
q!\, B \vee A &= b^{j_1 \dots j_q}\, e_{j_1} \wedge \dots \wedge e_{j_q} \vee A \\
&= b^{j_1 \dots j_q}\, e_{j_1} \wedge \dots \wedge e_{j_{q-2}} \wedge (e_{j_{q-1}} \vee e_{j_q} - g_{j_q j_{q-1}}) \vee A \\
&= b^{j_1 \dots j_q}\, e_{j_1} \wedge \dots \wedge e_{j_{q-2}} \wedge (e_{j_{q-1}} \vee e_{j_q} \vee A)
\end{aligned} \tag{27.5}$$

because $g_{j_q j_{q-1}}$ is symmetric under the exchange of $j_q$ and $j_{q-1}$ while
$b^{j_1 \dots j_q}$ is antisymmetric. To continue, we use Eq. (27.2) and rewrite the term
in the parentheses on the last line of Eq. (27.5):

$$\begin{aligned}
e_{j_{q-1}} \vee e_{j_q} \vee A &= e_{j_{q-1}} \vee (e_{j_q} \wedge A + i_{e_{j_q}} A) \\
&= e_{j_{q-1}} \vee (e_{j_q} \wedge A) + e_{j_{q-1}} \vee (i_{e_{j_q}} A) \\
&= e_{j_{q-1}} \wedge (e_{j_q} \wedge A) + i_{e_{j_{q-1}}}(e_{j_q} \wedge A)
   + e_{j_{q-1}} \wedge (i_{e_{j_q}} A) + i_{e_{j_{q-1}}}(i_{e_{j_q}} A) \\
&= e_{j_{q-1}} \wedge e_{j_q} \wedge A + g_{j_q j_{q-1}} A - e_{j_q} \wedge (i_{e_{j_{q-1}}} A)
   + e_{j_{q-1}} \wedge (i_{e_{j_q}} A) + i_{e_{j_{q-1}}}(i_{e_{j_q}} A),
\end{aligned}$$

where in the last equality, we used the antiderivation property of the interior
product (Theorem 27.0.2). Substituting the last equation in (27.5) yields

$$\begin{aligned}
q!\, B \vee A = q!\, B \wedge A
&+ b^{j_1 \dots j_q}\, e_{j_1} \wedge \dots \wedge e_{j_{q-2}} \wedge e_{j_{q-1}} \wedge (i_{e_{j_q}} A) \\
&- b^{j_1 \dots j_q}\, e_{j_1} \wedge \dots \wedge e_{j_{q-2}} \wedge e_{j_q} \wedge (i_{e_{j_{q-1}}} A) \\
&+ b^{j_1 \dots j_q}\, e_{j_1} \wedge \dots \wedge e_{j_{q-2}} \wedge [i_{e_{j_{q-1}}}(i_{e_{j_q}} A)].
\end{aligned} \tag{27.6}$$

The right-hand side is given entirely in terms of wedge products, which
are known operations. Hence, the Clifford product of any p-vector and q-vector
can be defined, and this product is in $\Lambda(V^*)$ of Theorem 26.3.6. Thus,
$\Lambda(V^*)$ is an algebra not only under the wedge product but also under the
Clifford product. With the latter as the multiplication rule, $\Lambda(V^*)$ is called
a Clifford algebra and denoted by $C_V$.


Historical Notes 

At the age of 15 William Clifford went to King’s College, London where he excelled 
in mathematics, classics, English literature, and gymnastics. Three years later, he entered 
Trinity College, Cambridge, where he won not only prizes for mathematics but also one 
for a speech he delivered on Sir Walter Raleigh. In 1868, he was elected to a Fellowship at 
Trinity, and three years later, he was appointed to the chair of Mathematics and Mechanics 
at University College London. In 1874 he was elected a Fellow of the Royal Society. He 
was also an active member of the London Mathematical Society which held its meetings 
at University College. 

Clifford read the work of Riemann and Lobachevsky on non-euclidean geometry, and 
became interested in the subject. Almost 50 years before the advent of Einstein’s general 
theory of relativity, he wrote On the space theory of matter in which he argued that energy 
and matter are different aspects of the curvature of space. 

Clifford generalised the quaternions (introduced by Hamilton two years before Clifford’s 
birth) to what he called the biquaternions and he used them to study motion in non- 
euclidean spaces and on certain surfaces. 

As a teacher, Clifford had an outstanding reputation and was famous for the clarity of his
explanations of difficult mathematical problems. Not only was he a highly original teacher and
researcher, he was also a philosopher of science. At the age of 23 he delivered a lecture 
to the Royal Institution entitled Some of the conditions of mental development, in which 
he tried to explain how scientific discovery comes about. 

He was eccentric in appearance, habits and opinions. A fellow undergraduate describes 
him as follows: “His neatness and dexterity were unusually great, but the most remarkable 
thing was his great strength as compared with his weight. At one time he would pull up 
on the bar with either hand.” 

Like another British mathematician, Charles Dodgson, he took pleasure in entertaining 
children. Although he never achieved Dodgson’s success in writing such books as Alice’s 
Adventures in Wonderland (which the latter wrote under the pseudonym Lewis Carroll), 
Clifford wrote The Little People, a collection of fairy stories written to amuse children. 

In 1876 Clifford suffered a physical collapse, which was made worse by overwork, and
most likely, caused by it. He would spend the entire day teaching and doing adminis- 
trative work, and the entire night doing research. Spending six months in Algeria and 
Spain allowed him to recover sufficiently to resume his work. But after 18 months he col- 
lapsed again, after which he spent some time in Mediterranean countries, but this was not 
enough to improve his health. After a couple of months in England in late 1878, he left 
for Madeira. The hoped-for recovery never materialized and he died a few months later. 


We have shown that the Clifford product of a p-vector and a q-vector
lies in $\Lambda(V^*)$. This implies that the underlying vector space of the Clifford
algebra is a subspace of $\Lambda(V^*)$. However, it can be shown that the set of
Clifford products exhausts the entire $\Lambda(V^*)$; i.e., that the Clifford algebra is
$2^N$-dimensional. This follows from the fact that a p-vector A, which can be
written as

$$A = \frac{1}{p!}\, A^{i_1 \dots i_p}\, e_{i_1} \wedge e_{i_2} \wedge \dots \wedge e_{i_p}, \tag{27.7}$$

can also be written as

$$A \to \mathsf{a} = \frac{1}{p!}\, A^{i_1 \dots i_p}\, e_{i_1} \vee e_{i_2} \vee \dots \vee e_{i_p}, \tag{27.8}$$

where we have introduced a new notation to differentiate between members
of the exterior algebra and the Clifford algebra. The details of the derivation
of (27.8) from (27.7) are given as Problem 27.1.


27.1.1 The Dirac Equation 


The interest in the Clifford algebra in the physics community came about 
after Dirac discovered the relativistic wave equation for an electron. As is 
usually the case, when a mathematical topic finds its way into physics, a 
healthy collaboration between physicists and mathematicians sets in and the 
topic becomes an active area of research in both fields. Dirac’s discovery 
and its connection with the Clifford algebra have led to some fundamental
results in many branches of mathematics. It is therefore worthwhile to see 
how Dirac discovered the equation that now bears his name. 

The transition from classical to quantum mechanics is made by changing
the energy E and momentum p to derivative operators:¹ $E \to i\,\partial/\partial t$
and $\mathbf{p} \to -i\nabla$, which act on the wave function $\psi$. Thus a non-relativistic
free particle, whose energy and momentum are related by $E = p^2/2m$, is
described by the Schrödinger equation

$$i\frac{\partial\psi}{\partial t} = \frac{(-i\nabla)^2}{2m}\psi
\quad\Rightarrow\quad
i\frac{\partial\psi}{\partial t} = -\frac{1}{2m}\nabla^2\psi.$$

The relativistic energy-momentum relation, $E^2 - p^2 = m^2$, leads to the
Klein-Gordon equation, whose time derivative is of second order. Although
eventually accepted as a legitimate equation for relativistic particles, the
Klein-Gordon equation was initially abandoned because, due to its second
derivative in time, it gave rise to negative probabilities. Therefore, it was desirable
to find a relativistic equation which was first order in time derivative, and
Dirac found precisely such an equation.

¹We are using the natural units for which the Planck constant (over $2\pi$) and the speed of
light are set to 1: $\hbar = 1 = c$.

Dirac’s idea was to factor out $E^2 - p^2$ into (E − p)(E + p) and to somehow
incorporate the mass term in the factorization. We avoid writing E and
p as derivatives, but consider them as commuting operators. Since it is not
possible to include m in a straightforward factorization, Dirac came up with
the ingenious idea of multiplying E and p operators by quantities to be determined
by certain consistency conditions. More precisely, he considered
an equation of the form

$$\Bigl(\beta E + \sum_{i=1}^{3} \alpha_i p_i + m\Bigr)\psi = 0$$

and demanded that $\beta$ and $\alpha_i$ be chosen in such a way that

$$\Bigl(\beta E + \sum_{j=1}^{3} \alpha_j p_j - m\Bigr)\Bigl(\beta E + \sum_{i=1}^{3} \alpha_i p_i + m\Bigr)\psi = 0 \tag{27.9}$$

reduce to

$$\Bigl(E^2 - \sum_{i=1}^{3} p_i^2 - m^2\Bigr)\psi = 0. \tag{27.10}$$

Multiplying the two parentheses above, we obtain

$$\begin{aligned}
&\beta^2 E^2 + \sum_{i=1}^{3} \beta\alpha_i E p_i + \beta m E + \sum_{j=1}^{3} \alpha_j\beta E p_j + \sum_{i,j=1}^{3} \alpha_j\alpha_i p_j p_i \\
&\qquad + \sum_{j=1}^{3} m\alpha_j p_j - \beta m E - \sum_{i=1}^{3} m\alpha_i p_i - m^2 \\
&= \beta^2 E^2 + \sum_{i=1}^{3} (\beta\alpha_i + \alpha_i\beta) E p_i + \frac{1}{2}\sum_{i,j=1}^{3} (\alpha_i\alpha_j + \alpha_j\alpha_i) p_i p_j - m^2.
\end{aligned}$$

For this to be equal to the expression in parentheses of Eq. (27.10), we need
to have

$$\beta^2 = 1, \qquad \beta\alpha_i + \alpha_i\beta = 0, \qquad \frac{1}{2}(\alpha_i\alpha_j + \alpha_j\alpha_i) = -\delta_{ij}.$$

The last condition is the result of the fact that $p_i p_j$ is symmetric in ij, and
therefore, its product with the antisymmetric part of $\alpha_i\alpha_j$ automatically vanishes.
Letting $\beta = \gamma^0$ and $\alpha_i = \gamma^i$, the above conditions can be condensed
into the single condition

$$\gamma^\mu\gamma^\nu + \gamma^\nu\gamma^\mu = 2\eta^{\mu\nu}, \tag{27.11}$$

where $\eta^{\mu\nu}$ is diagonal with $\eta^{00} = 1$ and $\eta^{ii} = -1$.
This equation is identical to (27.4), hence the connection between the Dirac
equation and Clifford algebra.

It is clear that Eq. (27.11) cannot hold if the $\gamma$s are ordinary numbers.
In fact, Dirac showed that they have to be 4 × 4 matrices, now called Dirac
$\gamma$ matrices. If the $\gamma$s are 4 × 4 matrices, then $\psi$ must be a column vector
with 4 components. It turns out that two of these components correspond
to the two components of the electron spin. It took a while before the other
two components were identified as those of the antiparticle of the electron,
namely the positron.
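
The anticommutation condition on the four matrices can be checked numerically. The sketch below uses the Dirac representation built from the Pauli matrices; that particular representation is an assumption of this illustration (the text does not give explicit matrices), as are the helper names `block`, `matmul`, and `anticommutator_ok`.

```python
# Pauli matrices and 2x2 helpers as plain nested lists (no external libraries).
I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
SIGMA = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def neg(m):
    """Entrywise negative of a matrix."""
    return [[-x for x in row] for row in m]

def block(a, b, c, d):
    """Assemble a 4x4 matrix from four 2x2 blocks [[a, b], [c, d]]."""
    return [a[0] + b[0], a[1] + b[1], c[0] + d[0], c[1] + d[1]]

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Assumed Dirac representation: gamma^0 = diag(1, 1, -1, -1),
# gamma^i = [[0, sigma_i], [-sigma_i, 0]].
GAMMA = [block(I2, Z2, Z2, neg(I2))] + [block(Z2, s, neg(s), Z2) for s in SIGMA]
ETA = [1, -1, -1, -1]  # diagonal entries of eta^{mu nu}

def anticommutator_ok(mu, nu):
    """True if gamma^mu gamma^nu + gamma^nu gamma^mu equals 2 eta^{mu nu} times the identity."""
    ab = matmul(GAMMA[mu], GAMMA[nu])
    ba = matmul(GAMMA[nu], GAMMA[mu])
    target = 2 * ETA[mu] if mu == nu else 0
    return all(ab[i][j] + ba[i][j] == (target if i == j else 0)
               for i in range(4) for j in range(4))

all_ok = all(anticommutator_ok(m, n) for m in range(4) for n in range(4))
print(all_ok)
```

Any other set of four matrices related to these by a similarity transformation would pass the same check, since the condition is representation independent.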


27.2 General Properties of the Clifford Algebra 


Equation (27.8) implies that a vector in $\Lambda(V^*)$, being a direct sum of p-vectors
for different p’s, can be expressed as a linear combination of the
basis vectors of $\Lambda^p(V^*)$, where the basis vectors are given as Clifford products
(rather than wedge products) of the basis vectors of V:


Theorem 27.2.1 Let $\{e_i\}_{i=1}^N$ be a basis of an inner product space V. Then
the $2^N$ vectors

$$\mathbf{1},\quad e_i,\quad e_i \vee e_j\ (i < j),\quad e_i \vee e_j \vee e_k\ (i < j < k),\quad \dots,\quad e_1 \vee e_2 \vee \dots \vee e_N$$

form a basis of $C_V$.


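Thus, if u is an arbitrary vector of $\Lambda(V^*)$, then it can be expressed as
follows:

$$u = \alpha\mathbf{1} + u^i e_i + u^{|i_1 i_2|} e_{i_1} \vee e_{i_2} + \dots + u^{|i_1 \dots i_N|} e_{i_1} \vee \dots \vee e_{i_N} \tag{27.12}$$

where $|i_1 i_2 \dots i_p|$ means that the sum over repeated indices is over
$i_1 < i_2 < \dots < i_p$. Equation (27.12) is sometimes written as

$$u = \alpha\mathbf{1} + u^i e_i + \frac{1}{2!} u^{i_1 i_2} e_{i_1} \vee e_{i_2} + \dots + \frac{1}{N!} u^{i_1 \dots i_N} e_{i_1} \vee \dots \vee e_{i_N} \tag{27.13}$$

where the coefficients are assumed completely antisymmetric in all their
indices, but the sum has no ordering.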

Note the appearance of $\mathbf{1}$ in the sum multiplying the scalar $\alpha$. This suggests
changing (27.4) to

$$v \vee u + u \vee v = 2g(u, v)\mathbf{1}, \tag{27.14}$$

which, when specialized to the basis vectors, becomes

$$e_i e_j + e_j e_i = 2g_{ij}\mathbf{1}, \qquad e_i^2 = e_i \vee e_i = g_{ii}\mathbf{1}, \tag{27.15}$$

where we have removed the multiplication sign $\vee$, a practice which we shall
often adhere to from now on. Equation (27.15) completely frees the Clifford
algebra from the exterior algebra, with which we started our discussion. This
is easily seen in an orthonormal basis:

$$e_i^2 = \pm\mathbf{1}, \qquad e_i e_j = -e_j e_i \quad \text{if } i \neq j. \tag{27.16}$$

Multiplying elements u and v, each expressed as in (27.12) or (27.13), reduces
to the multiplication of various Clifford products of the basis vectors.
In such multiplications, one commutes the basis vectors which appear twice 
in the product using (27.16) until all repetitions disappear and one regains 
Clifford products of basis vectors appearing in (27.13). The following ex- 
ample should clarify this. 


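Example 27.2.2 Let V be a 2-dimensional Euclidean vector space (i.e.,
$g_{ij} = \delta_{ij}$) with the orthonormal basis $\{e_1, e_2\}$. Consider two elements u and
v of the Clifford algebra over V. These can very generally be written as

$$u = \alpha_u\mathbf{1} + \beta_u^1 e_1 + \beta_u^2 e_2 + \gamma_u e_1 e_2$$

$$v = \alpha_v\mathbf{1} + \beta_v^1 e_1 + \beta_v^2 e_2 + \gamma_v e_1 e_2$$

and

$$\begin{aligned}
u \vee v ={}& (\alpha_u\mathbf{1} + \beta_u^1 e_1 + \beta_u^2 e_2 + \gamma_u e_1 e_2) \vee (\alpha_v\mathbf{1} + \beta_v^1 e_1 + \beta_v^2 e_2 + \gamma_v e_1 e_2) \\
={}& \alpha_u\alpha_v\mathbf{1} + \alpha_u\beta_v^1 e_1 + \alpha_u\beta_v^2 e_2 + \alpha_u\gamma_v e_1 e_2 \\
&+ \alpha_v\beta_u^1 e_1 + \beta_u^1\beta_v^1 \underbrace{e_1 e_1}_{=1} + \beta_u^1\beta_v^2 e_1 e_2 + \beta_u^1\gamma_v \underbrace{e_1 e_1 e_2}_{=e_2} \\
&+ \alpha_v\beta_u^2 e_2 + \beta_u^2\beta_v^1 \underbrace{e_2 e_1}_{=-e_1 e_2} + \beta_u^2\beta_v^2 \underbrace{e_2 e_2}_{=1} + \beta_u^2\gamma_v \underbrace{e_2 e_1 e_2}_{=-e_1} \\
&+ \alpha_v\gamma_u e_1 e_2 + \gamma_u\beta_v^1 \underbrace{e_1 e_2 e_1}_{=-e_2} + \gamma_u\beta_v^2 \underbrace{e_1 e_2 e_2}_{=e_1} + \gamma_u\gamma_v \underbrace{e_1 e_2 e_1 e_2}_{=-1}.
\end{aligned}$$

We see that the right-hand side is a linear combination of $\mathbf{1}$, $e_1$, $e_2$, and $e_1 e_2$,
as it should be since the Clifford algebra is closed under multiplication.
Problem 27.2 asks you to find the coefficients of the linear combination.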
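
The reduction procedure of this example (commute repeated basis vectors next to each other, then replace each square by its value) is mechanical enough to automate. The following is a minimal sketch, assuming an orthonormal basis and encoding each product of basis vectors as a bitmask; the function names `blade_product` and `clifford_product` and the sample coefficients are illustrative choices, not notation from the text.

```python
def blade_product(a, b, squares):
    """Multiply basis blades encoded as bitmasks (bit i set means e_{i+1} is a factor).

    Returns (sign, blade) with e_a e_b = sign * e_blade, using
    e_i e_j = -e_j e_i for i != j and e_i e_i = squares[i].
    """
    sign = 1
    # Count the transpositions needed to bring the factors into increasing order.
    shifted = a >> 1
    while shifted:
        if bin(shifted & b).count("1") % 2:
            sign = -sign
        shifted >>= 1
    # Each repeated generator contributes its square.
    common = a & b
    for i, sq in enumerate(squares):
        if (common >> i) & 1:
            sign *= sq
    return sign, a ^ b

def clifford_product(u, v, squares):
    """Multiply two elements given as {blade_bitmask: coefficient} dictionaries."""
    result = {}
    for a, x in u.items():
        for b, y in v.items():
            sign, blade = blade_product(a, b, squares)
            result[blade] = result.get(blade, 0) + sign * x * y
    return result

# Blades: 0b00 -> 1, 0b01 -> e1, 0b10 -> e2, 0b11 -> e1 e2.
euclidean = [1, 1]                            # e1^2 = e2^2 = +1
u = {0b00: 1, 0b01: 2, 0b10: 3, 0b11: 4}      # sample coefficients (arbitrary)
v = {0b00: 5, 0b01: 6, 0b10: 7, 0b11: 8}
uv = clifford_product(u, v, euclidean)
print(uv)
```

Changing `euclidean` to another list of ±1 values gives the product for a non-Euclidean orthonormal basis with the same code.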


Although we will primarily be dealing with real vector spaces, Eqs. (27.15)
and (27.16) could be applied to complex vector spaces. Therefore, it is possible
to have complex Clifford algebras, and we shall occasionally deal with
such algebras as well.
If u in (27.13) or (27.12) contains products of only even numbers of basis
vectors, then it is called an even element of the algebra. The collection of
all even elements of $C_V$ is a subalgebra of $C_V$ and is denoted by $C_V^0$. The
odd elements are denoted by $C_V^1$, and although they form a subspace of $C_V$,
they do not form a subalgebra, because the product of two odd elements is even.
As a vector space, $C_V$ is the direct sum of the even and odd subspaces:

$$C_V = C_V^0 \oplus C_V^1. \tag{27.17}$$


The discussion above can be made slightly more formal. 

Definition 27.2.3 Let $\omega$ be the linear automorphism of V given by $\omega(a) = -a$
for all $a \in V$. The involution of the Clifford algebra $C_V$ induced by $\omega$ is
called the degree involution and is denoted by $\omega_V$.


Note that for any $u \in C_V$ given by (27.13), $\omega_V(u)$ is obtained by changing
the sign of all the vectors in that equation. It is obvious that $\omega_V^2 = \iota$
where $\iota(u) = u$ for all $u \in C_V$. Thus, $\omega_V$ is indeed an involution of $C_V$.
Now, an involution has only two eigenvalues, ±1, and the intersection of
the eigenspaces of these eigenvalues is zero. Moreover, we can identify these
eigenspaces as $C_V^0$ and $C_V^1$, where

$$C_V^0 = \ker(\omega_V - \iota), \qquad C_V^1 = \ker(\omega_V + \iota). \tag{27.18}$$


Consider two inner product spaces V and U, and define the inner product 
on their direct sum V @ U by 


$$\langle v_1 \oplus u_1, v_2 \oplus u_2\rangle = \langle v_1, v_2\rangle + \langle u_1, u_2\rangle.$$


Then we have the following important theorem: (For a proof, see [Greu 78, 
p. 234]) 


Theorem 27.2.4 Let $W = V \oplus U$. Then the Clifford algebra $C_W$ is isomorphic
to the skew symmetric tensor product of $C_V$ and $C_U$:

$$C_W = C_{V \oplus U} \cong C_V \mathbin{\hat{\otimes}} C_U.$$


The skew symmetric tensor product was defined for exterior algebras
in Definition 26.3.7, but it can also be defined for Clifford algebras. One
merely has to change $\wedge$ to $\vee$. Note that the caret over $\otimes$ signifies the product
defined on the space $C_V \otimes C_U$. Thus, as vector spaces, $C_{V \oplus U} \cong C_V \otimes C_U$.
Since all Clifford algebras are direct sums of their even and odd subspaces,
we have

$$\begin{aligned}
C_V \otimes C_U &= (C_V^0 \oplus C_V^1) \otimes (C_U^0 \oplus C_U^1) \\
&= (C_V^0 \otimes C_U^0) \oplus (C_V^0 \otimes C_U^1) \oplus (C_V^1 \otimes C_U^0) \oplus (C_V^1 \otimes C_U^1).
\end{aligned}$$

In particular,

$$\begin{aligned}
C_W^0 &= (C_V^0 \otimes C_U^0) \oplus (C_V^1 \otimes C_U^1) \\
C_W^1 &= (C_V^0 \otimes C_U^1) \oplus (C_V^1 \otimes C_U^0).
\end{aligned} \tag{27.19}$$

Furthermore, if we invoke the product of Definition 26.3.7 on the first equation
above, we get

$$C_W^0 = (C_V^0 \mathbin{\hat{\otimes}} C_U^0) \oplus (C_V^1 \mathbin{\hat{\otimes}} C_U^1). \tag{27.20}$$

The second equation in (27.19), when restricted to vector spaces themselves,
yields

$$W = (\mathbf{1}_V \otimes U) \oplus (V \otimes \mathbf{1}_U), \tag{27.21}$$

where $\mathbf{1}_V$ and $\mathbf{1}_U$ are the identities of $C_V$ and $C_U$, respectively.
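Consider the linear map $\sigma_V : C_V \to C_V$ given by

$$\sigma_V(a \vee b) = \sigma_V(b) \vee \sigma_V(a), \qquad \sigma_V(v) = v, \qquad a, b \in C_V,\ v \in V. \tag{27.22}$$

It is straightforward to show that $\sigma_V$ is an involution and that it commutes
with the degree involution:

$$\sigma_V \circ \omega_V = \omega_V \circ \sigma_V. \tag{27.23}$$


Definition 27.2.5 The conjugation involution is defined as $\sigma_V \circ \omega_V$. The
conjugate of an element $a \in C_V$ is

$$\bar{a} = \sigma_V \circ \omega_V(a).$$

In particular, $\bar{v} = -v$ if $v \in V$.


It is clear from the definition that

$$\overline{a \vee b} = \bar{b} \vee \bar{a}, \qquad a, b \in C_V.$$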

We saw a special case of this relation in our discussion of the quaternions in 
Example 3.1.16. 


27.2.1 Homomorphism with Other Algebras 


Let V be an inner product space and A an algebra with identity. A linear map
$\varphi : V \to A$ can be extended to a unital homomorphism $\phi : C_V \to A$,
provided it satisfies the compatibility condition derived in this subsection.
Indeed, since $C_V$ consists of sums of Clifford products of vectors in V, one
simply has to define the action of $\phi$ on a product of vectors in V. The obvious
choice is

$$\phi(v_1 \vee v_2 \vee \dots \vee v_m) = \varphi(v_1)\varphi(v_2) \cdots \varphi(v_m)$$

where on the right-hand side the product in A is denoted by juxtaposition.
For $\varphi$ to be extendable to a unital homomorphism, it has to be compatible
with Eq. (27.14); i.e., it has to satisfy

$$\phi(v \vee u) + \phi(u \vee v) = 2g(u, v)\phi(\mathbf{1})$$

or, denoting g(u, v) by $\langle u, v\rangle$,

$$\varphi(v)\varphi(u) + \varphi(u)\varphi(v) = 2\langle u, v\rangle \mathbf{1}_A. \tag{27.24}$$

By setting u = v, we obtain an equivalent condition

$$\varphi(v)^2 = \langle v, v\rangle \mathbf{1}_A. \tag{27.25}$$
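
That the condition on squares is equivalent to the symmetric condition on pairs, and not merely a consequence of it, follows by polarization. A short check, using only the linearity of the map on V and the bilinearity and symmetry of the inner product:

```latex
\begin{align*}
\varphi(u+v)^2
  &= \varphi(u)^2 + \varphi(u)\varphi(v) + \varphi(v)\varphi(u) + \varphi(v)^2 , \\
\langle u+v,\, u+v\rangle\, \mathbf{1}_A
  &= \bigl(\langle u,u\rangle + 2\langle u,v\rangle + \langle v,v\rangle\bigr)\mathbf{1}_A , \\
\text{so}\quad
\varphi(u)\varphi(v) + \varphi(v)\varphi(u)
  &= 2\langle u,v\rangle\, \mathbf{1}_A
  \quad\text{whenever } \varphi(w)^2 = \langle w,w\rangle\,\mathbf{1}_A \text{ for all } w \in V .
\end{align*}
```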
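
Example 27.2.6 Let $\varphi : \mathbb{R}^2 \to \mathcal{L}(\mathbb{R}^2)$ be a linear map. We want to extend
it to a homomorphism $\phi : C_{\mathbb{R}^2} \to \mathcal{L}(\mathbb{R}^2)$. It is convenient to identify $\mathcal{L}(\mathbb{R}^2)$
with the set of 2 × 2 matrices and write

$$\varphi(v) = \begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix}. \tag{27.26}$$

Let $v = (\alpha, \beta)$. For the extension to be possible, according to Eq. (27.25),
we must have

$$\begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix}
\begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix}
= (\alpha^2 + \beta^2) \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

One convenient solution to this equation is $\alpha_{11} = \alpha = -\alpha_{22}$ and
$\alpha_{12} = \alpha_{21} = \beta$. Hence, we write Eq. (27.26) as

$$\varphi(\alpha, \beta) = \begin{pmatrix} \alpha & \beta \\ \beta & -\alpha \end{pmatrix}. \tag{27.27}$$

Now let $\{e_1, e_2\}$ be the standard basis of $\mathbb{R}^2$. Then $\{\mathbf{1}, e_1, e_2, e_1 \vee e_2\}$ is a
basis of $C_{\mathbb{R}^2}$. Furthermore, $\phi(\mathbf{1})$ is the 2 × 2 unit matrix, and by (27.27)

$$\phi(e_1) = \varphi(1, 0) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad
\phi(e_2) = \varphi(0, 1) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix},$$

$$\phi(e_1 \vee e_2) = \varphi(e_1)\varphi(e_2) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

It is easy to show (see Problem 27.7) that these 4 matrices form a basis of
$\mathcal{L}(\mathbb{R}^2)$. Since $\phi$ maps a basis onto a basis, it is a linear isomorphism and
therefore an algebra isomorphism. Thus, $C_{\mathbb{R}^2} \cong \mathcal{L}(\mathbb{R}^2)$.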
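
The claims of this example can be verified with a few lines of exact arithmetic. The sketch below is only a check under the matrix choice of the example; the helper names `phi`, `decompose`, and `recombine`, and the sample matrix, are illustrative. The function `decompose` exhibits explicit coefficients for an arbitrary 2 × 2 matrix in terms of the four images, which is one way to see that they span the space of 2 × 2 matrices.

```python
from fractions import Fraction

def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def phi(alpha, beta):
    """The linear map (alpha, beta) -> [[alpha, beta], [beta, -alpha]] of the example."""
    return [[alpha, beta], [beta, -alpha]]

ONE = [[1, 0], [0, 1]]
E1 = phi(1, 0)
E2 = phi(0, 1)
E12 = matmul(E1, E2)          # image of e1 v e2

def decompose(m):
    """Coefficients (c0, c1, c2, c12) with m = c0*ONE + c1*E1 + c2*E2 + c12*E12."""
    (p, q), (r, s) = m
    half = Fraction(1, 2)
    return (half * (p + s), half * (p - s), half * (q + r), half * (q - r))

def recombine(coeffs):
    basis = [ONE, E1, E2, E12]
    return [[sum(c * b[i][j] for c, b in zip(coeffs, basis)) for j in range(2)]
            for i in range(2)]

v_squared = matmul(phi(3, 4), phi(3, 4))   # should be (3^2 + 4^2) times the identity
sample = [[2, 7], [-1, 5]]                 # arbitrary test matrix
round_trip_ok = recombine(decompose(sample)) == sample
print(E12, v_squared, round_trip_ok)
```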


27.2.2 The Canonical Element 


Strictly speaking, the identification of (27.8) with (27.7) should be considered
as an isomorphism $\phi_V$ of $\Lambda(V^*)$ and $C_V$:

$$\phi_V(e_{i_1} \wedge e_{i_2} \wedge \dots \wedge e_{i_p}) = e_{i_1} \vee e_{i_2} \vee \dots \vee e_{i_p}. \tag{27.28}$$

Invoking Proposition 2.6.7, we conclude that,


Definition 27.2.7 Given a determinant function $\Delta$ in V, there is a unique
element $e_\Delta \in C_V$ such that

$$\phi_V(e_{i_1} \wedge \dots \wedge e_{i_N}) = \Delta(e_{i_1}, \dots, e_{i_N})\, e_\Delta. \tag{27.29}$$

$e_\Delta$ is called the canonical element in $C_V$ with respect to the determinant
function $\Delta$.

Now choose an orthogonal basis $\{e_i\}_{i=1}^N$ in V for which $\Delta(e_1, \dots, e_N) = 1$.
Then, (27.28) and (27.29) yield

$$e_\Delta = e_1 \vee \dots \vee e_N. \tag{27.30}$$

Next use the Lagrange identity (26.32) and write it in the form

$$\det(\langle x_i, y_j\rangle) = \lambda_\Delta\, \Delta(x_1, \dots, x_N)\, \Delta(y_1, \dots, y_N), \qquad x_i, y_j \in V. \tag{27.31}$$

Setting $x_i = y_i = e_i$, and evaluating the determinant on the left-hand side of
the equation, we obtain

$$\lambda_\Delta = \langle e_1, e_1\rangle \cdots \langle e_N, e_N\rangle = g_{11} \cdots g_{NN}
= (e_1 \vee e_1) \cdots (e_N \vee e_N) = e_1^2 \cdots e_N^2, \tag{27.32}$$

where we used Eq. (27.15). Using (27.32), together with (27.15) and
(27.30), one can easily show that

$$e_\Delta^2 = e_\Delta \vee e_\Delta = (-1)^{N(N-1)/2} \lambda_\Delta \cdot \mathbf{1}. \tag{27.33}$$

Since $\lambda_\Delta \neq 0$, it follows that $e_\Delta$ is invertible.
Equation (27.30) can be used to show that

$$e_i \vee e_\Delta = (-1)^{N-1} e_\Delta \vee e_i,$$

and since any vector in V is a linear combination of the basis $\{e_i\}_{i=1}^N$, the
equation holds for arbitrary vectors. We thus have the following:


Theorem 27.2.8 The canonical element $e_\Delta$ satisfies the relations

$$e_\Delta \vee v = (-1)^{N-1} v \vee e_\Delta, \qquad v \in V,$$

$$e_\Delta \vee u = \omega_V^{N-1}(u) \vee e_\Delta, \qquad u \in C_V,$$

where $\omega_V$ is the degree involution of Definition 27.2.3. In particular,
$e_\Delta \vee u = u \vee e_\Delta$ if N is odd, and $e_\Delta \vee u = \omega_V(u) \vee e_\Delta$ if N is even.
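
Both the formula for the square of the canonical element and its commutation rule with vectors can be spot-checked by brute force. The sketch below assumes an orthogonal basis whose vectors square to ±1 and encodes products of basis vectors as bitmasks; `blade_product` and `check_canonical_element` are hypothetical helper names introduced for this illustration.

```python
from itertools import product

def blade_product(a, b, squares):
    """Return (sign, blade) with e_a e_b = sign * e_blade for bitmask-encoded blades."""
    sign = 1
    shifted = a >> 1
    while shifted:                      # parity of the reordering transpositions
        if bin(shifted & b).count("1") % 2:
            sign = -sign
        shifted >>= 1
    common = a & b
    for i, sq in enumerate(squares):    # repeated generators contribute e_i^2
        if (common >> i) & 1:
            sign *= sq
    return sign, a ^ b

def check_canonical_element(squares):
    """Check e_D^2 = (-1)^(N(N-1)/2) * lambda and e_D e_i = (-1)^(N-1) e_i e_D."""
    n = len(squares)
    e_delta = (1 << n) - 1              # bitmask of e_1 e_2 ... e_N
    lam = 1
    for sq in squares:
        lam *= sq                       # lambda = e_1^2 ... e_N^2
    sign, blade = blade_product(e_delta, e_delta, squares)
    square_ok = blade == 0 and sign == (-1) ** (n * (n - 1) // 2) * lam
    commute_ok = True
    for i in range(n):
        left = blade_product(e_delta, 1 << i, squares)
        right = blade_product(1 << i, e_delta, squares)
        same_blade = left[1] == right[1]
        same_sign = left[0] == (-1) ** (n - 1) * right[0]
        commute_ok = commute_ok and same_blade and same_sign
    return square_ok and commute_ok

# All sign patterns for N = 1, ..., 4.
results = {sq: check_canonical_element(list(sq))
           for n in range(1, 5) for sq in product([1, -1], repeat=n)}
print(all(results.values()))
```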


27.2.3 Center and Anticenter 


Definition 27.2.9 The center of the Clifford algebra $C_V$, denoted by $Z_V$,
consists of elements $a \in C_V$ satisfying

$$a \vee u = u \vee a \qquad \forall u \in C_V.$$

The anticenter of the Clifford algebra $C_V$, denoted by $\bar{Z}_V$, consists of elements
$a \in C_V$ satisfying

$$a \vee u = \omega_V(u) \vee a \qquad \forall u \in C_V.$$

Since $C_V$ is generated by V (i.e., it consists of sums of products of elements in V),
it follows that

$$a \in Z_V \quad \text{if and only if} \quad a \vee x = x \vee a \quad \forall x \in V,$$

and

$$a \in \bar{Z}_V \quad \text{if and only if} \quad a \vee x = -x \vee a \quad \forall x \in V.$$

It is easy to show that $Z_V$ is a subalgebra of $C_V$ and that both $Z_V$ and $\bar{Z}_V$
are invariant under the degree involution $\omega_V$. Therefore, as in Eq. (27.17),

$$Z_V = Z_V^0 \oplus Z_V^1, \qquad \bar{Z}_V = \bar{Z}_V^0 \oplus \bar{Z}_V^1, \tag{27.34}$$

where $\oplus$ indicates a direct sum of vector spaces.


Proposition 27.2.10 $\bar{Z}_V^1 = 0$. That is, the anticenter of any Clifford algebra
consists of even elements only.


Proof If $a \in \bar{Z}_V^1$, then $a \vee x = -x \vee a$ for any $x \in V$. Equation (27.30) then
gives

$$a \vee e_\Delta = (-1)^N e_\Delta \vee a.$$

On the other hand, Theorem 27.2.8 and $\omega_V(a) = -a$ for $a \in \bar{Z}_V^1 = C_V^1 \cap \bar{Z}_V$
yield

$$e_\Delta \vee a = \omega_V^{N-1}(a) \vee e_\Delta = (-1)^{N-1} a \vee e_\Delta.$$

The last two equations, therefore, give $a \vee e_\Delta = 0$, and since $e_\Delta$ is invertible,
we have a = 0.


Proposition 27.2.11 If V is odd-dimensional, then $e_\Delta \in Z_V$, and if it
is even-dimensional, then $e_\Delta \in \bar{Z}_V$.


Proof The proof follows immediately from Theorem 27.2.8.


Consider the linear map $\varphi_V : C_V \to C_V$ given by

$$\varphi_V(u) = e_\Delta \vee u, \qquad u \in C_V,$$

and note that since $e_\Delta$ is invertible, $\varphi_V$ is a linear isomorphism. If N is
odd, then Eq. (27.30) shows that $\varphi_V : C_V^0 \to C_V^1$ and $\varphi_V$ establishes an
isomorphism of $C_V^0$ and $C_V^1$. Now restrict the map to $Z_V$. Then, using
Theorem 27.2.8, for $x \in V$, we obtain

$$\varphi_V(u) \vee x = e_\Delta \vee u \vee x = e_\Delta \vee x \vee u
= (-1)^{N-1} x \vee e_\Delta \vee u = (-1)^{N-1} x \vee \varphi_V(u).$$

Similarly,

$$\varphi_V(v) \vee x = (-1)^N x \vee \varphi_V(v), \qquad v \in \bar{Z}_V,\ x \in V.$$

We have just proved


Proposition 27.2.12 If N is odd, then $\varphi_V$ restricts to linear automorphisms
of $Z_V$ and $\bar{Z}_V$ and establishes an isomorphism between $Z_V^0$ and $Z_V^1$. If N is
even, then $\varphi_V$ interchanges $Z_V$ and $\bar{Z}_V$.


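Proposition 27.2.13 $Z_V^0 = \mathrm{Span}\{\mathbf{1}\}$.


Proof We use induction on the dimension of V. For N = 1, the proposition
is trivial. Assume that it holds for N − 1, and choose $v \in V$ such that
$\langle v, v\rangle \neq 0$. With U the orthogonal complement of v, we can write

$$V = \mathrm{Span}\{v\} \oplus U$$

and note that (27.20) and (27.21) become

$$C_V^0 = (\mathbf{1} \otimes C_U^0) \oplus (v \otimes C_U^1), \qquad V = (\mathbf{1} \otimes U) \oplus (v \otimes \mathbf{1}).$$

Identifying the left and right hand sides of these equations, we write

$$u = \mathbf{1} \otimes b + v \otimes c, \qquad u \in C_V^0,\ b \in C_U^0,\ c \in C_U^1,$$

$$x = \mathbf{1} \otimes y + v \otimes \mathbf{1}, \qquad x \in V,\ y \in U.$$

We now use the multiplication rule of Eq. (26.24), noting that $\mathbf{1}$ and b have
even degrees while v, y, and c have odd degrees:

$$\begin{aligned}
u \vee x &= (\mathbf{1} \otimes b + v \otimes c) \odot (\mathbf{1} \otimes y + v \otimes \mathbf{1}) \\
&= (\mathbf{1} \otimes b) \odot (\mathbf{1} \otimes y) + (\mathbf{1} \otimes b) \odot (v \otimes \mathbf{1})
 + (v \otimes c) \odot (\mathbf{1} \otimes y) + (v \otimes c) \odot (v \otimes \mathbf{1}) \\
&= \mathbf{1} \otimes (b \vee y) + v \otimes b + v \otimes (c \vee y) - (v \vee v) \otimes c.
\end{aligned}$$

Similarly,

$$x \vee u = \mathbf{1} \otimes (y \vee b) + v \otimes b - v \otimes (y \vee c) + (v \vee v) \otimes c.$$

Now assume that u is in the center of the Clifford algebra. Then the two
equations above are equal, and noting that $v \vee v = \langle v, v\rangle \mathbf{1}$, we obtain

$$\mathbf{1} \otimes (b \vee y - y \vee b - 2\langle v, v\rangle c) + v \otimes (c \vee y + y \vee c) = 0,$$

or

$$\begin{aligned}
b \vee y - y \vee b - 2\langle v, v\rangle c &= 0 \\
c \vee y + y \vee c &= 0.
\end{aligned} \tag{27.35}$$

The second equation implies that $c \in \bar{Z}_U$ and hence $c \in \bar{Z}_U^1$. Then by
Proposition 27.2.10, c = 0. The first relation in (27.35) now implies that $b \in Z_U$
and therefore $b \in Z_U^0$. By induction assumption, b is a multiple of the identity
in $C_U$. Thus,

$$u = \mathbf{1} \otimes (\alpha \mathbf{1}_U) + v \otimes 0 = \alpha \mathbf{1},$$

i.e., $u \in \mathrm{Span}\{\mathbf{1}\}$.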


Theorem 27.2.14 Let V be an N-dimensional space. Then

(a) If N is odd, $Z_V = \mathrm{Span}\{\mathbf{1}, e_\Delta\}$, $\bar{Z}_V = 0$.
(b) If N is even, $Z_V = \mathrm{Span}\{\mathbf{1}\}$, $\bar{Z}_V = \mathrm{Span}\{e_\Delta\}$. Thus all Clifford
algebras over an even-dimensional vector space are central.


Proof Suppose that N is odd and $a \in \bar{Z}_V$. Then, $a \vee x = -x \vee a$, for any
$x \in V$. Equation (27.30) then yields $a \vee e_\Delta = -e_\Delta \vee a$. On the other hand,
the second equation of Theorem 27.2.8 implies that $a \vee e_\Delta = e_\Delta \vee a$. Hence,
$a \vee e_\Delta = 0$, and since $e_\Delta$ is invertible, we have a = 0. This proves the second
part of (a).

Next observe that by Proposition 27.2.13 $Z_V^0 = \mathrm{Span}\{\mathbf{1}\}$ and that, since
N is assumed odd, by Proposition 27.2.12, $\varphi_V$ is an automorphism of
$Z_V$ and an isomorphism between $Z_V^0$ and $Z_V^1$. Since $Z_V^0 = \mathrm{Span}\{\mathbf{1}\}$ and
$\varphi_V(\mathbf{1}) = e_\Delta$, we must have $Z_V^1 = \mathrm{Span}\{e_\Delta\}$.

Now consider the case when N is even. For $a \in Z_V$, we have
$a \vee e_\Delta = e_\Delta \vee a$. On the other hand, by Theorem 27.2.8,
$e_\Delta \vee a = \omega_V(a) \vee e_\Delta$. Since $e_\Delta$ is invertible, we have $\omega_V(a) = a$ or
$(\omega_V - \iota)a = 0$. Therefore, by Eq. (27.18), $a \in C_V^0$ and hence $Z_V = Z_V^0$.
Proposition 27.2.13 now gives $Z_V = \mathrm{Span}\{\mathbf{1}\}$.

Since $\varphi_V$ interchanges $Z_V$ and $\bar{Z}_V$, we have

$$\bar{Z}_V = \varphi_V(Z_V) = \varphi_V(\mathrm{Span}\{\mathbf{1}\}) = \mathrm{Span}\{\varphi_V(\mathbf{1})\} = \mathrm{Span}\{e_\Delta\}.$$

This completes the proof.
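
The statement of the theorem can be confirmed by enumeration in low dimensions. In an orthonormal basis every product of distinct basis vectors either commutes or anticommutes with each basis vector, so membership in the center or anticenter can be tested one basis element at a time. The sketch below assumes a Euclidean orthonormal basis and a bitmask encoding of basis elements; the function names are illustrative.

```python
def blade_product(a, b, squares):
    """Return (sign, blade) with e_a e_b = sign * e_blade for bitmask-encoded blades."""
    sign = 1
    shifted = a >> 1
    while shifted:                      # parity of the reordering transpositions
        if bin(shifted & b).count("1") % 2:
            sign = -sign
        shifted >>= 1
    common = a & b
    for i, sq in enumerate(squares):    # repeated generators contribute e_i^2
        if (common >> i) & 1:
            sign *= sq
    return sign, a ^ b

def center_and_anticenter_blades(n):
    """Basis blades commuting (center) or anticommuting (anticenter) with every e_i."""
    squares = [1] * n                   # assumed Euclidean orthonormal basis
    generators = [1 << i for i in range(n)]
    center, anticenter = [], []
    for blade in range(1 << n):
        ratios = []
        for g in generators:
            left, _ = blade_product(blade, g, squares)
            right, _ = blade_product(g, blade, squares)
            ratios.append(left * right)  # +1 if they commute, -1 if they anticommute
        if all(r == 1 for r in ratios):
            center.append(blade)
        if all(r == -1 for r in ratios):
            anticenter.append(blade)
    return center, anticenter

# Blade 0 is the identity; blade 2**n - 1 is the canonical element e_1 ... e_n.
table = {n: center_and_anticenter_blades(n) for n in range(1, 6)}
for n, (center, anticenter) in table.items():
    print(n, center, anticenter)
```

For odd n the center list contains the identity and the top product, and the anticenter list is empty; for even n the center contains only the identity and the anticenter only the top product.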


27.2.4 Isomorphisms


Let (V, g) be an inner product space. If we change the sign of the inner
product, we get another inner product space $\bar{V} = (V, -g)$. Next consider
two vector spaces V and U, and suppose that $\Delta$ can be chosen in such a
way that the canonical element of $C_V$ satisfies $e_\Delta^2 = \pm\mathbf{1}$. Then we have the
following two theorems whose proofs can be found in [Greu 78, pp. 244-245]:


Theorem 27.2.15 Let dim V = 2m, and assume that $\Delta$ can be chosen such
that $\lambda_\Delta = (-1)^m$. Then the Clifford algebras $C_V$ and $C_{\bar{V}}$ are isomorphic.


Theorem 27.2.16 Let V be an even-dimensional inner product space
and U any other inner product space. Then

$$C_{V \oplus U} \cong C_V \otimes C_U \quad \text{if } e_\Delta^2 = \mathbf{1},$$

$$C_{V \oplus \bar{U}} \cong C_V \otimes C_U \quad \text{if } e_\Delta^2 = -\mathbf{1}.$$


Another theorem, which will be useful in the classification of Clifford 
algebras and whose proof is given in [Greu 78, p. 248], is the following. 


Theorem 27.2.17 Let V be an even-dimensional inner product space. Assume
that V has an antisymmetric involution $\omega$ (so that $\omega^t = -\omega$). Then the
Clifford algebra $C_V$ is isomorphic to $\mathcal{L}(\Lambda(V_1))$, the set of linear transformations
of $\Lambda(V_1)$ where $V_1 = \ker(\omega - \iota)$.


Recall from Theorem 26.3.6 that $\dim \Lambda(V_1) = 2^{\dim V_1}$, and that all real
vector spaces of dimension N are isomorphic to $\mathbb{R}^N$. Therefore, we can
identify $\mathcal{L}(\Lambda(V_1))$ with $\mathcal{L}(\mathbb{R}^{2^{\dim V_1}})$, and obtain the algebra isomorphism

$$C_V \cong \mathcal{L}(\mathbb{R}^{2^{\dim V_1}}) \tag{27.36}$$

for V of the theorem above.


27.3 General Classification of Clifford Algebras


In almost all our preceding discussion, we have assumed that our scalars
come from $\mathbb{R}$. There is a good reason for that: the complex Clifford algebras
are very limited and applications in physics almost exclusively focus on real
Clifford algebras. In this section, we include complex numbers as our
scalars and classify all complex Clifford algebras.


Definition 27.3.1 Let $\mathbb{F}$ denote either $\mathbb{C}$ or $\mathbb{R}$ and let V be a vector space
over $\mathbb{F}$. Choose a basis $\{e_i\}_{i=1}^N$ for V and let $v = \sum_{i=1}^N \eta_i e_i$ be a vector in V.
A quadratic form of index $\nu$ on V is a map $Q_\nu : V \to \mathbb{F}$ given by

$$Q_\nu(v) = -\sum_{i=1}^{\nu} \eta_i^2 + \sum_{i=\nu+1}^{N} \eta_i^2. \tag{27.37}$$

A quadratic form yields an inner product.² In fact, defining

$$2g(u, v) = Q_\nu(u + v) - Q_\nu(u) - Q_\nu(v),$$

it is easy to verify that g is indeed a symmetric bilinear map. Conversely,
given an inner product g of index $\nu$, one can construct a quadratic form:
$Q_\nu(v) = g(v, v)$. It turns out that the basis vectors chosen to define the
quadratic form are automatically g-orthonormal.

When $\mathbb{F} = \mathbb{C}$, there is only one kind of quadratic form: that with $\nu = 0$.
This is because one can always change $\eta_i$ to $i\eta_i$ to turn all the negative
terms in the sum to positive terms. However, when $\mathbb{F} = \mathbb{R}$, we obtain different
quadratic forms depending on the index $\nu$. Thus, the real quadratic
form leads to the inner product of $\mathbb{R}^n_\nu$ introduced in Eq. (26.40), and the
corresponding Clifford algebra will be discussed in detail in the next section.

Before we classify the easy case of complex Clifford algebras, let us observe
some general properties of the Clifford algebra over $\mathbb{F}$, which we denote
by $C_V(\mathbb{F})$. First, we note that since $e_i^2 = e_i \vee e_i = g(e_i, e_i)\mathbf{1}$, $e_i^k \neq 0$ for
any positive integer k. Therefore, $C_V(\mathbb{F})$ cannot contain a radical for any V
over $\mathbb{F}$. This means that $C_V(\mathbb{F})$ is semi-simple. Moreover, Theorem 27.2.14
implies that $C_V(\mathbb{F})$ is simple if V is even-dimensional.

Next, we look at the case of odd-dimensional vector spaces, which is
only slightly more complicated. In this case $Z_V = \mathrm{Span}\{\mathbf{1}, e_\Delta\}$ by
Theorem 27.2.14, and as we shall see presently, $e_\Delta^2$ plays a significant role in the
classification of $C_V(\mathbb{F})$ when dim V is odd. Equation (27.33) gives $e_\Delta^2$ in
terms of $\lambda_\Delta$ of Eq. (27.32). If $\mathbb{F} = \mathbb{C}$, then we can choose $g(e_i, e_i) = 1$. In
fact, for any non-null vector $v \in V$, we have

$$\langle e_v, e_v\rangle = \Bigl\langle \frac{v}{\sqrt{g(v, v)}}, \frac{v}{\sqrt{g(v, v)}} \Bigr\rangle = 1.$$

Hence, we can always set $\lambda_\Delta = 1$ when $\mathbb{F} = \mathbb{C}$. We can’t do this for the real
case because $\sqrt{g(v, v)}$ may be pure imaginary.

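If $\mathbb{F} = \mathbb{R}$, then because of the Lagrange identities (26.42) and (27.31),
$\lambda_\Delta = (-1)^\nu$ and the canonical element satisfies the relation

$$e_\Delta^2 = (-1)^{N(N-1)/2 + \nu}. \tag{27.38}$$

Thus $e_\Delta^2 = \pm\mathbf{1}$ depending on the index $\nu$ and dimension N of V.

We discuss the case of $e_\Delta^2 = +\mathbf{1}$ first. The elements $P_\pm = \frac{1}{2}(\mathbf{1} \pm e_\Delta)$
are two orthogonal idempotents belonging to the center of $C_V(\mathbb{F})$. Since
$P_+ + P_- = \mathbf{1}$, we have the decomposition

$$C_V(\mathbb{F}) = C_V^+(\mathbb{F}) \oplus C_V^-(\mathbb{F}) = P_+ C_V(\mathbb{F}) \oplus P_- C_V(\mathbb{F}), \tag{27.39}$$

where $C_V^+(\mathbb{F})$ and $C_V^-(\mathbb{F})$ are subalgebras (actually ideals) of $C_V(\mathbb{F})$.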
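
The idempotent properties used in this decomposition are easy to confirm in a concrete case. The sketch below takes a three-dimensional real space with one negative square (N = 3, index 1), for which the canonical element squares to +1; this choice of example, the bitmask encoding, and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def blade_product(a, b, squares):
    """Return (sign, blade) with e_a e_b = sign * e_blade for bitmask-encoded blades."""
    sign = 1
    shifted = a >> 1
    while shifted:                      # parity of the reordering transpositions
        if bin(shifted & b).count("1") % 2:
            sign = -sign
        shifted >>= 1
    common = a & b
    for i, sq in enumerate(squares):    # repeated generators contribute e_i^2
        if (common >> i) & 1:
            sign *= sq
    return sign, a ^ b

def clifford_product(u, v, squares):
    """Multiply {blade: coefficient} dictionaries, dropping zero coefficients."""
    result = {}
    for a, x in u.items():
        for b, y in v.items():
            sign, blade = blade_product(a, b, squares)
            result[blade] = result.get(blade, 0) + sign * x * y
    return {blade: c for blade, c in result.items() if c != 0}

squares = [1, 1, -1]                    # N = 3 with one negative square
E_DELTA = 0b111                         # canonical element e1 e2 e3
half = Fraction(1, 2)
P_plus = {0: half, E_DELTA: half}
P_minus = {0: half, E_DELTA: -half}

pp = clifford_product(P_plus, P_plus, squares)      # expected to equal P_plus
pm = clifford_product(P_plus, P_minus, squares)     # expected to vanish
central = all(
    clifford_product(P_plus, {1 << i: 1}, squares)
    == clifford_product({1 << i: 1}, P_plus, squares)
    for i in range(3)
)
print(pp == P_plus, pm == {}, central)
```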


²Here we define an inner product simply as a symmetric bilinear map as in Definition 2.4.2.

Since $e_\Delta$ is the product of an odd number of vectors, $\omega_V(e_\Delta) = -e_\Delta$.
Hence, $\omega_V(P_\pm) = P_\mp$. Furthermore, $\omega_V$, being an involution, has an inverse.
Thus, it is an isomorphism of $C_V^+(\mathbb{F})$ and $C_V^-(\mathbb{F})$.

We now show that $C_V^\pm(\mathbb{F})$ are central. We do this for $C_V^+(\mathbb{F})$, with the other
case following immediately from the proof for $C_V^+(\mathbb{F})$. Let $a_+ \in Z_V^+(\mathbb{F})$, the
center of $C_V^+(\mathbb{F})$, and $b \in C_V(\mathbb{F})$. Then

$$a_+ b = a_+(b_+ + b_-) = a_+ b_+ + a_+ b_- = a_+ b_+ = b_+ a_+ = b a_+$$

because $0 = a_+ b_- = b_- a_+$. It follows that $a_+ \in Z_V(\mathbb{F})$. Therefore,

$$a_+ = \alpha\mathbf{1} + \beta e_\Delta = \alpha(P_+ + P_-) + \beta(P_+ - P_-) = (\alpha + \beta)P_+ + (\alpha - \beta)P_-.$$

Since $a_+$ has no component in $P_-$, we must have $\alpha = \beta$ and $a_+ = 2\alpha P_+$.
Hence, $a_+ \in \mathrm{Span}\{P_+\}$. But $P_+$ is the identity of $C_V^+(\mathbb{F})$. It now follows that
$C_V^+(\mathbb{F})$ is central simple. Similarly, $C_V^-(\mathbb{F})$ is central simple. We summarize
the foregoing discussion as follows:


Theorem 27.3.2 Let $\mathbb{F}$ be either $\mathbb{C}$ or $\mathbb{R}$ and V a vector space
over $\mathbb{F}$.

(a) If dim V is even, then $C_V(\mathbb{F})$ is central simple.

(b) If dim V is odd, then $C_V(\mathbb{C})$ is the direct sum of two isomorphic
central simple Clifford algebras. $C_V(\mathbb{R})$ is the direct sum of two
isomorphic central simple Clifford algebras if $e_\Delta^2 = +\mathbf{1}$.


We are now ready to classify all complex Clifford algebras. All we have
to do is to use Theorem 3.5.29:


Theorem 27.3.3 A complex Clifford algebra $C_V(\mathbb{C})$ is isomorphic to either
a total complex matrix algebra or a direct sum of two such algebras:

(a) $C_V(\mathbb{C}) \cong M_r(\mathbb{C})$ for some positive integer r, if dim V is even.
(b) $C_V(\mathbb{C}) \cong M_s(\mathbb{C}) \oplus M_s(\mathbb{C})$ for some positive integer s, if dim V is odd.


Although the real Clifford algebras are classified in the next section in
much more detail, it is instructive to give a classification of $C_V(\mathbb{R})$ based
on what we know from our study of algebras in general. If V is even-dimensional,
$C_V(\mathbb{R})$ is central simple, and by Theorem 3.5.30, it is of the
form $\mathcal{D} \otimes M_r$ where $\mathcal{D}$ is $\mathbb{R}$ or $\mathbb{H}$.

If V is odd-dimensional, then we have to consider two cases. If $e_\Delta^2 = +\mathbf{1}$,
then $C_V(\mathbb{R})$ is the direct sum of two central simple algebras, each of which is
thus isomorphic to

$$\mathbb{R} \otimes M_r \cong M_r(\mathbb{R}) \quad \text{or to} \quad \mathbb{H} \otimes M_s \cong M_s(\mathbb{H}) \cong \mathbb{H} \otimes M_s(\mathbb{R}),$$

for some positive integers r and s. If $e_\Delta^2 = -\mathbf{1}$, then the center of
$C_V(\mathbb{R})$, which is $\mathrm{Span}\{\mathbf{1}, e_\Delta\}$, is isomorphic to $\mathbb{C}$, and again by
Theorem 3.5.30, $C_V(\mathbb{R})$ is isomorphic to

$$\mathbb{C} \otimes M_p \cong M_p(\mathbb{C}) \quad \text{or to} \quad \mathbb{C} \otimes \mathbb{H} \otimes M_q \cong \mathbb{H} \otimes M_q(\mathbb{C}),$$

for some positive integers p and q.
We summarize this discussion in


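Theorem 27.3.4 A real Clifford algebra $C_V(\mathbb{R})$ is classified as follows:

(a) If V is even-dimensional, then $C_V(\mathbb{R}) \cong \mathcal{D} \otimes M_r \cong M_r(\mathcal{D})$, where r
is a positive integer and $\mathcal{D} = \mathbb{R}$ or $\mathbb{H}$, i.e., $C_V(\mathbb{R})$ is a total matrix
algebra over reals or quaternions.

(b) If V is odd-dimensional, then we have to consider two cases:

1. If $e_\Delta^2 = -\mathbf{1}$, then $C_V(\mathbb{R}) \cong M_s(\mathcal{D})$, where s is a positive integer
and $\mathcal{D}$ is either $\mathbb{C}$ or $\mathbb{C} \otimes \mathbb{H}$.

2. If $e_\Delta^2 = \mathbf{1}$, then $C_V(\mathbb{R}) \cong M_p(\mathcal{D}) \oplus M_p(\mathcal{D})$, where p is a positive
integer and $\mathcal{D}$ is either $\mathbb{R}$ or $\mathbb{H}$.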

27.4 The Clifford Algebras $\mathbf{C}^\mu_\nu(\mathbb{R})$


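Our discussion of inner products in Sect. 26.5 showed that orthonormal
bases are especially convenient. In such bases, the metric matrix $g_{ij} = \eta_{ij}$
is diagonal, with $\eta_{ii} = \pm 1$. In fact, if $\nu$ is the index of $V$ (see
Theorem 26.5.21), then, introducing $\mu = N - \nu$, we have


$$g_{ij} = \eta_{ij} = \begin{cases} 0 & \text{if } i \ne j,\\ +1 & \text{if } 1 \le i = j \le \mu,\\ -1 & \text{if } \mu + 1 \le i = j \le N, \end{cases} \tag{27.40}$$


and Eq. (27.15) becomes


$$\begin{aligned} e_i \vee e_j &= -e_j \vee e_i && \text{if } i \ne j,\\ e_i \vee e_i &= +1 && \text{if } 1 \le i \le \mu,\\ e_i \vee e_i &= -1 && \text{if } \mu + 1 \le i \le N. \end{aligned} \tag{27.41}$$


The Clifford algebra determined by (27.41) is denoted by $\mathbf{C}^\mu_\nu(\mathbb{R})$.³ It is the
Clifford algebra of the vector space $\mathbb{R}^N_\nu$ introduced on page 815.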


Example 27.4.1 The simplest $\mathbf{C}^\mu_\nu(\mathbb{R})$ is when one of the indices is 0 and
the other 1. First let $\mu = 0$ and $\nu = 1$. In this case, $V$ is one-dimensional. Let
$e$ be the basis vector of $V$. Then a basis of the Clifford algebra $\mathbf{C}^0_1(\mathbb{R})$ is
$\{1, e\}$, and an arbitrary element of $\mathbf{C}^0_1(\mathbb{R})$ can be written as $\alpha 1 + \beta e$. The
multiplication of any two such elements is completely determined by the
multiplication of the basis vectors:


$$1 \vee 1 = 1, \quad 1 \vee e = e \vee 1 = e, \quad e \vee e = -1.$$


³Many other notations are also used to denote this algebra. Among them are $\mathbf{C}_{\mu,\nu}(\mathbb{R})$,
$C(\mu, \nu)$, $Cl_{\mu,\nu}(\mathbb{R})$, $C\ell_{p,q}(\mathbb{R})$, and $C(p, q)$, where $q = \nu$ and $p = \mu$. Occasionally, we'll
use one of these notations as well.

If we identify $e$ with $i = \sqrt{-1}$ and $\vee$ with ordinary multiplication, then
$\mathbf{C}^0_1(\mathbb{R})$ becomes identical with the (algebra of) complex numbers. Thus,
$\mathbf{C}^0_1(\mathbb{R}) \cong \mathbb{C}$.
Now let $\mu = 1$ and $\nu = 0$. Again, $V$ is one-dimensional with $e$ as its
basis vector. The basis of the Clifford algebra $\mathbf{C}^1_0(\mathbb{R})$ is again $\{1, e\}$. The
multiplication of the basis vectors gives


$$1 \vee 1 = 1, \quad 1 \vee e = e \vee 1 = e, \quad e \vee e = 1.$$


This shows that 1 and $e$ have identical properties, and since 1 is a basis of
$\mathbb{R}$, so must be $e$. We conclude that $\mathbf{C}^1_0(\mathbb{R}) \cong \mathbb{R} \oplus \mathbb{R}$. As a direct sum of two
algebras, $\mathbb{R} \oplus \mathbb{R}$ has the product rule


$$(\alpha_1 \oplus \alpha_2)(\beta_1 \oplus \beta_2) = (\alpha_1\beta_1 \oplus \alpha_2\beta_2).$$


In analogy with the ordinary complex numbers, $\mathbb{R} \oplus \mathbb{R}$ with this
multiplication rule is called split complex numbers. Problem 27.6 establishes a
concrete algebra isomorphism between $\mathbf{C}^1_0$ and $\mathbb{R} \oplus \mathbb{R}$.
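The isomorphism asked for in Problem 27.6 can be spot-checked numerically. The following is a minimal sketch, assuming an element $\alpha 1 + \beta e$ is stored as the pair `(alpha, beta)`; the function names `clifford_mul`, `pair_mul`, and `phi` are illustrative and do not come from the text.

```python
def clifford_mul(a, b):
    """Product in C^1_0(R): (a0*1 + a1*e)(b0*1 + b1*e) with e v e = +1."""
    a0, a1 = a
    b0, b1 = b
    return (a0 * b0 + a1 * b1, a0 * b1 + a1 * b0)


def pair_mul(p, q):
    """Componentwise product in the direct sum R (+) R."""
    return (p[0] * q[0], p[1] * q[1])


def phi(a):
    """Candidate isomorphism: alpha*1 + beta*e -> (alpha + beta, alpha - beta)."""
    alpha, beta = a
    return (alpha + beta, alpha - beta)


# Check the homomorphism property on a handful of integer samples.
samples = [(1, 0), (0, 1), (2, -3), (-5, 7), (4, 4)]
is_homomorphism = all(
    phi(clifford_mul(a, b)) == pair_mul(phi(a), phi(b))
    for a in samples
    for b in samples
)
print(is_homomorphism)
```

A finite sample is of course only a sanity check, not a proof; the algebraic verification is the content of Problem 27.6.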


Example 27.4.2 In this example, we consider a slightly more complicated
Clifford algebra than Example 27.4.1, namely $\mathbf{C}^0_2(\mathbb{R})$. Let $e_1$ and $e_2$ be the
two orthonormal basis vectors of the two-dimensional vector space $V$ on
which the Clifford algebra $\mathbf{C}^0_2(\mathbb{R})$ is defined. This algebra is 4-dimensional
with a basis $\{1, e_1, e_2, e_1 \vee e_2\}$. To make the multiplication of the basis vectors
more transparent, let's set $e_1 \equiv a$, $e_2 \equiv b$, $e_1 \vee e_2 \equiv c$. Then it is clear
that


$$\begin{aligned} a \vee a &= -1, & a \vee b &= c, & a \vee c &= -b,\\ b \vee a &= -c, & b \vee b &= -1, & b \vee c &= a,\\ c \vee a &= b, & c \vee b &= -a, & c \vee c &= -1. \end{aligned} \tag{27.42}$$


Most of these are self-evident. The less obvious ones can be easily shown
using Eq. (27.41). For example,


$$c \vee c = e_1 \vee \underbrace{e_2 \vee e_1}_{=-e_1 \vee e_2} \vee e_2 = -\underbrace{e_1 \vee e_1}_{=-1} \vee \underbrace{e_2 \vee e_2}_{=-1} = -1.$$


Comparison of Eq. (27.42) with Example 3.1.16 reveals that $\mathbf{C}^0_2(\mathbb{R})$ is the
algebra of quaternions: $\mathbf{C}^0_2(\mathbb{R}) \cong \mathbb{H}$.
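The multiplication table (27.42) follows mechanically from the rules (27.41). A minimal sketch of that computation is given below, assuming basis blades are encoded as bitmasks (bit 0 for $e_1$, bit 1 for $e_2$); the encoding and the names `blade_mul`, `SQUARES`, and `table` are illustrative choices, not notation from the text.

```python
# Squares of the generators: e_1 v e_1 = e_2 v e_2 = -1 (mu = 0, nu = 2).
SQUARES = (-1, -1)


def blade_mul(a, b, squares=SQUARES):
    """Clifford product of two basis blades given as bitmasks: (sign, blade)."""
    # Count the transpositions needed to bring the generators into
    # increasing order; each one contributes a factor -1 by (27.41).
    swaps, t = 0, a >> 1
    while t:
        swaps += bin(t & b).count("1")
        t >>= 1
    sign = -1 if swaps % 2 else 1
    # Each generator appearing in both blades contracts to its square.
    common, i = a & b, 0
    while common:
        if common & 1:
            sign *= squares[i]
        common >>= 1
        i += 1
    return sign, a ^ b


ONE, A, B, C = 0b00, 0b01, 0b10, 0b11   # 1, a = e_1, b = e_2, c = e_1 v e_2
NAMES = {ONE: "1", A: "a", B: "b", C: "c"}


def show(x, y):
    sign, blade = blade_mul(x, y)
    return ("-" if sign < 0 else "") + NAMES[blade]


table = {(NAMES[x], NAMES[y]): show(x, y) for x in (A, B, C) for y in (A, B, C)}
for key, value in table.items():
    print(key[0], "v", key[1], "=", value)
```

With $a, b, c$ read as the quaternion units, the printed products can be compared entry by entry with (27.42).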


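The two examples above identified some low-dimensional Clifford algebras
of the type $\mathbf{C}^\mu_\nu(\mathbb{R})$ with certain familiar algebras. It is possible to
identify all $\mathbf{C}^\mu_\nu(\mathbb{R})$ with more familiar algebras as we proceed to show in the
following. We first need to establish some isomorphisms among the algebras
$\mathbf{C}^\mu_\nu(\mathbb{R})$ themselves.

Set $N = 2$ in Eq. (27.38) to get $e_\Delta^2 = (-1)^{\nu+1}\mathbf{1}$. In particular, for $\nu = 0$
and $\nu = 2$, we get $e_\Delta^2 = -\mathbf{1}$ for both $\mathbf{C}^2_0(\mathbb{R})$ and $\mathbf{C}^0_2(\mathbb{R})$. Now in
Theorem 27.2.16, let $U = \mathbb{R}^n_\nu$ and $V = \mathbb{R}^2_2$ (recall that $V$ has to be
even-dimensional), and note that reversing the sign of the inner product of $U$
turns it into $\mathbb{R}^n_\mu$, where $\mu = n - \nu$. The second identity of that theorem
then gives

$$\mathbf{C}_{\mathbb{R}^2_2 \oplus \mathbb{R}^n_\nu} \cong \mathbf{C}_{\mathbb{R}^2_2} \otimes \mathbf{C}_{\mathbb{R}^n_\mu} = \mathbf{C}^0_2(\mathbb{R}) \otimes \mathbf{C}^\nu_\mu(\mathbb{R}). \tag{27.43}$$


Note the position of $\mu$ and $\nu$ in the last term! Also note that since
$\mathbb{R}^2_2 \oplus \mathbb{R}^n_\nu = \mathbb{R}^n_\nu \oplus \mathbb{R}^2_2$, we must have
$\mathbf{C}^0_2(\mathbb{R}) \otimes \mathbf{C}^\nu_\mu(\mathbb{R}) \cong \mathbf{C}^\nu_\mu(\mathbb{R}) \otimes \mathbf{C}^0_2(\mathbb{R})$. But
$\mathbb{R}^2_2 \oplus \mathbb{R}^n_\nu = \mathbb{R}^{n+2}_{\nu+2}$ by Proposition 26.5.23. Hence, the left-hand side of
(27.43) is simply $\mathbf{C}^\mu_{\nu+2}(\mathbb{R})$. By choosing $V = \mathbb{R}^2_0$ and going through the same
procedure we obtain a similar result. The following theorem, in which we
have restored $\mu$ and $\nu$ to their normal position on the right-hand side of
(27.43) (now written on the left in the following theorem), summarizes these
results.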


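Theorem 27.4.3 There exist the following Clifford algebra isomorphisms:

$$\begin{aligned} \mathbf{C}^\mu_\nu(\mathbb{R}) \otimes \mathbf{C}^0_2(\mathbb{R}) &\cong \mathbf{C}^\nu_{\mu+2}(\mathbb{R}),\\ \mathbf{C}^\mu_\nu(\mathbb{R}) \otimes \mathbf{C}^2_0(\mathbb{R}) &\cong \mathbf{C}^{\nu+2}_\mu(\mathbb{R}). \end{aligned}$$


Theorem 27.4.4 Suppose that $\mu = \nu + 4k$ for some integer $k$. Then
$$\mathbf{C}^\mu_\nu(\mathbb{R}) \cong \mathbf{C}^\nu_\mu(\mathbb{R}).$$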


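Proof We note that $N = \mu + \nu = 2\nu + 4k = 2(\nu + 2k) \equiv 2m$. Therefore, if
$\Delta$ is the normed determinant function, then


$$\lambda_\Delta = (-1)^\nu = (-1)^{m-2k} = (-1)^m.$$


Now apply Theorem 27.2.15, noting that reversing the sign of the inner product of $\mathbb{R}^N_\nu$ yields $\mathbb{R}^N_\mu$.


For the special case of $\nu = 0$, we obtain
$$\mathbf{C}^N_0(\mathbb{R}) \cong \mathbf{C}^0_N(\mathbb{R}) \quad \text{if } N = 4k. \tag{27.44}$$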


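The case of $\mu = \nu$ is important in the classification of the Clifford
algebras. In this case, we can write


$$\mathbb{R}^N_\nu = \mathbb{R}^{2\mu}_\mu = \mathbb{R}^{2\nu}_\nu = \mathbb{R}^\mu_0 \oplus \mathbb{R}^\mu_\mu, \tag{27.45}$$


where the inner product is positive definite in the first subspace and negative
definite in the second. Let $\{\hat{e}_i\}_{i=1}^\mu$ and $\{\hat{f}_i\}_{i=1}^\mu$ be orthonormal bases of $\mathbb{R}^\mu_0$
and $\mathbb{R}^\mu_\mu$, respectively, so that


$$\langle \hat{e}_i, \hat{e}_j \rangle = \delta_{ij}, \quad \langle \hat{f}_i, \hat{f}_j \rangle = -\delta_{ij}, \quad i, j = 1, 2, \dots, \mu.$$
Now define an involution $\omega$ on $\mathbb{R}^{2\mu}_\mu$ as follows:


$$\omega(\hat{e}_i) = \hat{f}_i \quad \text{and} \quad \omega(\hat{f}_i) = \hat{e}_i, \quad i = 1, 2, \dots, \mu. \tag{27.46}$$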


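Then it can easily be shown that $\omega^t = -\omega$ (see Problem 27.12). Hence,
by Theorem 27.2.17 and Eq. (27.36), $\mathbf{C}^\mu_\mu(\mathbb{R}) \cong \mathcal{L}(\mathbb{R}^{2^{\dim V_1}})$, where $V_1 =
\ker(\omega - 1)$. But


$$u = \sum_{i=1}^{\mu} (\alpha_i \hat{e}_i + \beta_i \hat{f}_i) \in \ker(\omega - 1) \iff \alpha_i = \beta_i,$$


as can be readily shown. This yields $V_1 = \mathrm{Span}\{\hat{e}_i + \hat{f}_i\}_{i=1}^\mu$. Therefore,
$\dim V_1 = \mu$, and (27.36) gives the isomorphism


$$\mathbf{C}^\mu_\mu(\mathbb{R}) \cong \mathcal{L}(\mathbb{R}^{2^\mu}). \tag{27.47}$$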


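Example 27.4.5 For the simplest case of $\mu = 1$, we have $\mathbf{C}^1_1(\mathbb{R}) \cong
\mathcal{L}(\mathbb{R}^2) \cong \mathcal{M}^{2 \times 2}$. The isomorphism can be established directly by a
procedure similar to Example 27.2.6.

For the case of $\mu = 2$, we have $\mathbf{C}^2_2(\mathbb{R}) \cong \mathcal{L}(\mathbb{R}^4) \cong \mathcal{M}^{4 \times 4}$. Furthermore,
setting $\mu = 0$ and $\nu = 2$ in the first equation of Theorem 27.4.3, we obtain


$$\mathbf{C}^2_2(\mathbb{R}) \cong \mathbf{C}^0_2(\mathbb{R}) \otimes \mathbf{C}^0_2(\mathbb{R}) = \mathbb{H} \otimes \mathbb{H},$$


where we used the result of Example 27.4.2. Hence, we have the isomorphisms


$$\mathbf{C}^2_2(\mathbb{R}) \cong \mathcal{L}(\mathbb{R}^4) \cong \mathcal{M}^{4 \times 4} \cong \mathbb{H} \otimes \mathbb{H}. \tag{27.48}$$


Problem 27.14 gives a direct isomorphic map from $\mathbb{H} \otimes \mathbb{H}$ to $\mathcal{L}(\mathbb{R}^4)$.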


27.4.1 Classification of $\mathbf{C}^n_0(\mathbb{R})$ and $\mathbf{C}^0_n(\mathbb{R})$


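From the structure of $\mathbf{C}^n_0(\mathbb{R})$ and $\mathbf{C}^0_n(\mathbb{R})$ for low values of $n$, we can
construct all of these algebras by using Theorem 27.4.3. First, let us collect the
results of Examples 27.2.6, 27.4.1, and 27.4.2:


$$\mathbf{C}^0_1(\mathbb{R}) \cong \mathbb{C}, \quad \mathbf{C}^1_0(\mathbb{R}) \cong \mathbb{R} \oplus \mathbb{R}, \quad \mathbf{C}^2_0(\mathbb{R}) \cong \mathcal{L}(\mathbb{R}^2), \quad \mathbf{C}^0_2(\mathbb{R}) \cong \mathbb{H}. \tag{27.49}$$

Next let $\mu = 1$ and $\nu = 0$ in the first equation of Theorem 27.4.3 to obtain


$$\mathbf{C}^0_3(\mathbb{R}) \cong \mathbf{C}^1_0(\mathbb{R}) \otimes \mathbf{C}^0_2(\mathbb{R}) \cong (\mathbb{R} \oplus \mathbb{R}) \otimes \mathbb{H} \cong (\mathbb{R} \otimes \mathbb{H}) \oplus (\mathbb{R} \otimes \mathbb{H}) \cong \mathbb{H} \oplus \mathbb{H},$$


where we used Eqs. (2.16) and (2.18). Similarly, with $\mu = 0$ and $\nu = 1$, the
second equation of Theorem 27.4.3 yields


$$\mathbf{C}^3_0(\mathbb{R}) \cong \mathbf{C}^0_1(\mathbb{R}) \otimes \mathbf{C}^2_0(\mathbb{R}) \cong \mathbb{C} \otimes \mathcal{L}(\mathbb{R}^2).$$
Setting $\mu = 2$, $\nu = 0$ in the first equation of Theorem 27.4.3, we obtain


$$\mathbf{C}^4_0(\mathbb{R}) \cong \mathbf{C}^0_4(\mathbb{R}) \cong \mathbf{C}^2_0(\mathbb{R}) \otimes \mathbf{C}^0_2(\mathbb{R}) \cong \mathcal{L}(\mathbb{R}^2) \otimes \mathbb{H} \cong \mathbb{H} \otimes \mathcal{L}(\mathbb{R}^2),$$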




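Table 27.1 Classification of $\mathbf{C}^n_0(\mathbb{R})$ and $\mathbf{C}^0_n(\mathbb{R})$ for $n \le 8$


n    $\mathbf{C}^n_0(\mathbb{R})$                                                                                   $\mathbf{C}^0_n(\mathbb{R})$

1    $\mathbb{R} \oplus \mathbb{R}$                                                                                  $\mathbb{C}$
2    $\mathcal{L}(\mathbb{R}^2)$                                                                                     $\mathbb{H}$
3    $\mathbb{C} \otimes \mathcal{L}(\mathbb{R}^2)$                                                                  $\mathbb{H} \oplus \mathbb{H}$
4    $\mathbb{H} \otimes \mathcal{L}(\mathbb{R}^2)$                                                                  $\mathbb{H} \otimes \mathcal{L}(\mathbb{R}^2)$
5    $(\mathbb{H} \otimes \mathcal{L}(\mathbb{R}^2)) \oplus (\mathbb{H} \otimes \mathcal{L}(\mathbb{R}^2))$          $\mathbb{C} \otimes \mathcal{L}(\mathbb{R}^2) \otimes \mathbb{H}$
6    $\mathbb{H} \otimes \mathcal{L}(\mathbb{R}^4)$                                                                  $\mathcal{L}(\mathbb{R}^8)$
7    $\mathbb{C} \otimes \mathbb{H} \otimes \mathcal{L}(\mathbb{R}^4)$                                               $\mathcal{L}(\mathbb{R}^8) \oplus \mathcal{L}(\mathbb{R}^8)$
8    $\mathcal{L}(\mathbb{R}^{16})$                                                                                  $\mathcal{L}(\mathbb{R}^{16})$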


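where the first isomorphism comes from Eq. (27.44) and the last from
Eq. (2.17). We can continue the construction of the rest of $\mathbf{C}^n_0(\mathbb{R})$ and $\mathbf{C}^0_n(\mathbb{R})$
for $n \le 8$. The results are tabulated in Table 27.1. The reader is urged to verify
that the entries of the table are consistent with Theorem 27.3.4, keeping
in mind that $\mathcal{L}(\mathbb{R}^n)$ can be identified as the total matrix algebra $\mathbb{R} \otimes \mathcal{M}_n$.

Let $V = \mathbb{R}^8_0$ and $U = \mathbb{R}^n_0$, noting that $e_\Delta^2 = 1$ for $\mathbf{C}_V = \mathbf{C}^8_0(\mathbb{R})$. Now use
Theorem 27.2.16 to obtain


$$\mathbf{C}_{\mathbb{R}^8_0 \oplus \mathbb{R}^n_0} \cong \mathbf{C}_{\mathbb{R}^8_0} \otimes \mathbf{C}_{\mathbb{R}^n_0} \quad \text{or} \quad \mathbf{C}_{\mathbb{R}^{n+8}_0} \cong \mathbf{C}_{\mathbb{R}^8_0} \otimes \mathbf{C}_{\mathbb{R}^n_0}$$
or
$$\mathbf{C}^{n+8}_0(\mathbb{R}) \cong \mathbf{C}^8_0(\mathbb{R}) \otimes \mathbf{C}^n_0(\mathbb{R}) \cong \mathbf{C}^n_0(\mathbb{R}) \otimes \mathcal{L}(\mathbb{R}^{16}). \tag{27.50}$$
Similarly, using $V = \mathbb{R}^8_8$ and $U = \mathbb{R}^n_n$ in Theorem 27.2.16 yields
$$\mathbf{C}^0_{n+8}(\mathbb{R}) \cong \mathbf{C}^0_n(\mathbb{R}) \otimes \mathcal{L}(\mathbb{R}^{16}). \tag{27.51}$$


It is clear that these two equations plus Table 27.1 generate $\mathbf{C}^n_0(\mathbb{R})$ and
$\mathbf{C}^0_n(\mathbb{R})$ for all $n$.⁴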

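Matrices are much more convenient and intuitive to use than linear
transformations. It is therefore instructive to rewrite Table 27.1 in terms of
matrices, keeping in mind that with $\mathbb{F}$ being $\mathbb{C}$ or $\mathbb{H}$,


$$\mathcal{L}(\mathbb{R}^n) \cong \mathcal{M}_n(\mathbb{R}) \quad \text{and} \quad \mathbb{F} \otimes \mathcal{L}(\mathbb{R}^n) \cong \mathbb{F} \otimes \mathcal{M}_n(\mathbb{R}) \cong \mathcal{M}_n(\mathbb{F}), \tag{27.52}$$


where $\mathcal{M}_n(\mathbb{F})$ denotes the algebra of $n \times n$ matrices with entries in $\mathbb{F}$. The results are
given in Table 27.2.

We also write the periodicity relations (27.50) and (27.51) in terms of
matrices:


$$\begin{aligned} \mathbf{C}^{n+8}_0(\mathbb{R}) &\cong \mathbf{C}^n_0(\mathbb{R}) \otimes \mathcal{M}_{16}(\mathbb{R}),\\ \mathbf{C}^0_{n+8}(\mathbb{R}) &\cong \mathbf{C}^0_n(\mathbb{R}) \otimes \mathcal{M}_{16}(\mathbb{R}). \end{aligned} \tag{27.53}$$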

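⁴It is worth noting that, by using $V = \mathbb{R}^4_0$ or $V = \mathbb{R}^4_4$, we could obtain formulas for
$\mathbf{C}^{n+4}_0(\mathbb{R})$ and $\mathbf{C}^0_{n+4}(\mathbb{R})$ analogous to (27.50) and (27.51). However, as entries 5, 6, and 7
of Table 27.1 can testify, they would not be as appealing as the formulas obtained above.
This is primarily because $\mathbf{C}^8_0(\mathbb{R}) \cong \mathbf{C}^0_8(\mathbb{R}) \cong \mathcal{L}(\mathbb{R}^{16})$.

Table 27.2 Classification of $\mathbf{C}^n_0(\mathbb{R})$ and $\mathbf{C}^0_n(\mathbb{R})$ for $n \le 8$ in terms of matrices


n    $\mathbf{C}^n_0(\mathbb{R})$                                                                           $\mathbf{C}^0_n(\mathbb{R})$

1    $\mathbb{R} \oplus \mathbb{R}$                                                                          $\mathbb{C}$
2    $\mathcal{M}_2(\mathbb{R})$                                                                             $\mathbb{H}$
3    $\mathbb{C} \otimes \mathcal{M}_2(\mathbb{R}) \cong \mathcal{M}_2(\mathbb{C})$                          $\mathbb{H} \oplus \mathbb{H}$
4    $\mathbb{H} \otimes \mathcal{M}_2(\mathbb{R}) \cong \mathcal{M}_2(\mathbb{H})$                          $\mathbb{H} \otimes \mathcal{M}_2(\mathbb{R}) \cong \mathcal{M}_2(\mathbb{H})$
5    $\mathcal{M}_2(\mathbb{H}) \oplus \mathcal{M}_2(\mathbb{H})$                                            $\mathbb{H} \otimes \mathcal{M}_2(\mathbb{C}) \cong \mathbb{C} \otimes \mathcal{M}_2(\mathbb{H})$
6    $\mathbb{H} \otimes \mathcal{M}_4(\mathbb{R}) \cong \mathcal{M}_4(\mathbb{H})$                          $\mathcal{M}_8(\mathbb{R})$
7    $\mathbb{H} \otimes \mathcal{M}_4(\mathbb{C}) \cong \mathbb{C} \otimes \mathcal{M}_4(\mathbb{H})$       $\mathcal{M}_8(\mathbb{R}) \oplus \mathcal{M}_8(\mathbb{R})$
8    $\mathcal{M}_{16}(\mathbb{R})$                                                                          $\mathcal{M}_{16}(\mathbb{R})$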


27.4.2 Classification of $\mathbf{C}^\mu_\nu(\mathbb{R})$


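We can now complete the task of classifying all of the algebras $\mathbf{C}^\mu_\nu(\mathbb{R})$. The
case of $\mu = \nu$ is given by Eq. (27.47). For the case of $\mu > \nu$, let $\mu = \nu + \sigma$
and note that


$$\mathbb{R}^N_\nu = \mathbb{R}^\mu_0 \oplus \mathbb{R}^\nu_\nu = (\mathbb{R}^\nu_0 \oplus \mathbb{R}^\sigma_0) \oplus \mathbb{R}^\nu_\nu = \mathbb{R}^\sigma_0 \oplus (\mathbb{R}^\nu_0 \oplus \mathbb{R}^\nu_\nu) = \mathbb{R}^\sigma_0 \oplus \mathbb{R}^{2\nu}_\nu.$$


Now let $V = \mathbb{R}^{2\nu}_\nu$ and note that $\mathbf{C}_V = \mathbf{C}^\nu_\nu(\mathbb{R})$. Furthermore, Eq. (27.38), with
$N = 2\nu$, gives $e_\Delta^2 = 1$. Hence, with $U = \mathbb{R}^\sigma_0$, the first equation of
Theorem 27.2.16 yields


$$\mathbf{C}^\mu_\nu(\mathbb{R}) \cong \mathbf{C}^\nu_\nu(\mathbb{R}) \otimes \mathbf{C}^{\mu-\nu}_0(\mathbb{R}) \cong \mathcal{L}(\mathbb{R}^{2^\nu}) \otimes \mathbf{C}^{\mu-\nu}_0(\mathbb{R}), \tag{27.54}$$


where we used (27.47).
If $\mu < \nu$, let $\nu = \mu + \rho$ and note that $\mathbb{R}^N_\nu = \mathbb{R}^{2\mu}_\mu \oplus \mathbb{R}^\rho_\rho$. With $V = \mathbb{R}^{2\mu}_\mu$ and
$U = \mathbb{R}^\rho_\rho$, the first equation of Theorem 27.2.16 yields


$$\mathbf{C}^\mu_\nu(\mathbb{R}) \cong \mathbf{C}^\mu_\mu(\mathbb{R}) \otimes \mathbf{C}^0_{\nu-\mu}(\mathbb{R}) \cong \mathcal{L}(\mathbb{R}^{2^\mu}) \otimes \mathbf{C}^0_{\nu-\mu}(\mathbb{R}). \tag{27.55}$$


It is worthwhile to collect these results in a theorem. Noting that $\mathbf{C}^0_0 = \mathbb{R}$
and that $\mathcal{A} \otimes \mathbb{R} \cong \mathcal{A}$ for any real algebra $\mathcal{A}$, we can combine the three cases of
$\mu = \nu$, $\mu > \nu$, and $\mu < \nu$ into two cases: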


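Theorem 27.4.6 The following isomorphisms hold:


$$\mathbf{C}^\mu_\nu(\mathbb{R}) \cong \mathcal{L}(\mathbb{R}^{2^\nu}) \otimes \mathbf{C}^{\mu-\nu}_0(\mathbb{R}) \cong \mathcal{M}_{2^\nu}(\mathbb{R}) \otimes \mathbf{C}^{\mu-\nu}_0(\mathbb{R}), \quad \text{if } \mu \ge \nu,$$


$$\mathbf{C}^\mu_\nu(\mathbb{R}) \cong \mathcal{L}(\mathbb{R}^{2^\mu}) \otimes \mathbf{C}^0_{\nu-\mu}(\mathbb{R}) \cong \mathcal{M}_{2^\mu}(\mathbb{R}) \otimes \mathbf{C}^0_{\nu-\mu}(\mathbb{R}), \quad \text{if } \mu \le \nu.$$


This theorem together with Table 27.2 and the periodicity relations


$$\begin{aligned} \mathbf{C}^{\mu-\nu+8}_0(\mathbb{R}) &\cong \mathbf{C}^{\mu-\nu}_0(\mathbb{R}) \otimes \mathcal{M}_{16}(\mathbb{R}),\\ \mathbf{C}^0_{\nu-\mu+8}(\mathbb{R}) &\cong \mathbf{C}^0_{\nu-\mu}(\mathbb{R}) \otimes \mathcal{M}_{16}(\mathbb{R}), \end{aligned} \tag{27.56}$$

which come from Eq. (27.53), determine all the algebras $\mathbf{C}^\mu_\nu(\mathbb{R})$.
From Theorem 27.4.6, the periodicity relation (27.53), and Table 27.2,
we get the following: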


Theorem 27.4.7 All Clifford algebras $\mathbf{C}^\mu_\nu(\mathbb{R})$ with $\mu - \nu \ne 1 \bmod 4$ are
simple. Those with $\mu - \nu = 1 \bmod 4$ are direct sums of two identical simple
algebras.
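The outcome of this classification can be condensed into a lookup on $(\mu - \nu) \bmod 8$. The sketch below states that lookup in its commonly quoted form rather than following the construction used in the text; the function name `clifford_type` and the convention of returning a triple (division algebra, matrix size, number of identical summands) are illustrative assumptions.

```python
def clifford_type(mu, nu):
    """Matrix-algebra type of C^mu_nu(R).

    Returns (D, size, summands): the algebra is a direct sum of `summands`
    copies of the size x size matrices over D, with D in {"R", "C", "H"}.
    """
    n = mu + nu
    k = (mu - nu) % 8
    if k in (0, 2):
        return ("R", 2 ** (n // 2), 1)
    if k == 1:
        return ("R", 2 ** ((n - 1) // 2), 2)
    if k in (3, 7):
        return ("C", 2 ** ((n - 1) // 2), 1)
    if k in (4, 6):
        return ("H", 2 ** ((n - 2) // 2), 1)
    return ("H", 2 ** ((n - 3) // 2), 2)   # k == 5


def real_dimension(kind):
    """Real dimension of the algebra described by a (D, size, summands) triple."""
    d, size, summands = kind
    return {"R": 1, "C": 2, "H": 4}[d] * size * size * summands


for n in range(1, 9):
    print(n, clifford_type(n, 0), clifford_type(0, n))
```

Two quick consistency checks suggest themselves: the real dimension of every entry should equal $2^{\mu+\nu}$, and two summands should occur exactly when $\mu - \nu = 1 \bmod 4$, as in Theorem 27.4.7.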


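27.4.3 The Algebra $\mathbf{C}^3_1(\mathbb{R})$
For the Minkowski $n$-space, $\mathbb{R}^n_1$, Theorem 27.4.6 gives
$$\mathbf{C}^{n-1}_1(\mathbb{R}) \cong \mathcal{M}_2(\mathbb{R}) \otimes \mathbf{C}^{n-2}_0(\mathbb{R}).$$
When $n = 4$, this reduces to
$$\mathbf{C}^3_1(\mathbb{R}) \cong \mathcal{M}_2(\mathbb{R}) \otimes \mathbf{C}^2_0(\mathbb{R}) \cong \mathcal{M}_2(\mathbb{R}) \otimes \mathcal{M}_2(\mathbb{R}) \cong \mathcal{M}_4(\mathbb{R}).$$


In the language of Chap. 3, $\mathbf{C}^3_1(\mathbb{R})$ is a total matrix algebra, which, by either
Theorem 3.5.27 (and the remarks after it) or Theorem 3.3.2, is simple. We
now find a basis $\{e_{ij}\}$ of this algebra.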

First we find the diagonals $\{e_{ii}\}_{i=1}^4$, which are obviously primitive,
orthogonal to each other, and


$$1 = e_{11} + e_{22} + e_{33} + e_{44}.$$
Thus, by Theorem 3.5.32, the identity has rank 4. Next, we construct four
primitive orthogonal idempotents $\{P_i\}_{i=1}^4$ out of the basis vectors⁵ $\{\hat{e}_\eta\}_{\eta=0}^3$
of $\mathbf{C}^3_1(\mathbb{R})$ and their products and identify them with $\{e_{ii}\}_{i=1}^4$. The easiest
way to construct these idempotents is to find $x$ and $y$ such that


$$x^2 = 1 = y^2, \quad xy = yx.$$


Then the product of $\frac{1}{2}(1 \pm x)$ and $\frac{1}{2}(1 \pm y)$ for all sign choices yields four
primitive orthogonal idempotents, as the reader may verify. There are many
choices for $x$ and $y$. We choose $x = \hat{e}_1$ and $y = \hat{e}_{02}$, where we use the common
abbreviation $\hat{e}_{\eta_1 \dots \eta_p} \equiv \hat{e}_{\eta_1} \vee \cdots \vee \hat{e}_{\eta_p}$, and set


$$\begin{aligned} P_1 &= \tfrac{1}{4}(1 + \hat{e}_1)(1 + \hat{e}_{02}) \equiv e_{11},\\ P_2 &= \tfrac{1}{4}(1 + \hat{e}_1)(1 - \hat{e}_{02}) \equiv e_{22},\\ P_3 &= \tfrac{1}{4}(1 - \hat{e}_1)(1 + \hat{e}_{02}) \equiv e_{33},\\ P_4 &= \tfrac{1}{4}(1 - \hat{e}_1)(1 - \hat{e}_{02}) \equiv e_{44}. \end{aligned} \tag{27.57}$$


⁵Here we are using the physicists' convention of numbering the basis vectors from 0 to 3
with $\hat{e}_0 = \hat{e}_4$ and using Greek letters for indices.

Since the $P_i$s are all primitive (thus, of rank 1), by Theorem 3.5.32, they are
similar. Indeed, one can easily show that


$$\begin{aligned} \hat{e}_{03} P_1 \hat{e}_{03}^{-1} &= \hat{e}_{03} P_1 \hat{e}_{03} = P_2,\\ \hat{e}_3 P_1 \hat{e}_3^{-1} &= \hat{e}_3 P_1 \hat{e}_3 = P_3,\\ \hat{e}_0 P_1 \hat{e}_0^{-1} &= -\hat{e}_0 P_1 \hat{e}_0 = P_4. \end{aligned} \tag{27.58}$$

Equations (27.57) and (27.58) determine all $e_{ij}$s, as we now demonstrate.
Write the first relation of (27.58) as $\hat{e}_{03} P_1 = P_2 \hat{e}_{03}$ or $\hat{e}_{03} e_{11} = e_{22} \hat{e}_{03}$. Since
$\hat{e}_{03} \in \mathbf{C}^3_1(\mathbb{R})$, it can be written as a linear combination of $\{e_{ij}\}$. With $\hat{e}_{03} =
\sum_{i,j=1}^4 a_{ij} e_{ij}$, we have


$$\sum_{i,j=1}^4 a_{ij} e_{ij} e_{11} = e_{22} \sum_{i,j=1}^4 a_{ij} e_{ij} \quad \text{or} \quad \sum_{i=1}^4 a_{i1} e_{i1} = \sum_{j=1}^4 a_{2j} e_{2j}.$$


Linear independence of $\{e_{ij}\}$ implies that $a_{i1} = 0$ for $i \ne 2$ and $a_{2j} = 0$ for
$j \ne 1$. Therefore, the left-hand side (or the right-hand side) of the equation
reduces to $a_{21} e_{21}$. Hence, we have


$$\hat{e}_{03} P_1 = a_{21} e_{21}. \tag{27.59}$$


We can also write the first relation of (27.58) as $P_1 \hat{e}_{03} = \hat{e}_{03} P_2$ or $e_{11} \hat{e}_{03} =
\hat{e}_{03} e_{22}$, which yields


$$e_{11} \sum_{i,j=1}^4 a_{ij} e_{ij} = \sum_{i,j=1}^4 a_{ij} e_{ij} e_{22} \quad \text{or} \quad \sum_{j=1}^4 a_{1j} e_{1j} = \sum_{i=1}^4 a_{i2} e_{i2}.$$


Again, linear independence of $\{e_{ij}\}$ implies that $a_{1j} = 0$ for $j \ne 2$ and $a_{i2} =
0$ for $i \ne 1$. Therefore, the left-hand side (or the right-hand side) of the
equation reduces to $a_{12} e_{12}$, and we get


$$P_1 \hat{e}_{03} = a_{12} e_{12}. \tag{27.60}$$
Multiply Eqs. (27.59) and (27.60) to get
$$(\hat{e}_{03} P_1)(P_1 \hat{e}_{03}) = (a_{21} e_{21})(a_{12} e_{12})$$
or


$$\hat{e}_{03} P_1 \hat{e}_{03} = a_{21} a_{12} e_{21} e_{12} = a_{21} a_{12} e_{22} = a_{21} a_{12} P_2.$$


Comparing this with the first equation in (27.58), we conclude that
$a_{21} a_{12} = 1$, which is also a consistency condition for $\hat{e}_{03} \vee \hat{e}_{03} = 1$. There
are several choices for $a_{ij}$, all of which satisfy this as well as other conditions
obtained above. Therefore, we are at liberty to set $a_{21} = 1 = a_{12}$ and
write


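$$\hat{e}_{03} P_1 = e_{21}, \quad P_1 \hat{e}_{03} = \hat{e}_{03} P_2 = e_{12}. \tag{27.61}$$

Table 27.3 The basis $e_{ij}$ for the total matrix algebra $\mathbf{C}^3_1(\mathbb{R})$


$e_{ij}$    $j = 1$               $j = 2$                $j = 3$               $j = 4$
$i = 1$     $P_1$                 $\hat{e}_{03} P_2$     $\hat{e}_3 P_3$       $-\hat{e}_0 P_4$
$i = 2$     $\hat{e}_{03} P_1$    $P_2$                  $\hat{e}_0 P_3$       $-\hat{e}_3 P_4$
$i = 3$     $\hat{e}_3 P_1$       $-\hat{e}_0 P_2$       $P_3$                 $\hat{e}_{03} P_4$
$i = 4$     $\hat{e}_0 P_1$       $-\hat{e}_3 P_2$       $\hat{e}_{03} P_3$    $P_4$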


Going through the same procedure using the second and third relations of
(27.58), we obtain
$$\begin{aligned} \hat{e}_3 P_1 &= e_{31}, & P_1 \hat{e}_3 &= \hat{e}_3 P_3 = e_{13},\\ \hat{e}_0 P_1 &= e_{41}, & P_1 \hat{e}_0 &= \hat{e}_0 P_4 = -e_{14}. \end{aligned} \tag{27.62}$$


Having found $e_{i1}$ and $e_{1j}$, we can find all the $e_{ij}$ because $e_{ij} = e_{i1} e_{1j}$. The
result is summarized in Table 27.3.

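Now that we have the basis we were after, we can express the basis vectors
$\{\hat{e}_\eta\}_{\eta=0}^3$ of the underlying vector space in terms of the new basis. Writing


$$\hat{e}_\eta = \sum_{i,j=1}^4 \gamma_{\eta,ij}\, e_{ij},$$


multiplying it on the right by $e_{kl}$ and on the left by $e_{mn}$, we obtain
$$e_{mn} \hat{e}_\eta e_{kl} = \gamma_{\eta,nk}\, e_{ml},$$

which, for $m = 1 = l$, yields
$$e_{1n} \hat{e}_\eta e_{k1} = \gamma_{\eta,nk}\, e_{11}. \tag{27.63}$$


Thus, to find $\gamma_{\eta,nk}$, multiply $\hat{e}_\eta$ on the left by $e_{1n}$ and on the right by $e_{k1}$
and read the coefficient of $e_{11}$ in the expression obtained. As an example,
we find $\gamma_{3,12}$. We have


$$e_{11} \hat{e}_3 e_{21} = \gamma_{3,12}\, e_{11}.$$
The left-hand side can be evaluated from the table:


$$\begin{aligned} e_{11} \hat{e}_3 e_{21} &= e_{11} \hat{e}_3 \hat{e}_{03} P_1 = e_{11} \hat{e}_3 \vee \hat{e}_0 \vee \hat{e}_3 P_1\\ &= -e_{11} \hat{e}_3 \vee \hat{e}_3 \vee \hat{e}_0 P_1\\ &= -e_{11} \hat{e}_0 P_1 = -e_{11} e_{41} = 0. \end{aligned}$$
Thus, $\gamma_{3,12} = 0$. Similarly, we find $\gamma_{3,13}$:
$$e_{11} \hat{e}_3 e_{31} = \gamma_{3,13}\, e_{11}.$$
Using Table 27.3, we get


$$e_{11} \hat{e}_3 e_{31} = e_{11} \hat{e}_3 \hat{e}_3 P_1 = e_{11} \hat{e}_3 \vee \hat{e}_3 P_1 = e_{11} 1 e_{11} = e_{11}.$$
Thus, $\gamma_{3,13} = 1$.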

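We can continue this way and obtain all coefficients $\gamma_{\eta,ij}$. However, an
easier way is to solve for $\hat{e}_1$ from Eq. (27.57). Thus


$$\hat{e}_1 = P_1 + P_2 - P_3 - P_4 = e_{11} + e_{22} - e_{33} - e_{44},$$


giving the matrix


$$\gamma_1 = \begin{pmatrix} 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & -1 & 0\\ 0 & 0 & 0 & -1 \end{pmatrix}.$$
Similarly,
$$\hat{e}_{02} = P_1 + P_3 - P_2 - P_4, \tag{27.64}$$


from which we can get $\hat{e}_2$ by multiplying on the left by $\hat{e}_0$ and noting that
$\hat{e}_0 \hat{e}_{02} = \hat{e}_0 \hat{e}_0 \hat{e}_2 = -\hat{e}_2$.


Thus,


$$\hat{e}_2 = -\hat{e}_0 P_1 - \hat{e}_0 P_3 + \hat{e}_0 P_2 + \hat{e}_0 P_4 = -e_{41} - e_{23} - e_{32} - e_{14}, \tag{27.65}$$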


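where use was made of Table 27.3 in the last step. It follows that


$$\gamma_2 = \begin{pmatrix} 0 & 0 & 0 & -1\\ 0 & 0 & -1 & 0\\ 0 & -1 & 0 & 0\\ -1 & 0 & 0 & 0 \end{pmatrix}.$$


The remaining two matrices can be obtained similarly. The details are left
as Problem 27.22. The result is


$$\gamma_3 = \begin{pmatrix} 0 & 0 & 1 & 0\\ 0 & 0 & 0 & -1\\ 1 & 0 & 0 & 0\\ 0 & -1 & 0 & 0 \end{pmatrix}, \qquad \gamma_0 = \begin{pmatrix} 0 & 0 & 0 & -1\\ 0 & 0 & 1 & 0\\ 0 & -1 & 0 & 0\\ 1 & 0 & 0 & 0 \end{pmatrix}.$$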
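As a sanity check on the four matrices above, the following sketch verifies by direct multiplication that they anticommute in pairs and square to $\pm 1$ with the signature $(-, +, +, +)$ of (27.40). The entries are transcribed from the displayed matrices, with $\gamma_0$ and $\gamma_3$ assigned as derived from Table 27.3; the helper names are hypothetical.

```python
# Real 4 x 4 matrices for e_0, e_1, e_2, e_3 (index 0 is the timelike vector).
GAMMA = [
    [[0, 0, 0, -1], [0, 0, 1, 0], [0, -1, 0, 0], [1, 0, 0, 0]],    # gamma_0
    [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]],    # gamma_1
    [[0, 0, 0, -1], [0, 0, -1, 0], [0, -1, 0, 0], [-1, 0, 0, 0]],  # gamma_2
    [[0, 0, 1, 0], [0, 0, 0, -1], [1, 0, 0, 0], [0, -1, 0, 0]],    # gamma_3
]
ETA = [-1, 1, 1, 1]   # diagonal of the metric, eta_00 = -1


def matmul(a, b):
    """Plain 4 x 4 matrix product on nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def anticommutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(4)] for i in range(4)]


def scaled_identity(c):
    return [[c if i == j else 0 for j in range(4)] for i in range(4)]


clifford_relations_hold = all(
    anticommutator(GAMMA[m], GAMMA[n]) == scaled_identity(2 * ETA[m] if m == n else 0)
    for m in range(4)
    for n in range(4)
)
print(clifford_relations_hold)
```

Pure-Python lists are used so that the check has no dependencies; the same test is a one-liner with any array library.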


Since $\hat{e}_\mu \vee \hat{e}_\nu + \hat{e}_\nu \vee \hat{e}_\mu = 2\eta_{\mu\nu}\mathbf{1}$ by (27.15) and (27.40), we have
$$\gamma_\mu \gamma_\nu + \gamma_\nu \gamma_\mu = 2\eta_{\mu\nu}\mathbf{1}, \tag{27.66}$$


which can also be verified directly by matrix multiplication. Equation (27.66)
is identical to (27.11) obeyed by the Dirac gamma matrices. The matrices
that Dirac used in his equation had complex entries. The matrices
constructed above are all real. They are called the Majorana representation of
the Dirac matrices.

27.5 Problems


27.1 Starting with Eq. (27.8), write e;,_ , Ve;, in terms of the wedge product 
using Eq. (27.3). Then use the more general Clifford product (27.2) repeatedly
until you have turned all the $\vee$'s to $\wedge$'s.


27.2 Find the coefficients of 1, $e_1$, $e_2$, and $e_1 \vee e_2$ for the Clifford product
$u \vee v$ of Example 27.2.2.


27.3. Show that Eqs. (27.24) and (27.25) are equivalent. 


27.4 Show that because of (27.24), g can be extended to an algebra homo- 
morphism only if it is an injective linear map. 


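27.5 Show that the conjugation involution of Definition 27.2.5 coincides
with the usual complex and quaternion conjugation. Show that $\overline{a \vee b} =
\bar{b} \vee \bar{a}$.


27.6 Let $\varphi: \mathbb{R} \to \mathbb{R} \oplus \mathbb{R}$ be a linear map. Assume a completely general form
for $\varphi$, i.e., assume $\varphi(\alpha) = (\beta \oplus \gamma)$. Extend this linear map to a homomorphism
$\phi: \mathbf{C}^1_0 \to \mathbb{R} \oplus \mathbb{R}$. Imposing the consistency condition (27.25), deduce
that $\beta^2 = \alpha^2 = \gamma^2$. Now show that a non-trivial homomorphism sends 1 to
$1 \oplus 1$ and $e$ to $1 \oplus -1$, and therefore is an isomorphism. Finally, for a general
element of $\mathbf{C}^1_0$ show that


$$\phi(\alpha 1 + \beta e) = (\alpha + \beta, \alpha - \beta), \quad \alpha, \beta \in \mathbb{R}.$$
27.7 Show that the four matrices
$$\begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, \quad \begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}$$
are linearly independent.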
27.8 Derive Eq. (27.33). 
27.9 Show that the center Zy is a subalgebra of Cy. 


27.10 Show that both Zy and Zy are invariant under the degree involution 
wy. 


27.11 Let Q, : V > F be a quadratic form defined in terms of the basis 
{e;}_, and g the inner product derived from Q,. Show that g(e;,e;) = 
+6;;. 


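27.12 Let $u, v \in \mathbb{R}^{2\mu}_\mu$ of Eq. (27.45). Write $u$ and $v$ in terms of the basis
vectors $\{\hat{e}_i\}_{i=1}^\mu$ and $\{\hat{f}_i\}_{i=1}^\mu$ and show that the $\omega$ of Eq. (27.46) satisfies
$\langle u, \omega v \rangle = \langle -\omega u, v \rangle$, implying that $\omega^t = -\omega$. With $u = \sum_{i=1}^\mu (\alpha_i \hat{e}_i + \beta_i \hat{f}_i)$,
show that $u \in \ker(\omega - 1)$ iff $\alpha_i = \beta_i$.


27.13 Following Example 27.2.6, show directly that $\mathbf{C}^1_1(\mathbb{R}) \cong \mathcal{L}(\mathbb{R}^2) \cong
\mathcal{M}^{2 \times 2}$.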


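27.14 Let $x = (x_1, x_2, x_3, x_4) \in \mathbb{R}^4$. Define $\phi: \mathbb{H} \otimes \mathbb{H} \to \mathcal{L}(\mathbb{R}^4)$ by


$$\phi(p \otimes q)x = p \cdot x \cdot q^*, \quad p, q \in \mathbb{H}, \quad x \in \mathbb{R}^4,$$


where on the right-hand side, $x = x_1 + x_2 i + x_3 j + x_4 k$ is a quaternion.
Show that $\phi$ is an algebra homomorphism whose kernel is zero. Now invoke
the dimension theorem and the fact that $\mathbb{H} \otimes \mathbb{H}$ and $\mathcal{L}(\mathbb{R}^4)$ have the same
dimension to show that $\phi$ is an isomorphism.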


27.15 Let V= Rj or V= RY and note that ee = 1 for Cy. Now use Theo- 
rem 27.2.17 to show that 


CyoroR) = Cy gro) @ Cy ro). 
27.16 Complete the remainder of Table 27.1. 


27.17 Using $V = \mathbb{R}^4_0$ or $V = \mathbb{R}^4_4$, derive formulas for $\mathbf{C}^{n+4}_0(\mathbb{R})$ and $\mathbf{C}^0_{n+4}(\mathbb{R})$
analogous to (27.50) and (27.51).


27.18 Show that 


CR)=C@c(r'), Cme=c(R)@c(R"), 


CYR) =HeL(R"), CihR=ZL(R”). 


27.19 Show that if $x^2 = 1 = y^2$ and $xy = yx$, then the four quantities
$\frac{1}{4}(1 \pm x)(1 \pm y)$ are orthogonal idempotents.


27.20 Verify all of the relations in Eq. (27.58). 
27.21 Derive Eq. (27.62). 
27.22 Note that


$$\hat{e}_0 = \hat{e}_0 1 = \hat{e}_0 P_1 + \hat{e}_0 P_2 + \hat{e}_0 P_3 + \hat{e}_0 P_4,$$


$$\hat{e}_3 = \hat{e}_3 1 = \hat{e}_3 P_1 + \hat{e}_3 P_2 + \hat{e}_3 P_3 + \hat{e}_3 P_4.$$


Now use Table 27.3 to express each term on the right in terms of $e_{ij}$.

Tensor algebra deals with lifeless vectors and tensors: objects that do not
move, do not change, possess no dynamics. Whenever there is a need for 
tensors in physics, there is also a need to know the way these tensors change 
with position and time. Tensors that depend on position and time are called 
tensor fields and are the subject of this chapter. 

In studying the algebra of tensors, we learned that they are generaliza- 
tions of vectors. Once we have a vector space V and its dual space V*, we 
can take the tensor products of factors of V and V* and create tensors of var- 
ious kinds. Thus, once we know what a vector is, we can make up tensors 
from it. 

In our discussion of tensor algebra, we did not concern ourselves with 
what a vector was; we simply assumed that it existed. Because all the vectors 
considered there were stationary, their mere existence was enough. How- 
ever, in tensor analysis, where things keep changing from point to point 
(and over time), the existence of vectors at one point does not guarantee 
their existence at all points. Therefore, we now have to demand more from 
vectors than their mere existence. Tied to the concept of vectors is the notion 
of space, or space-time. Let us consider this first. 


28.1 Differentiable Manifolds 


Space is one of the undefinables in elementary physics. Length and time 
intervals are concepts that are “God given”, and any definitions of these
concepts will be circular. This is true as long as we are confined within a 
single space. In classical physics, this space is the three-dimensional Eu- 
clidean space in which every motion takes place. In special relativity, space 
is changed to Minkowski space-time. In nonrelativistic quantum mechanics, 
the underlying space is the (infinite-dimensional) Hilbert space, and time is 
the only dynamical parameter. In the general theory of relativity, gravitation 
and space-time are intertwined through the concept of curvature. 
Mathematicians have invented a unifying theme that brings the common
features of all spaces together. This unifying theme is the theory of
differentiable manifolds. A rigorous understanding of differentiable manifolds is
beyond the scope of this book. However, acquiring a working knowledge of
manifold theory is surprisingly simple. Let us begin with a crude definition of a
differentiable manifold.


Definition 28.1.1 A differentiable manifold is a collection of objects 
called points that are connected to each other in a smooth fashion such 
that the neighborhood of each point looks like the neighborhood of an m- 
dimensional (Cartesian) space; m is called the dimension of the manifold. 


As is customary in the literature, we use “manifold” to mean “differentiable
manifold”.


Example 28.1.2 The following are examples of differentiable manifolds. 


(a) The space $\mathbb{R}^n$ is an $n$-dimensional manifold.

(b) The surface of a sphere is a two-dimensional manifold.

(c) A torus is a two-dimensional manifold.

(d) The collection of all $n \times n$ real matrices whose elements are real functions
having derivatives of all orders is an $n^2$-dimensional manifold.
Here a point is an $n \times n$ matrix.

(e) The collection of all rotations in $\mathbb{R}^3$ is a three-dimensional manifold.
(Here a point is a rotation.)

(f) Any smooth surface in $\mathbb{R}^3$ is a two-dimensional manifold.

(g) The unit $n$-sphere $S^n$, which is the collection of points in $\mathbb{R}^{n+1}$ satisfying

$$x_1^2 + \cdots + x_{n+1}^2 = 1,$$
is a manifold.


Any surface with sharp kinks, edges, or points cannot be a manifold. 
Thus, neither a cone nor a finite cylinder is a two-dimensional manifold. 
However, an infinitely long cylinder is a manifold. 


Let $U_P$ denote a neighborhood of $P$. When we say that this neighborhood
looks like an $m$-dimensional Cartesian space, we mean that there exists a
bijective map $\varphi: U_P \to \mathbb{R}^m$ from a neighborhood $U_P$ of $P$ to a neighborhood
$\varphi(U_P)$ of $\varphi(P)$ in $\mathbb{R}^m$, such that as we move the point $P$ continuously in
$U_P$, its image moves continuously in $\varphi(U_P)$. Since $\varphi(P) \in \mathbb{R}^m$, we can
define functions $x^i: U_P \to \mathbb{R}$ such that $\varphi(P) = (x^1(P), x^2(P), \dots, x^m(P))$.
These functions are called coordinate functions of $\varphi$. The numbers $x^i(P)$
are called coordinates of $P$. The neighborhood $U_P$ together with its mapping
$\varphi$ form a chart, denoted by $(U_P, \varphi)$.

Now let $(V_P, \mu)$ be another chart at $P$ with coordinate functions $\mu(P) =
(y^1(P), y^2(P), \dots, y^m(P))$ (see Fig. 28.1). It is assumed that the map
$\mu \circ \varphi^{-1}: \varphi(U_P \cap V_P) \to \mu(U_P \cap V_P)$, which maps a subset of $\mathbb{R}^m$ to another
subset of $\mathbb{R}^m$, possesses derivatives of all orders. Then, we say that the two
charts $\mu$ and $\varphi$ are $C^\infty$-related. Such a relation underlies the concept of
smoothness in the definition of a manifold. A collection of charts that cover
the manifold and of which each pair is $C^\infty$-related is called a $C^\infty$ atlas.

Fig. 28.1 Two charts $(U_P, \varphi)$ and $(V_P, \mu)$, containing $P$ are mapped into $\mathbb{R}^m$. The function
$\mu \circ \varphi^{-1}$ is an ordinary function from $\mathbb{R}^m$ to $\mathbb{R}^m$


Example 28.1.3 For the two-dimensional unit sphere $S^2$ we can construct
an atlas as follows. Let $P = (x, y, z)$ be a point in $S^2$. Then $x^2 + y^2 + z^2 = 1$,
or

$$z = \pm\sqrt{1 - x^2 - y^2}.$$


The plus sign corresponds to the upper hemisphere, and the minus sign to
the lower hemisphere. Let $U_3^+$ be the upper hemisphere with the equator
removed. Then a chart $(U_3^+, \varphi_3)$ with $\varphi_3: U_3^+ \to \mathbb{R}^2$ can be constructed
by projecting on the $xy$-plane: $\varphi_3(P) = (x, y)$. Similarly, $(U_3^-, \mu_3)$ with
$\mu_3: U_3^- \to \mathbb{R}^2$ given by $\mu_3(P) = (x, y)$ is a chart for the lower hemisphere.

In manifold theory the neighborhoods on which mappings of charts are
defined have no boundaries (thus the word “open”). This is because it is
more convenient to define limits on boundaryless (open) neighborhoods.
Thus, in the above two charts the equator, which is the boundary for both
hemispheres, must be excluded. With this exclusion $U_3^+$ and $U_3^-$ cannot
cover the entire $S^2$; hence, they do not form an atlas. More charts are needed
to cover the unit two-sphere. Two such charts are the right and left hemispheres
$U_2^+$ and $U_2^-$, for which $y > 0$ and $y < 0$, respectively. However,
these two neighborhoods leave two points uncovered, the points $(1, 0, 0)$
and $(-1, 0, 0)$. Again this is because boundaries of the right and left hemispheres
must be excluded. Adding the front and back hemispheres $U_1^\pm$ to
the collection covers these two points. Then $S^2$ is completely covered and
we have an atlas. There is, of course, a lot of overlap among charts. We now
show that these overlaps are $C^\infty$-related.

As an illustration, we consider the overlap between $U_3^+$ and $U_2^+$. This is
the upper-right quarter of the sphere. Let $(U_3^+, \varphi_3)$ and $(U_2^+, \varphi_2)$ be charts
with


$$\varphi_3(x, y, z) = (x, y), \quad \varphi_2(x, y, z) = (x, z).$$


Fig. 28.2 A chart mapping points of $S^2$ into $\mathbb{R}^2$. Note that the map is not defined for
$\theta = 0, \pi$, and therefore at least one more chart is required to cover the whole sphere

The inverses are therefore given by


$$\varphi_3^{-1}(x, y) = (x, y, z) = \left(x, y, \sqrt{1 - x^2 - y^2}\right),$$

$$\varphi_2^{-1}(x, z) = (x, y, z) = \left(x, \sqrt{1 - x^2 - z^2}, z\right),$$


and


$$\varphi_2 \circ \varphi_3^{-1}(x, y) = \varphi_2\left(x, y, \sqrt{1 - x^2 - y^2}\right) = \left(x, \sqrt{1 - x^2 - y^2}\right).$$


Let us denote $\varphi_2 \circ \varphi_3^{-1}$ by $F$, so that $F: \mathbb{R}^2 \to \mathbb{R}^2$ is described by two
functions, the components of $F$:


$$F_1(x, y) = x \quad \text{and} \quad F_2(x, y) = \sqrt{1 - x^2 - y^2}.$$


The first component has derivatives of all orders at all points. The second
component has derivatives of all orders at all points except at $x^2 + y^2 = 1$,
which is excluded from the region of overlap of $U_3^+$ and $U_2^+$, for which $z$
can never be zero. Thus, $F$ has derivatives of all orders at all points of its
domain of definition.

One can similarly show that all regions of overlap for all charts have this
property, i.e., all charts are $C^\infty$-related.


Example 28.1.4 For $S^2$ of the preceding example, we can find a new atlas
in terms of new coordinate functions. Since $x_1^2 + x_2^2 + x_3^2 = 1$, we can use
spherical coordinates $\theta = \cos^{-1} x_3$, $\varphi = \tan^{-1}(x_2/x_1)$. A chart is then given
by $(S^2 - \{1\} - \{-1\}, \mu)$, where $\mu(P) = (\theta, \varphi)$ maps a point of $S^2$ onto a
region in $\mathbb{R}^2$. This is schematically shown in Fig. 28.2. The singletons $\{1\}$
and $\{-1\}$ are the north and the south poles, respectively.

This chart cannot cover all of $S^2$, however, because when $\theta = 0$ (or $\pi$),
the value of the azimuthal angle $\varphi$ is not determined. In other words, $\theta = 0$
(or $\pi$) determines one point of the sphere (the north pole or the south pole),
but its image in $\mathbb{R}^2$ is the whole range of $\varphi$ values. Therefore, we must
exclude $\theta = 0$ (or $\pi$) from the chart $(S^2, \mu)$. To cover these two points, we
need more charts.


Example 28.1.5 A third atlas for $S^2$ is the so-called stereographic projection
shown in Fig. 28.3. In such a mapping the image of a point is obtained
by drawing a line from the north pole to that point and extending it, if necessary,
until it intersects the $x_1 x_2$-plane. It can be verified that the mapping
$\varphi: S^2 - \{1\} \to \mathbb{R}^2$ is given by


$$\varphi(x_1, x_2, x_3) = \left(\frac{x_1}{1 - x_3}, \frac{x_2}{1 - x_3}\right).$$

Fig. 28.3 Stereographic projection of $S^2$ into $\mathbb{R}^2$. Note that the north pole has no image
under this map; another chart is needed to cover the whole sphere


We see that this mapping fails for $x_3 = 1$, that is, the north pole. Therefore,
the north pole must be excluded (thus, the domain $S^2 - \{1\}$). To cover the
north pole we need another stereographic projection, this time from the
south pole. Then the two mappings will cover all of $S^2$, and it can be shown
that the two charts are $C^\infty$-related (see Example 28.1.12).
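The compatibility of the two stereographic charts can be explored numerically. The sketch below uses the north-pole projection given above, the analogous south-pole projection with $1 + x_3$ in the denominator, and the inverse obtained by solving the projection formula together with $x_1^2 + x_2^2 + x_3^2 = 1$; the function names are hypothetical. It suggests that the transition map is $u \mapsto u/|u|^2$, which has derivatives of all orders away from the origin.

```python
import math


def phi_north(p):
    """Stereographic projection from the north pole (undefined at x3 = 1)."""
    x1, x2, x3 = p
    return (x1 / (1 - x3), x2 / (1 - x3))


def phi_south(p):
    """Stereographic projection from the south pole (undefined at x3 = -1)."""
    x1, x2, x3 = p
    return (x1 / (1 + x3), x2 / (1 + x3))


def phi_north_inv(u):
    """Point of the unit sphere whose north-pole projection is u."""
    u1, u2 = u
    s = u1 * u1 + u2 * u2
    return (2 * u1 / (s + 1), 2 * u2 / (s + 1), (s - 1) / (s + 1))


def transition(u):
    """Coordinate change phi_south o phi_north^{-1}, defined for u != (0, 0)."""
    return phi_south(phi_north_inv(u))


u = (0.6, -1.7)
p = phi_north_inv(u)
print(sum(c * c for c in p))   # should be 1 up to rounding
print(transition(u))           # compare with u / |u|^2
```

The origin of the plane is excluded from the transition map because it corresponds to the south pole, which lies outside the second chart.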


The three foregoing examples illustrate the following fact, which can be 
shown to hold rigorously: 


Box 28.1.6 It is impossible to cover the whole $S^2$ with just one chart.


Example 28.1.7 Let $V$ be an $m$-dimensional real vector space. Fix any
basis $\{e_i\}$ in $V$ with dual basis $\{\epsilon^i\}$. Define $\phi: V \to \mathbb{R}^m$ by $\phi(v) =
(\epsilon^1(v), \dots, \epsilon^m(v))$. Then the reader may verify that $(V, \phi)$ is an atlas.
Linearity of $\phi$ ensures that it has derivatives of all orders. This construction
shows that $V$ is a manifold of dimension $m$.


If $M$ and $N$ are manifolds of dimensions $m$ and $n$, respectively, we can
construct their product manifold $M \times N$, a manifold of dimension $m + n$.
A typical chart on $M \times N$ is obtained from charts on $M$ and $N$ as follows.
Let $(U, \varphi)$ be a chart on $M$ and $(V, \mu)$ one on $N$. Then a chart on $M \times N$ is
$(U \times V, \varphi \times \mu)$, where


$$\varphi \times \mu(P, Q) = (\varphi(P), \mu(Q)) \in \mathbb{R}^m \times \mathbb{R}^n = \mathbb{R}^{m+n} \quad \text{for } P \in U,\ Q \in V.$$


Definition 28.1.8 Let M be a manifold. A subset N of M is called a sub- 
manifold of M if N is a manifold in its own right. 


A trivial, but important, example of submanifolds is the so-called open
submanifold. If $M$ is a manifold and $U$ is an open subset¹ of $M$, then $U$
inherits a manifold structure from $M$ by taking any chart $(U_\alpha, \varphi_\alpha)$ and
restricting $\varphi_\alpha$ to $U \cap U_\alpha$. It is clear that $\dim U = \dim M$.


¹Recall that an open subset $U$ is one each of whose points is the center of an open ball
lying entirely in $U$.

Fig. 28.4 Corresponding to every map $f: M \to N$ there exists a coordinate map
$\mu \circ f \circ \varphi^{-1}: \mathbb{R}^m \to \mathbb{R}^n$

Having gained familiarity with manifolds, it is now appropriate to con- 
sider maps between them that are compatible with their structure. 


Definition 28.1.9 Let $M$ and $N$ be manifolds of dimensions $m$ and $n$,
respectively. Let $f: M \to N$ be a map. We say that $f$ is $C^\infty$, or differentiable,
if for every chart $(U, \varphi)$ in $M$ and every chart $(V, \mu)$ in $N$, the composite
map $\mu \circ f \circ \varphi^{-1}: \mathbb{R}^m \to \mathbb{R}^n$, called the coordinate expression for $f$,
is $C^\infty$ wherever it is defined.²


The content of this definition is illustrated in Fig. 28.4. A particularly
important special case occurs when $N = \mathbb{R}$; then we call $f$ a (real-valued)
function. The collection of all $C^\infty$ functions at a point $P \in M$ is denoted
by $F^\infty(P)$: If $f \in F^\infty(P)$, then $f: U_P \to \mathbb{R}$ is $C^\infty$ for some neighborhood
$U_P$ of $P$.

Let $f: M \to N$ be a differentiable map. Then $f$ is automatically continuous.
Now let $V$ be an open subset of $N$. The set $f^{-1}(V)$ is an open subset
of $M$ by Proposition 17.4.6.³


Proposition 28.1.10 Let $M$ be an $m$-dimensional manifold, $f: M \to N$ a
differentiable map, and $V$ an open subset of $N$. Then $f^{-1}(V)$, the set of
points of $M$ mapped onto $V$, is an open $m$-dimensional submanifold of $M$.


Just as the concept of isomorphism identified all vector spaces, algebras, 
and groups that were equivalent to one another, it is desirable to introduce a 
notion that brings together those manifolds that “look alike”. 


²The domain of $\mu \circ f \circ \varphi^{-1}$ is not all of $\mathbb{R}^m$, but only its open subset $\varphi(U)$. However, we
shall continue to abuse the notation and write $\mathbb{R}^m$ instead of $\varphi(U)$. This way, we do not
have to constantly change the domain as U changes. The domain is always clear from the
context.

³Although Proposition 17.4.6 was shown for normed linear spaces, it really holds for all
“spaces” for which the concept of open set is defined.



Definition 28.1.11 A bijective differentiable map whose inverse is also differentiable
is called a diffeomorphism. Two manifolds between which a diffeomorphism
exists are called diffeomorphic. Let M and N be manifolds.
M is said to be diffeomorphic to N at $P \in M$ if there is a neighborhood
U of P and a diffeomorphism $f : U \to f(U)$. Then f is called a local
diffeomorphism at P.


In our discussion of groups, we saw that the set of linear isomorphisms 
of a vector space V onto itself forms a group GL(V). The set of diffeomor- 
phisms of a manifold M onto itself also forms a group, which is denoted by 
Diff(M). 


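Example 28.1.12 The generalization of a sphere is the unit n-sphere,
which is a subset of $\mathbb{R}^{n+1}$ defined by

\[
S^n = \bigl\{ (x_1, \dots, x_{n+1}) \in \mathbb{R}^{n+1} \mid x_1^2 + \cdots + x_{n+1}^2 = 1 \bigr\}.
\]

The stereographic projection defines an atlas for $S^n$ as follows. For all points
of $S^n$ except $(0, 0, \dots, 1)$, the north pole, define the chart
$\varphi_+ : S^n - \{1\} \equiv U^+ \to \mathbb{R}^n$ by

\[
\varphi_+(x_1, \dots, x_{n+1}) = \left( \frac{x_1}{1 - x_{n+1}}, \dots, \frac{x_n}{1 - x_{n+1}} \right)
\quad \text{for } (x_1, \dots, x_{n+1}) \in U^+.
\]

To include the north pole, consider a second chart
$\varphi_- : S^n - \{-1\} \equiv U^- \to \mathbb{R}^n$ defined by

\[
\varphi_-(x_1, \dots, x_{n+1}) = \left( \frac{x_1}{1 + x_{n+1}}, \dots, \frac{x_n}{1 + x_{n+1}} \right)
\quad \text{for } (x_1, \dots, x_{n+1}) \in U^-.
\]

Next, let us find the inverses of these maps. We find the inverse of $\varphi_+$;
that of $\varphi_-$ can be found similarly. Let $\xi_k = x_k/(1 - x_{n+1})$. Then one can
readily show that

\[
\sum_{k=1}^{n} \xi_k^2 = \frac{1 + x_{n+1}}{1 - x_{n+1}}
\quad\Rightarrow\quad
x_{n+1} = \frac{\sum_{k=1}^{n} \xi_k^2 - 1}{\sum_{k=1}^{n} \xi_k^2 + 1}
\]

and

\[
x_i = \frac{2\xi_i}{1 + \sum_{k=1}^{n} \xi_k^2} \quad \text{for } i = 1, 2, \dots, n.
\]

From the definition of $\varphi_+$, we have

\[
\varphi_+^{-1}(\xi_1, \dots, \xi_n) = (x_1, \dots, x_n, x_{n+1})
= \left( \frac{2\xi_1}{1 + \sum_{k=1}^{n} \xi_k^2}, \dots, \frac{2\xi_n}{1 + \sum_{k=1}^{n} \xi_k^2},
\frac{\sum_{k=1}^{n} \xi_k^2 - 1}{\sum_{k=1}^{n} \xi_k^2 + 1} \right).
\tag{28.1}
\]

On the overlap of $U^+$ and $U^-$, i.e., on all points of $S^n$ except the north
and the south poles, $\varphi_- \circ \varphi_+^{-1} : \mathbb{R}^n \to \mathbb{R}^n$ can be calculated by noting that
$\varphi_-$ has the following effect on a typical entry of Eq. (28.1):

\[
x_j \to \frac{x_j}{1 + x_{n+1}}
= \frac{2\xi_j / \bigl(1 + \sum_{k=1}^{n} \xi_k^2\bigr)}{1 + \dfrac{\sum_{k=1}^{n} \xi_k^2 - 1}{\sum_{k=1}^{n} \xi_k^2 + 1}}
= \frac{\xi_j}{\sum_{k=1}^{n} \xi_k^2}.
\]

Therefore,

\[
\varphi_- \circ \varphi_+^{-1}(\xi_1, \dots, \xi_n)
= \left( \frac{\xi_1}{\sum_{k=1}^{n} \xi_k^2}, \dots, \frac{\xi_n}{\sum_{k=1}^{n} \xi_k^2} \right).
\]

It is clear that $\varphi_- \circ \varphi_+^{-1}$ has derivatives of all orders except possibly at a
point for which $\xi_i = 0$ for all i. But this would correspond to $x_{n+1} = -1$, the south pole,
which is excluded from the region of overlap.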
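The formulas of this example can be checked numerically. The following is a minimal sketch, assuming points are stored as plain Python lists; the function names `phi_plus`, `phi_minus`, `phi_plus_inv`, and `transition` are illustrative choices, not notation from the text.

```python
def phi_plus(x):
    """Stereographic chart from the north pole: S^n minus (0,...,0,1) -> R^n."""
    last = x[-1]
    return [xi / (1.0 - last) for xi in x[:-1]]

def phi_minus(x):
    """Stereographic chart from the south pole: S^n minus (0,...,0,-1) -> R^n."""
    last = x[-1]
    return [xi / (1.0 + last) for xi in x[:-1]]

def phi_plus_inv(xi):
    """Inverse of phi_plus, following Eq. (28.1)."""
    s = sum(v * v for v in xi)
    return [2.0 * v / (1.0 + s) for v in xi] + [(s - 1.0) / (s + 1.0)]

def transition(xi):
    """phi_minus o phi_plus^{-1}: xi -> xi / |xi|^2, defined for xi != 0."""
    s = sum(v * v for v in xi)
    return [v / s for v in xi]

xi = [0.3, -1.2, 0.7]                    # a sample point of R^3 (so n = 3)
p = phi_plus_inv(xi)                     # the corresponding point of S^3
print(sum(v * v for v in p))             # should be 1 up to rounding
print(phi_plus(p))                       # should reproduce xi
print(phi_minus(p), transition(xi))      # the two lists should agree
```

The transition map is an inversion in the unit sphere of $\mathbb{R}^n$, which makes its smoothness away from the origin easy to see.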


28.2 Curves and Tangent Vectors 


We noted above that functions are special cases of Definition 28.1.9. An- 
other special case occurs when M = R. This is important enough to warrant 
a separate definition. 


Definition 28.2.1 A differentiable curve in the manifold M is a $C^\infty$ map
of an interval of $\mathbb{R}$ to M.


This definition should be familiar from calculus, where $M = \mathbb{R}^3$ and a
curve is given by its parametric equation $(f_1(t), f_2(t), f_3(t))$, or simply by
$\mathbf{r}(t)$. If the interval is $[a, b]$ and the curve is $\gamma$, the point $\gamma(a) \in M$ is called
the initial point, and $\gamma(b) \in M$ is called the final point of the curve $\gamma$.
A curve is closed if $\gamma(a) = \gamma(b)$.

We are now ready to consider what a vector at a point is. All the familiar 
vectors in classical physics, such as displacement, velocity, momentum, and 
so forth, are based on the displacement vector. Let us see how we can gen- 
eralize such a vector so that it is compatible with the concept of a manifold. 

In $\mathbb{R}^3$, we define the displacement vector from P to Q as a directed
straight line that starts at P and ends at Q. Furthermore, the direction of
the vector remains the same if we connect P to any other final point on the
line PQ located beyond Q. This is because $\mathbb{R}^3$ is a flat space, a straight line
is well-defined, and there is no ambiguity in the direction of the vector from
P to Q.

Things change, however, if we move to a two-dimensional spherical sur- 
face such as the globe. How do we define the straight line from New York 
to Beijing? There is no satisfactory definition of the word “straight” on a 
curved surface. Let us say that “straight” means shortest distance. Then our 
shortest path would lie on a great circle passing through New York and Bei- 
jing. Define the “direction” of the trip as the “straight” arrow, say 1 km in 
length, connecting our present position to the next point 1 km away. As we
move from New York to Beijing, going westward, the tip of the arrow keeps
changing direction. Its direction in New York is slightly different from its 
direction in Chicago. In San Francisco the direction is changed even more, 
and by the time we reach Beijing, the tip of the arrow will be almost opposite 
to its original direction. 

The reason for such a changing arrow is, of course, the curvature of the 
manifold. We can minimize this curvature effect if we do not go too far from 
New York. If we stay close to New York, the surface of the earth appears 
flat, and we can draw arrows between points. The closer the two points, the 
better the approximation to flatness. Clearly, the concept of a vector is a 
local concept, and the process of constructing a vector is a limiting process. 

The limiting process in the globe example entailed the notions of “close- 
ness”. Such a notion requires the concept of distance, which is natural for 
a globe but not necessary for a general manifold. For most manifolds it is 
possible to define a metric that gives the “distance” between two points of 
the manifold. However, the concept of a vector is too general to require such 
an elaborate structure as a metric. The abstract usefulness of a metric is a result
of its real-valuedness: given two points $P_1$ and $P_2$, the distance between
them, $d(P_1, P_2)$, is a nonnegative real number. Thus, distances between different
points can be compared.

We have already defined two concepts for manifolds (more basic than 
the concept of a metric) that together can replace the concept of a metric in 
defining a vector as a limit. These are the concepts of (real-valued) functions 
and curves. Let us see how functions and curves can replace metrics. 

Let $\gamma : [a, b] \to M$ be a curve in the manifold M. Let $P \in M$ be a point
of M that lies on $\gamma$ such that $\gamma(c) = P$ for some $c \in [a, b]$. Let $f \in F^\infty(P)$.
Restrict f to the neighboring points of P that lie on $\gamma$. Then the composite
function $f \circ \gamma : \mathbb{R} \to \mathbb{R}$ is a real-valued function on $\mathbb{R}$.

We can compare values of $f \circ \gamma$ for various real numbers close to c,
as in calculus. If $u \in [a, b]$ denotes⁴ the variable, then $f \circ \gamma(u) = f(\gamma(u))$
gives the value of $f \circ \gamma$ at various u's. In particular, the difference
$\Delta(f \circ \gamma) = f(\gamma(u)) - f(\gamma(c))$ is a measure of how close the point $\gamma(u) \in M$ is
to P. Going one step further, we define

\[
\frac{d(f \circ \gamma)}{du}\bigg|_{u=c} = \lim_{u \to c} \frac{f(\gamma(u)) - f(\gamma(c))}{u - c},
\tag{28.2}
\]


the usual derivative of an ordinary function of one variable. However, this
derivative depends on $\gamma$ and on the point P. The function f is merely a test
function. We could choose any other function to test how things change with
movement along $\gamma$. What is important is not which function we choose, but
how the curve $\gamma$ causes it to change with movement along $\gamma$ away from P.
This change is determined by the directional derivative along $\gamma$ at P, as
given by (28.2). A directional derivative determines a tangent which, in turn,
suggests a tangent vector. That is why the tangent vector at P along $\gamma$ is
defined to be the directional derivative itself!


⁴We usually use u or t to denote the (real) argument of the map $\gamma : [a, b] \to M$.

The use of derivative as tangent vector may appear strange to the novice,
especially physicists encountering it for the first time, but it has been familiar 
to mathematicians for a long time. It is hard for the beginner to imagine vec- 
tors being charged with the responsibility of measuring the rate of change of 
functions. It takes some mental adjustment to get used to this idea. The fol- 
lowing simple illustration may help with establishing the vector-derivative 
connection. 


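Example 28.2.2 Let us take the familiar case of a plane and consider the
vector $\mathbf{a} = a_x \hat{\mathbf{e}}_x + a_y \hat{\mathbf{e}}_y$. What kind of a directional derivative can correspond
to $\mathbf{a}$? First we need a curve $\gamma : \mathbb{R} \to \mathbb{R}^2$ that is somehow associated with $\mathbf{a}$.
It is not hard to convince oneself that the most natural association is that of
vectors to tangents. Thus, we seek a curve whose tangent is (parallel to) $\mathbf{a}$.
The easiest (but not the only) way is simply to take the straight line along
$\mathbf{a}$; that is, let $\gamma(u) = (a_x u, a_y u)$. The directional derivative at u = 0 for an
arbitrary function $f : \mathbb{R}^2 \to \mathbb{R}$ is given by

\[
\frac{d(f \circ \gamma)}{du}\bigg|_{u=0}
= \lim_{u \to 0} \frac{f(\gamma(u)) - f(\gamma(0))}{u}
= \lim_{u \to 0} \frac{f(a_x u, a_y u) - f(0, 0)}{u}.
\tag{28.3}
\]

Taylor expansion in two dimensions yields

\[
f(a_x u, a_y u) = f(0, 0) + a_x u \frac{\partial f}{\partial x}\bigg|_{u=0} + a_y u \frac{\partial f}{\partial y}\bigg|_{u=0} + \cdots.
\]

Substituting in (28.3), we obtain

\[
\frac{d(f \circ \gamma)}{du}\bigg|_{u=0}
= \lim_{u \to 0} \frac{a_x u (\partial f/\partial x)_{u=0} + a_y u (\partial f/\partial y)_{u=0} + \cdots}{u}
= \left( a_x \frac{\partial}{\partial x} + a_y \frac{\partial}{\partial y} \right) f \bigg|_{u=0}.
\]

This clearly shows the connection between directional derivatives and vectors.
In fact, the correspondences $\partial/\partial x \leftrightarrow \hat{\mathbf{e}}_x$ and $\partial/\partial y \leftrightarrow \hat{\mathbf{e}}_y$ establish this
connection very naturally.

Note that the curve $\gamma$ chosen above is by no means unique. In fact, there
are infinitely many curves that have the same tangent at u = 0 and give the
same directional derivative.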
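The non-uniqueness of the curve can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the test function `f`, the components `ax`, `ay`, and the second curve `bent` are hypothetical choices, and the derivative in (28.3) is approximated by a central difference.

```python
import math

def directional_derivative(f, curve, c=0.0, h=1e-6):
    """Central-difference estimate of d(f o curve)/du at u = c, cf. Eq. (28.2)."""
    return (f(*curve(c + h)) - f(*curve(c - h))) / (2.0 * h)

ax, ay = 2.0, -1.0                       # components of the vector a

def f(x, y):                             # an arbitrary smooth test function
    return math.sin(x) + x * y + math.exp(y)

def line(u):                             # the straight line used in the example
    return (ax * u, ay * u)

def bent(u):                             # a different curve with the same tangent at u = 0
    return (ax * u + u ** 2, ay * u - 3.0 * u ** 2)

# a_x df/dx + a_y df/dy at (0, 0), by hand: df/dx = cos(0) + 0 = 1, df/dy = 0 + e^0 = 1
expected = ax * 1.0 + ay * 1.0

print(directional_derivative(f, line), directional_derivative(f, bent), expected)
```

Both curves should give the same value up to discretization error, because only the first-order behavior of the curve at u = 0 enters the derivative.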


Since vectors are the same as derivatives, we expect them to have the 
properties shared by derivatives: 


Definition 28.2.3 Let M be a differentiable manifold. A tangent vector at
$P \in M$ is an operator $\mathbf{t} : F^\infty(P) \to \mathbb{R}$ such that for every $f, g \in F^\infty(P)$
and $\alpha, \beta \in \mathbb{R}$,

1. $\mathbf{t}$ is linear: $\mathbf{t}(\alpha f + \beta g) = \alpha \mathbf{t}(f) + \beta \mathbf{t}(g)$;
2. $\mathbf{t}$ satisfies the derivation property:

\[
\mathbf{t}(fg) = g(P)\,\mathbf{t}(f) + f(P)\,\mathbf{t}(g).
\]

The operator $\mathbf{t}$ is an abstraction of the derivative operator. Note that $\mathbf{t}(f)$,
$g(P)$, $f(P)$, and $\mathbf{t}(g)$ are all real numbers.

The reader may easily check that if addition and scalar multiplication of
tangent vectors are defined in an obvious way, the set of all tangent vectors at
$P \in M$ becomes a vector space, called the tangent space at P and denoted
by $\mathcal{T}_P(M)$. If U is an open subset of M (therefore, an open submanifold of
M), then it is clear that

\[
\mathcal{T}_P(U) = \mathcal{T}_P(M) \quad \text{for all } P \in U.
\tag{28.4}
\]


Definition 28.2.3 was motivated by Eqs. (28.2) and (28.3). Let us go back- 
wards and see if (28.2) is indeed a tangent, that is, if it satisfies the two 
conditions of Definition 28.2.3. 


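Proposition 28.2.4 Let $\gamma$ be a $C^\infty$ curve in M such that $\gamma(c) = P$. Define
$\dot\gamma(c) : F^\infty(P) \to \mathbb{R}$ by

\[
(\dot\gamma(c))(f) = \frac{d}{du} f \circ \gamma \bigg|_{u=c} \quad \text{for } f \in F^\infty(P).
\]

Then $\dot\gamma(c)$ is a tangent vector at P called the vector tangent to $\gamma$ at c.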


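Proof We have to show that the two conditions of Definition 28.2.3 are satisfied
for $f, g \in F^\infty(P)$ and $\alpha, \beta \in \mathbb{R}$. The first condition is trivial. For the
second condition, we use the product rule for ordinary differentiation as follows:

\[
\begin{aligned}
(\dot\gamma(c))(fg)
&= \frac{d}{du}\bigl[(fg) \circ \gamma\bigr]\bigg|_{u=c}
 = \frac{d}{du}\bigl[(f \circ \gamma)(g \circ \gamma)\bigr]\bigg|_{u=c} \\
&= \left[\frac{d}{du}(f \circ \gamma)\right]_{u=c} \bigl[g \circ \gamma\bigr]_{u=c}
 + \bigl[f \circ \gamma\bigr]_{u=c} \left[\frac{d}{du}(g \circ \gamma)\right]_{u=c} \\
&= \bigl[(\dot\gamma(c))(f)\bigr]\, g(\gamma(c)) + f(\gamma(c))\, \bigl[(\dot\gamma(c))(g)\bigr] \\
&= \bigl[(\dot\gamma(c))(f)\bigr]\, g(P) + f(P)\, \bigl[(\dot\gamma(c))(g)\bigr].
\end{aligned}
\]

Note that in going from the first equality to the second, we used the fact that
by definition, the product of two functions evaluated at a point is the product
of the values of the two functions at that point.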
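The derivation property established in the proof can be spot-checked numerically. This is a rough sketch, not part of the proof: the curve and the functions `f` and `g` are hypothetical, M is taken to be $\mathbb{R}^2$, and the derivative along the curve is replaced by a central difference.

```python
import math

def tangent(curve, c=0.0, h=1e-6):
    """Return the operator f -> d(f o curve)/du at u = c (central difference)."""
    def t(f):
        return (f(*curve(c + h)) - f(*curve(c - h))) / (2.0 * h)
    return t

def curve(u):                       # hypothetical curve in R^2 with curve(0) = (1, 0)
    return (math.cos(u) + u, math.sin(2.0 * u))

def f(x, y):
    return x * x + y

def g(x, y):
    return math.exp(x) * math.cos(y)

def fg(x, y):                       # pointwise product of f and g
    return f(x, y) * g(x, y)

P = curve(0.0)
t = tangent(curve)
lhs = t(fg)                                   # t(fg)
rhs = g(*P) * t(f) + f(*P) * t(g)             # g(P) t(f) + f(P) t(g)
print(lhs, rhs)                               # the two numbers should agree closely
```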


Let us now consider a special curve and corresponding tangent vector
that is of extreme importance in applications. Let $\varphi = (x^1, x^2, \dots, x^m)$ be a
coordinate system at P, where $x^i : M \to \mathbb{R}$ is the ith coordinate function.
Then $\varphi$ is a bijective $C^\infty$ mapping from the manifold M into $\mathbb{R}^m$. Its inverse,
$\varphi^{-1} : \mathbb{R}^m \to M$, is also a $C^\infty$ mapping. Now, the ith coordinate of P is the
real number $u = x^i(P)$. Suppose that all coordinates of P are held fixed
except the ith one, which is allowed to vary with u describing this variation.


Definition 28.2.5 Let $(U_P, \varphi)$ be a chart at $P \in M$. Then the curve
$\gamma^i : \mathbb{R} \to M$, defined by

\[
\gamma^i(u) = \varphi^{-1}\bigl(x^1(P), \dots, x^{i-1}(P), u, x^{i+1}(P), \dots, x^m(P)\bigr),
\]

is called the ith coordinate curve through P. The tangent vector to this
curve at P is denoted by $\partial_i|_P$ and is called the ith coordinate vector field
at P. The collection of all coordinate vector fields at P is called a coordinate frame
at P. The variable u is arbitrary in the sense that it can be replaced by any
(good) function of u.


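Let $c = x^i(P)$. Then for $f \in F^\infty(P)$, we have

\[
\begin{aligned}
(\partial_i|_P) f &= (\dot\gamma^i(c))(f) = \frac{d}{du} f \circ \gamma^i \bigg|_{u=c} \\
&= \frac{d}{du} f\bigl(\varphi^{-1}(x^1(P), \dots, x^{i-1}(P), u, x^{i+1}(P), \dots, x^m(P))\bigr)\bigg|_{u=c} \\
&\equiv \frac{\partial f}{\partial x^i}\bigg|_P
\quad\Rightarrow\quad
\partial_i|_P = \frac{\partial}{\partial x^i}\bigg|_P,
\end{aligned}
\tag{28.5}
\]

where the last equality is a (natural) definition of the partial derivative of
f with respect to the ith coordinate evaluated at the point P. This partial
derivative is again a $C^\infty$ function at P. We therefore have the following: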


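Proposition 28.2.6 The coordinate frame $\{\partial_i|_P\}_{i=1}^m$ at P is a set of operators
$\partial_i|_P : F^\infty(P) \to \mathbb{R}$ given by

\[
(\partial_i|_P) f = \frac{\partial f}{\partial x^i}\bigg|_P
= \frac{d}{du} f\bigl(\varphi^{-1}(x^1(P), \dots, x^{i-1}(P), u, x^{i+1}(P), \dots, x^m(P))\bigr)\bigg|_{u=c},
\tag{28.6}
\]

where $c = x^i(P)$. Another common notation for $\partial f/\partial x^i$ is $f_{,i}$.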


Example 28.2.7 Pick a point $P = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)$ on the
sphere $S^2$ in a chart $(U_P, \mu)$ given by $\mu(\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta) = (\theta, \varphi)$.
If $\theta$ is kept constant and $\varphi$ is allowed to vary over values given by u,
then the coordinate curve associated with $\varphi$ is given by

\[
\gamma_\varphi(u) = \mu^{-1}(\theta, u) = (\sin\theta\cos u, \sin\theta\sin u, \cos\theta).
\]

As u varies, $\gamma_\varphi(u)$ describes a curve on $S^2$. This curve is simply a circle of
radius $\sin\theta$. The tangent to this curve at any point is $\partial/\partial\varphi$, or simply $\partial_\varphi$, the
derivative with respect to the coordinate $\varphi$.

Similarly, the curve $\gamma_\theta(u)$ describes a great circle on $S^2$ with tangent
$\partial_\theta \equiv \partial/\partial\theta$.


The vector space $\mathcal{T}_P(M)$ of all tangents at P was mentioned earlier. In
the case of $S^2$ this tangent space is simply a plane tangent to the sphere at a
point. Also, the two vectors $\partial_\theta$ and $\partial_\varphi$ encountered in Example 28.2.7 are
clearly linearly independent. Thus, they form a basis for the tangent plane.
This argument can be generalized to any manifold. The following theorem
is such a generalization (for a proof, see [Bish 80, pp. 51-53]):




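Theorem 28.2.8 Let M be an m-dimensional manifold and $P \in M$. Then
the set $\{\partial_i|_P\}_{i=1}^m$ forms a basis of $\mathcal{T}_P(M)$. In particular, $\mathcal{T}_P(M)$ is
m-dimensional. An arbitrary vector $\mathbf{t} \in \mathcal{T}_P(M)$ can be written as

\[
\mathbf{t} = \alpha^i \partial_i|_P, \quad \text{where } \alpha^i = \mathbf{t}(x^i).
\]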

The last statement can be derived by letting both sides operate on $x^j$ and
using Eq. (28.6). Let M = V, a vector space. Choose a basis $\{\mathbf{e}_i\}$ in V with
its dual considered as coordinate functions. Then, at every $\mathbf{v} \in V$, there is a
natural isomorphism $\phi : V \to \mathcal{T}_{\mathbf{v}}(V)$ mapping a vector $\mathbf{u} = \alpha^i \mathbf{e}_i \in V$ onto
$\alpha^i \partial_i|_{\mathbf{v}} \in \mathcal{T}_{\mathbf{v}}(V)$. The reader may verify that this isomorphism is coordinate
independent; i.e., if one chooses any other basis of V with its corresponding
dual, then $\phi(\mathbf{v})$ will be the same vector as before, expressed in the new
coordinate basis. Thus,


Box 28.2.9 If V is a vector space, then for all $\mathbf{v} \in V$, one can identify
$\mathcal{T}_{\mathbf{v}}(V)$ with V itself.


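Suppose we have two coordinate systems at P, $\{x^i\}$ with tangents $\partial_i|_P$
and $\{y^j\}$ with tangents $\nabla_j|_P$. Any $\mathbf{t} \in \mathcal{T}_P(M)$ can be expressed either in
terms of $\partial_i|_P$ or in terms of $\nabla_j|_P$: $\mathbf{t} = \alpha^i \partial_i|_P = \beta^j \nabla_j|_P$. We can use this
relation to obtain $\alpha^i$ in terms of $\beta^j$: From Theorem 28.2.8, we have

\[
\alpha^i = \mathbf{t}(x^i) = (\beta^j \nabla_j|_P)(x^i) = \beta^j \frac{\partial x^i}{\partial y^j}\bigg|_P.
\tag{28.7}
\]

In particular, if $\mathbf{t} = \nabla_k|_P$, then $\beta^j = \mathbf{t}(y^j) = [\nabla_k|_P](y^j) = \delta^j_k$, and (28.7)
gives $\alpha^i = \partial x^i/\partial y^k$. Thus, using Eq. (28.5),

\[
\frac{\partial}{\partial y^k}\bigg|_P = \frac{\partial x^i}{\partial y^k}\bigg|_P \frac{\partial}{\partial x^i}\bigg|_P.
\tag{28.8}
\]

For any function $f \in F^\infty(P)$, Eq. (28.8) yields

\[
\frac{\partial f}{\partial y^k}\bigg|_P = \frac{\partial x^i}{\partial y^k}\bigg|_P \frac{\partial f}{\partial x^i}\bigg|_P.
\]

This is the chain rule for differentiation.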
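As a concrete check of the change of components between two coordinate systems, the sketch below takes $x^i = (x, y)$ Cartesian and $y^j = (r, \theta)$ polar on the plane. The sample point, the components `beta`, and the test function `f` are hypothetical; derivatives of `f` are approximated by central differences.

```python
import math

def cartesian_from_polar(r, theta):
    return (r * math.cos(theta), r * math.sin(theta))

def alpha_from_beta(beta, r, theta):
    """alpha^i = beta^j dx^i/dy^j with x^i = (x, y) and y^j = (r, theta)."""
    b_r, b_t = beta
    dx_dr, dx_dt = math.cos(theta), -r * math.sin(theta)
    dy_dr, dy_dt = math.sin(theta), r * math.cos(theta)
    return (b_r * dx_dr + b_t * dx_dt, b_r * dy_dr + b_t * dy_dt)

def f(x, y):                                   # arbitrary test function in Cartesian form
    return x * x * y + math.sin(y)

def partial(fun, args, k, h=1e-6):
    """Central-difference partial derivative of fun with respect to argument k."""
    up = list(args)
    dn = list(args)
    up[k] += h
    dn[k] -= h
    return (fun(*up) - fun(*dn)) / (2.0 * h)

r0, th0 = 1.5, 0.4
beta = (0.7, -2.0)                             # components in the polar frame
alpha = alpha_from_beta(beta, r0, th0)         # components in the Cartesian frame

def f_polar(r, th):                            # the same function in polar coordinates
    return f(*cartesian_from_polar(r, th))

t_f_polar = sum(b * partial(f_polar, (r0, th0), j) for j, b in enumerate(beta))
x0 = cartesian_from_polar(r0, th0)
t_f_cart = sum(a * partial(f, x0, i) for i, a in enumerate(alpha))
print(t_f_polar, t_f_cart)                     # the two values should agree
```

The agreement of the two numbers reflects that the tangent vector is the same operator, whichever coordinate frame is used to expand it.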


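Example 28.2.10 Let us find the coordinate curves and the coordinate
frame at P = (x, y, z) on $S^2$. We use the coordinates of Example 28.1.3.
In particular, consider $\varphi_3$, whose inverse is given by

\[
\varphi_3^{-1}(x, y) = \bigl(x, y, \sqrt{1 - x^2 - y^2}\bigr).
\]

The coordinate curve $\gamma^2(u)$ along y is obtained by letting y be a function⁵
of u:

\[
\gamma^2(u) = \varphi_3^{-1}(x, h(u)) = \bigl(x, h(u), \sqrt{1 - x^2 - h^2(u)}\bigr),
\]

where $h(0) = y$ and $h'(0) = \alpha$, a constant. To find the coordinate vector field
at P, let $f \in F^\infty(P)$, and note that

\[
\begin{aligned}
\partial_2 f &= \frac{d}{du} f(\gamma^2(u))\bigg|_{u=0}
 = \frac{d}{du} f\bigl(x, h(u), \sqrt{1 - x^2 - h^2(u)}\bigr)\bigg|_{u=0} \\
&= \frac{\partial f}{\partial y}\frac{dh}{du}\bigg|_{u=0}
 + \frac{\partial f}{\partial z}\left(\frac{-h\,h'}{\sqrt{1 - x^2 - h^2}}\right)\bigg|_{u=0}
 = \alpha\left(\frac{\partial f}{\partial y} - \frac{y}{z}\frac{\partial f}{\partial z}\right).
\end{aligned}
\]

So, choosing the function h in such a way that $\alpha = 1$,

\[
\partial_2 = \partial_y - \frac{y}{z}\,\partial_z,
\]

where $\partial_y$ and $\partial_z$ are the coordinate vector fields of $\mathbb{R}^3$. The coordinate vector
field $\partial_1$ can be obtained similarly.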
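The result of this example can be tested on a sample function. In this sketch the choice h(u) = u is assumed (so that the derivative is taken at u = y and h' = 1), the test function `f` is hypothetical, and all derivatives are central differences.

```python
import math

def phi3_inv(x, y):
    """Inverse chart of the upper hemisphere: (x, y) -> (x, y, sqrt(1 - x^2 - y^2))."""
    return (x, y, math.sqrt(1.0 - x * x - y * y))

def f(x, y, z):                                 # arbitrary smooth test function on R^3
    return x * y + math.exp(z) * y + z * z

def d2_along_curve(x, y, h=1e-6):
    """Derivative of f along the coordinate curve u -> phi3_inv(x, u) at u = y."""
    return (f(*phi3_inv(x, y + h)) - f(*phi3_inv(x, y - h))) / (2.0 * h)

def d2_by_formula(x, y, h=1e-6):
    """(d_y - (y/z) d_z) f evaluated at the point phi3_inv(x, y)."""
    px, py, pz = phi3_inv(x, y)
    df_dy = (f(px, py + h, pz) - f(px, py - h, pz)) / (2.0 * h)
    df_dz = (f(px, py, pz + h) - f(px, py, pz - h)) / (2.0 * h)
    return df_dy - (py / pz) * df_dz

print(d2_along_curve(0.3, 0.5), d2_by_formula(0.3, 0.5))   # should agree closely
```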


28.3 Differential of a Map


Now that we have constructed tangent spaces and defined bases for them, 
we are ready to consider the notion of the differential (derivative) of a map 
between manifolds. 


Definition 28.3.1 Let M and N be manifolds of dimensions m and n,
respectively, and let $\psi : M \to N$ be a $C^\infty$ map. Let $P \in M$, and let
$Q = \psi(P) \in N$ be the image of P. Then there is induced a map
$\psi_{*P} : \mathcal{T}_P(M) \to \mathcal{T}_Q(N)$, called the differential of $\psi$ at P and given as follows.
Let $\mathbf{t} \in \mathcal{T}_P(M)$ and $f \in F^\infty(Q)$. The action of $\psi_{*P}(\mathbf{t}) \in \mathcal{T}_Q(N)$ on f is
defined as

\[
(\psi_{*P}(\mathbf{t}))(f) \equiv \mathbf{t}(f \circ \psi).
\tag{28.9}
\]


The reader may check that the differential of a composite map is the
composite of the corresponding differentials, i.e.,

\[
(\psi \circ \phi)_{*P} = \psi_{*\phi(P)} \circ \phi_{*P}.
\tag{28.10}
\]

Furthermore, if $\psi$ is a local diffeomorphism at P, then $\psi_{*P}$ is a vector space
isomorphism. The converse of this statement, which is called the inverse
mapping theorem and is much harder to prove (see [Abra 88, pp. 116 and
196]), is also true:


⁵See the last statement of Definition 28.2.5.




Theorem 28.3.2 (Inverse mapping theorem) If $\psi : M \to N$ is a map and
$\psi_{*P} : \mathcal{T}_P(M) \to \mathcal{T}_{\psi(P)}(N)$ is a vector space isomorphism, then $\psi$ is a local
diffeomorphism at P.


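Let us see how Eq. (28.9) looks in terms of coordinate functions. Suppose
that $\{x^i\}_{i=1}^m$ are coordinates at P and $\{y^a\}_{a=1}^n$ are coordinates at $Q = \psi(P)$.
We note that $y^a \circ \psi$ is a real-valued $C^\infty$ function on M. Thus, we may write
(with the function expressed in terms of coordinates)

\[
y^a \circ \psi \equiv f^a(x^1, \dots, x^m).
\]

We also have $\mathbf{t} = \alpha^i \partial_i|_P$. Similarly, $\psi_{*P}(\mathbf{t}) = \beta^a (\partial/\partial y^a)|_Q$ because
$\{(\partial/\partial y^a)|_Q\}$ form a basis. Theorem 28.2.8 and Definition 28.3.1 now give

\[
\beta^a = [\psi_{*P}(\mathbf{t})](y^a) = \mathbf{t}(y^a \circ \psi) = \mathbf{t}(f^a)
= [\alpha^i \partial_i|_P](f^a) = \alpha^i \frac{\partial f^a}{\partial x^i}\bigg|_P.
\]

This can be written in matrix form as

\[
\begin{pmatrix} \beta^1 \\ \beta^2 \\ \vdots \\ \beta^n \end{pmatrix}
=
\begin{pmatrix}
\partial f^1/\partial x^1 & \partial f^1/\partial x^2 & \cdots & \partial f^1/\partial x^m \\
\partial f^2/\partial x^1 & \partial f^2/\partial x^2 & \cdots & \partial f^2/\partial x^m \\
\vdots & \vdots & & \vdots \\
\partial f^n/\partial x^1 & \partial f^n/\partial x^2 & \cdots & \partial f^n/\partial x^m
\end{pmatrix}
\begin{pmatrix} \alpha^1 \\ \alpha^2 \\ \vdots \\ \alpha^m \end{pmatrix}.
\tag{28.11}
\]

The n × m matrix is denoted by J and is called the Jacobian matrix of $\psi$
with respect to the coordinates $x^i$ and $y^a$. On numerous occasions the two
manifolds are simply Cartesian spaces, so that $\psi : \mathbb{R}^m \to \mathbb{R}^n$. In such a case,
$f^a$ is naturally written as $\psi^a$, and the Jacobian matrix will have elements
of the form $\partial\psi^a/\partial x^i$.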
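For maps between Cartesian spaces the matrix form can be tried out directly. The following is a small sketch under stated assumptions: the map `psi` from $\mathbb{R}^2$ to $\mathbb{R}^3$ is a hypothetical example, and the Jacobian entries are approximated by central differences rather than computed symbolically.

```python
import math

def psi(r, theta):
    """Hypothetical example map R^2 -> R^3: (r, theta) -> (r cos, r sin, r^2)."""
    return (r * math.cos(theta), r * math.sin(theta), r * r)

def jacobian(fun, point, h=1e-6):
    """n x m matrix of central-difference partials d f^a / d x^i at `point`."""
    m = len(point)
    cols = []
    for i in range(m):
        up = list(point)
        dn = list(point)
        up[i] += h
        dn[i] -= h
        f_up, f_dn = fun(*up), fun(*dn)
        cols.append([(a - b) / (2.0 * h) for a, b in zip(f_up, f_dn)])
    n = len(cols[0])
    return [[cols[i][a] for i in range(m)] for a in range(n)]

def pushforward(fun, point, alpha):
    """Components beta^a = J^a_i alpha^i of the image vector, as in Eq. (28.11)."""
    J = jacobian(fun, point)
    return [sum(row[i] * alpha[i] for i in range(len(alpha))) for row in J]

point = (2.0, math.pi / 6)       # (r, theta)
alpha = (1.0, 0.5)               # components of t in the (r, theta) frame
print(pushforward(psi, point, alpha))
```

For this particular map the exact Jacobian has rows (cos θ, −r sin θ), (sin θ, r cos θ), and (2r, 0), which can be compared with the numerical matrix.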

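An important special case of the differential of a map is that of a constant
map. Let $\psi : M \to \{Q\} \subset N$ be such a map; it maps all points of M onto a
single point Q of N. For any $f \in F^\infty(Q)$, the function $f \circ \psi \in F^\infty(P)$ is
constant for all $P \in M$. Let $\mathbf{t} \in \mathcal{T}_P(M)$ be an arbitrary vector. Then

\[
(\psi_{*P}(\mathbf{t}))(f) \equiv \mathbf{t}(f \circ \psi) = 0 \ \ \forall f
\quad\Rightarrow\quad
\psi_{*P}(\mathbf{t}) = 0 \ \ \forall \mathbf{t}
\tag{28.12}
\]

because $\mathbf{t}(c) = 0$ for any constant c. So,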


Box 28.3.3 If $\psi : M \to \{Q\} \subset N$ is a constant map, so that it maps
the entire manifold M onto a point Q of N, then $\psi_{*P} : \mathcal{T}_P(M) \to \mathcal{T}_Q(N)$
is the zero map.


Two other special cases merit closer attention: $M = \mathbb{R}$ for arbitrary N,
and $N = \mathbb{R}$ for arbitrary M. In either case $\mathcal{T}_c(\mathbb{R})$ is one-dimensional with
the basis vector $(d/du)|_c$. When $M = \mathbb{R}$, the mapping becomes a curve,
$\gamma : \mathbb{R} \to N$. The only vector whose image we are interested in is $\mathbf{t} = (d/du)|_c$,
with $\gamma(c) = P$. From (28.9), using Proposition 28.2.4 in the last step, we
have

\[
\left(\gamma_{*c}\,\frac{d}{du}\bigg|_c\right)(f)
= \frac{d}{du}\bigg|_c (f \circ \gamma)
= \frac{d}{du} f \circ \gamma \bigg|_{u=c}
= (\dot\gamma(c))(f).
\]

This tells us that the differential of a curve at c is simply its tangent vector
at $\gamma(c)$. It is common to leave out the constant vector $(d/du)|_c$, and write
$\gamma_{*c}$ for the LHS.


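Example 28.3.4 It is useful to have an expression for the components of
the tangent to a curve $\gamma$ at an arbitrary point on it. Since $\gamma$ maps the real
line to M, with a coordinate patch established on M, we can write $\gamma$ as
$\gamma = (\gamma^1, \dots, \gamma^m)$ where $\gamma^i = x^i \circ \gamma$ are ordinary functions of one variable.
Proposition 28.2.4 then yields

\[
\begin{aligned}
\gamma_{*t} f &= \frac{d}{du} f \circ \gamma \bigg|_{u=t}
 = \frac{d}{du} f(\gamma(u))\bigg|_{u=t}
 = \frac{d}{du} f\bigl(\gamma^1(u), \dots, \gamma^m(u)\bigr)\bigg|_{u=t} \\
&= \frac{\partial f}{\partial x^i}\frac{d\gamma^i}{du}\bigg|_{u=t}
 \equiv \frac{\partial f}{\partial x^i}\frac{d\gamma^i}{dt}
\end{aligned}
\]

or

\[
\gamma_{*t} = \dot\gamma^i \partial_i, \quad \text{where } \dot\gamma^i = \frac{d\gamma^i}{dt}.
\tag{28.13}
\]

For this reason, $\gamma_{*t}$ is sometimes denoted by $\dot\gamma$.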


When $N = \mathbb{R}$, we are dealing with a real-valued function $f : M \to \mathbb{R}$.
The differential of f at P is $f_{*P} : \mathcal{T}_P(M) \to \mathcal{T}_c(\mathbb{R})$, where $c = f(P)$.
Since $\mathcal{T}_c(\mathbb{R})$ is one-dimensional, for a tangent $\mathbf{t} \in \mathcal{T}_P(M)$, we
have $f_{*P}(\mathbf{t}) = a(d/du)|_c$. Let $g : \mathbb{R} \to \mathbb{R}$ be an arbitrary function on $\mathbb{R}$.
Then $[f_{*P}(\mathbf{t})](g) = a(dg/du)_c$, or, by definition of the LHS,
$\mathbf{t}(g \circ f) = a(dg/du)_c$. To find a we choose the function $g(u) = u$, i.e., the
identity function; then $dg/du = 1$ and $\mathbf{t}(g \circ f) = \mathbf{t}(f) = a$. We thus obtain
$f_{*P}(\mathbf{t}) = \mathbf{t}(f)(d/du)|_c$. Since $\mathcal{T}_c(\mathbb{R})$ is a flat one-dimensional vector space,
all vectors are the same and there is no need to write $(d/du)|_c$. Thus, we define
the differential of f, denoted by $df \equiv f_*$, as a map $df : \mathcal{T}_P(M) \to \mathbb{R}$
given by

\[
df(\mathbf{t}) = \mathbf{t}(f).
\tag{28.14}
\]

In particular, if f is the coordinate function $x^i$ and $\mathbf{t}$ is the tangent to the
jth coordinate curve $\partial_j|_P$, we obtain

\[
dx^i|_P(\partial_j|_P) = [\partial_j|_P](x^i) = \delta^i_j.
\tag{28.15}
\]


This shows that 


Box 28.3.5 $\{dx^i|_P\}_{i=1}^m$ is dual to the basis $\{\partial_j|_P\}_{j=1}^m$ of $\mathcal{T}_P(M)$.




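Example 28.3.6 Let $f : M \to \mathbb{R}$ be a real-valued function on M. Let $x^i$ be
coordinates at P. We want to express df in terms of coordinate functions.
For $\mathbf{t} \in \mathcal{T}_P(M)$ we can write $\mathbf{t} = \alpha^i \partial_i|_P$ and

\[
df(\mathbf{t}) = \mathbf{t}(f) = \alpha^i [\partial_i|_P](f) = \alpha^i \partial_i(f),
\]

where in the last step, we suppressed the P. Theorem 28.2.8 and Eq. (28.14)
yield $\alpha^i = \mathbf{t}(x^i) = (dx^i)(\mathbf{t})$. We thus have

\[
df(\mathbf{t}) = \partial_i(f)\bigl[(dx^i)(\mathbf{t})\bigr] = \bigl[\partial_i(f)(dx^i)\bigr](\mathbf{t}).
\]

Since this is true for all $\mathbf{t}$, we get

\[
df = \partial_i(f)(dx^i) \equiv \sum_{i=1}^{m} \frac{\partial f}{\partial x^i}\, dx^i = \frac{\partial f}{\partial x^i}\, dx^i.
\tag{28.16}
\]

This is the classical formula for the differential of a function f. If we choose
$y^j$, the jth member of a new coordinate system, for f, we obtain

\[
dy^j = \sum_{i=1}^{m} \frac{\partial y^j}{\partial x^i}\, dx^i \equiv \frac{\partial y^j}{\partial x^i}\, dx^i,
\tag{28.17}
\]

which is the transformation dual to Eq. (28.8).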


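Consider a map $\phi$ from the product manifold $M \times N$ to another manifold L.
Then

\[
\phi_* : \mathcal{T}_P(M) \times \mathcal{T}_Q(N) \to \mathcal{T}_{\phi(P,Q)}(L).
\]

We want to find $\phi_*(\mathbf{t}, \mathbf{s})$ for $\mathbf{t} \in \mathcal{T}_P(M)$ and $\mathbf{s} \in \mathcal{T}_Q(N)$. First define the
maps $\phi_Q : M \to L$ and $\phi_P : N \to L$ by $\phi_Q(P) = \phi(P, Q)$ and
$\phi_P(Q) = \phi(P, Q)$. Then

\[
\phi_{Q*} : \mathcal{T}_P(M) \to \mathcal{T}_{\phi(P,Q)}(L) \quad \text{and} \quad \phi_{P*} : \mathcal{T}_Q(N) \to \mathcal{T}_{\phi(P,Q)}(L).
\]

Now let $\alpha(t)$ and $\beta(t)$ be the tangent curves associated with $\mathbf{t}$ and $\mathbf{s}$ passing
through P and Q, respectively, at t = 0. Let $f \in F^\infty(\phi(P, Q))$. Then,

\[
\begin{aligned}
\phi_*(\mathbf{t}, \mathbf{s})(f)
&= \frac{d}{dt}\bigl[(f \circ \phi)(\alpha(t), \beta(t))\bigr]_{t=0} \\
&= \frac{d}{dt}\bigl[(f \circ \phi)(\alpha(t), \beta(0))\bigr]_{t=0}
 + \frac{d}{dt}\bigl[(f \circ \phi)(\alpha(0), \beta(t))\bigr]_{t=0} \\
&= \frac{d}{dt}\bigl[(f \circ \phi)(\alpha(t), Q)\bigr]_{t=0}
 + \frac{d}{dt}\bigl[(f \circ \phi)(P, \beta(t))\bigr]_{t=0},
\end{aligned}
\]

where the second line follows from the chain rule (or partial derivatives) and
the third line from the fact that $\alpha$ passes through P and $\beta$ through Q. From
the definitions of $\phi_P$ and $\phi_Q$, we can rewrite the last line as

\[
\begin{aligned}
\phi_*(\mathbf{t}, \mathbf{s})(f)
&= \frac{d}{dt}\bigl[(f \circ \phi_Q)(\alpha(t))\bigr]_{t=0}
 + \frac{d}{dt}\bigl[(f \circ \phi_P)(\beta(t))\bigr]_{t=0} \\
&= \phi_{Q*}(\mathbf{t}) f + \phi_{P*}(\mathbf{s}) f.
\end{aligned}
\]

We thus have the following: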


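Proposition 28.3.7 The differential of $\phi : M \times N \to L$ at (P, Q) is a map
$\phi_* : \mathcal{T}_P(M) \times \mathcal{T}_Q(N) \to \mathcal{T}_{\phi(P,Q)}(L)$ given by $\phi_*(\mathbf{t}, \mathbf{s}) = \phi_{Q*}(\mathbf{t}) + \phi_{P*}(\mathbf{s})$,
where $\phi_Q : M \to L$ and $\phi_P : N \to L$ are defined by $\phi_Q(P) = \phi(P, Q) = \phi_P(Q)$.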
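The splitting of the differential into two partial contributions can be illustrated in the simplest setting $M = N = L = \mathbb{R}$. Everything named below (the map `phi`, the straight-line curves `alpha` and `beta`, the test function `f`) is a hypothetical choice made for the illustration; derivatives at t = 0 are central differences.

```python
import math

def phi(p, q):
    """Hypothetical map phi: R x R -> R used only for illustration."""
    return p * p * q + math.sin(q)

def ddt(fun, h=1e-6):
    """Central-difference derivative of t -> fun(t) at t = 0."""
    return (fun(h) - fun(-h)) / (2.0 * h)

P, Q = 1.2, -0.7
a, b = 0.5, 2.0                      # t = a d/dx at P, s = b d/dx at Q

def alpha(t):                        # curve through P with tangent t
    return P + a * t

def beta(t):                         # curve through Q with tangent s
    return Q + b * t

f = math.exp                         # test function on the target manifold L = R

full = ddt(lambda t: f(phi(alpha(t), beta(t))))     # phi_*(t, s)(f)
part_Q = ddt(lambda t: f(phi(alpha(t), Q)))         # phi_{Q*}(t)(f)
part_P = ddt(lambda t: f(phi(P, beta(t))))          # phi_{P*}(s)(f)
print(full, part_Q + part_P)                        # should agree closely
```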


The following is a powerful theorem that constructs a submanifold out of 
a differentiable map (for a proof, see [Warn 83, p. 31]): 


Theorem 28.3.8 Assume that $\psi : M \to N$ is a $C^\infty$ map, that Q is a point
in the range of $\psi$, and that $\psi_* : \mathcal{T}_P(M) \to \mathcal{T}_Q(N)$ is surjective for all
$P \in \psi^{-1}(Q)$. Then $\psi^{-1}(Q)$ is a submanifold of M and
$\dim \psi^{-1}(Q) = \dim M - \dim N$.


Compare this theorem with Proposition 28.1.10. There, V was an open
subset of N, and since $f^{-1}(V)$ is open, it is automatically an open submanifold.
The difficulty in proving Theorem 28.3.8 lies in the fact that $\psi^{-1}(Q)$
is closed because {Q}, a single point of N, is closed.

We can justify the last statement of the theorem as follows. From
Eq. (28.12), we readily conclude that $\mathcal{T}_P(\psi^{-1}(Q)) = \ker \psi_{*P}$. The dimension
theorem, applied to $\psi_{*P} : \mathcal{T}_P(M) \to \mathcal{T}_Q(N)$, now gives

\[
\dim \mathcal{T}_P(M) = \dim \ker \psi_{*P} + \operatorname{rank} \psi_{*P}
\quad\Rightarrow\quad
\dim M = \dim \psi^{-1}(Q) + \dim N,
\]

where the last equality follows from the surjectivity of $\psi_{*P}$.


Example 28.3.9 Consider a $C^\infty$ map $f : \mathbb{R}^n \to \mathbb{R}$. Let $c \in \mathbb{R}$ such that the
partial derivatives of f are defined and not all zero for all points of $f^{-1}(c)$.
Then, according to Eq. (28.11), a vector $\alpha^i \partial_i \in \mathcal{T}_P(\mathbb{R}^n)$ is mapped by $f_*$ to
the vector $\alpha^i (\partial f/\partial x^i)|_P \, d/dt$. Since $\partial f/\partial x^i$ are not all zero, by properly
choosing $\alpha^i$, we can make $\alpha^i (\partial f/\partial x^i)|_P \, d/dt$ sweep over all real numbers.
Therefore, $f_*$ is surjective, and by Theorem 28.3.8, $f^{-1}(c)$ is an (n − 1)-dimensional
submanifold of $\mathbb{R}^n$. A noteworthy special case is the function
defined by

\[
f(x^1, x^2, \dots, x^n) = (x^1)^2 + (x^2)^2 + \cdots + (x^n)^2
\]

and $c = r^2 > 0$. Then, $f^{-1}(c)$, an (n − 1)-sphere of radius r, is a submanifold
of $\mathbb{R}^n$.
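The surjectivity argument of this example can be made concrete for the sum-of-squares function. The sketch below assumes a sample point on the unit 2-sphere; the helper `preimage_vector` is a hypothetical name, and it picks one particular solution (a multiple of the gradient) among the many vectors mapped to a given real number.

```python
def grad_f(x):
    """Gradient of f(x) = sum_i (x^i)^2, i.e. the 1 x n Jacobian of f."""
    return [2.0 * xi for xi in x]

def preimage_vector(x, target):
    """Return alpha with alpha^i df/dx^i = target; requires grad f != 0 at x."""
    g = grad_f(x)
    norm2 = sum(gi * gi for gi in g)
    if norm2 == 0.0:
        raise ValueError("gradient vanishes: f_* is not surjective here")
    return [target * gi / norm2 for gi in g]

x = [0.6, 0.0, 0.8]                      # a point on the unit 2-sphere f^{-1}(1)
alpha = preimage_vector(x, -3.5)
print(sum(a * g for a, g in zip(alpha, grad_f(x))))   # reproduces the target
```

The only point where the gradient vanishes is the origin, which lies on $f^{-1}(0)$ and on no sphere of positive radius.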


28.4 Tensor Fields on Manifolds 


So far we have studied vector spaces, learned how to construct tensors out 
of vectors, touched on manifolds (the abstraction of spaces), seen how to 
construct vectors at a single point in a manifold by the use of the tangent-at-a-curve
idea, and even found the dual vectors $dx^i|_P$ to the coordinate
vectors $\partial_i|_P$ at a point P of a manifold. We have everything we need to
study the analysis of tensors.


28.4.1 Vector Fields 


We are familiar with the concept of a vector field in 3D: Electric field, mag- 
netic field, gravitational field, velocity field, and so forth are all familiar 
notions. We now want to generalize the concept so that it is applicable to a 
general manifold. To begin with, let us consider the following definition. 


Definition 28.4.1 The union of all tangent spaces at different points of a 
manifold M is denoted by T(M) and called the tangent bundle of M: 


\[
T(M) = \bigcup_{P \in M} \mathcal{T}_P(M).
\]

It can be shown ([Bish 80, pp. 158-164]) that T(M) is a manifold of
dimension 2 dim M.


Definition 28.4.2 A vector field $\mathbf{X}$ on a subset U of a manifold M is a
mapping $\mathbf{X} : U \to T(M)$ such that $\mathbf{X}(P) \equiv \mathbf{X}|_P \equiv \mathbf{X}_P \in \mathcal{T}_P(M)$. The set
of vector fields on M is denoted by $\mathfrak{X}(M)$. Let M and N be manifolds
and $F : M \to N$ a differentiable map. We say that the two vector fields
$\mathbf{X} \in \mathfrak{X}(M)$ and $\mathbf{Y} \in \mathfrak{X}(N)$ are F-related if $F_*(\mathbf{X}_P) = \mathbf{Y}_{F(P)}$ for all $P \in M$.
This is sometimes written simply as $F_*\mathbf{X} = \mathbf{Y}$.


It is worthwhile to point out that $F_*\mathbf{X}$ is not, in general, a vector field
on N. To be a vector field, $F_*\mathbf{X}$ must be defined at all points of N. The
natural way to define $F_*\mathbf{X}$ at $Q \in N$ is $[F_*\mathbf{X}(Q)](f) = \mathbf{X}(f \circ F)(P)$ where
P is the preimage of Q, i.e., $F(P) = Q$. But there may not exist any such P
(F may not be onto), or there may be more than one P (F may not be
one-to-one) with such property. Therefore, this natural construction does not lead
to a vector field on N. If $F_*\mathbf{X}$ happens to be a vector field on N, then it is
clearly F-related to $\mathbf{X}$. In terms of the coordinates $x^i$, at each point $P \in M$,

\[
\mathbf{X}_P \equiv \mathbf{X}|_P = X^i_P\, \partial_i|_P,
\]

where the real numbers $X^i_P$ are components of $\mathbf{X}_P$ in the basis $\{\partial_i|_P\}$. As P
moves around in U, the real numbers $X^i_P$ keep changing. Thus, we can think
of $X^i_P$ as a function of P and define the real-valued function $X^i : M \to \mathbb{R}$
by $X^i(P) \equiv X^i_P$. Therefore, the components of a vector field are real-valued
functions on M.


Example 28.4.3 Let $M = \mathbb{R}^3$. At each point $P = (x, y, z) \in \mathbb{R}^3$, let
$(\hat{\mathbf{e}}_x, \hat{\mathbf{e}}_y, \hat{\mathbf{e}}_z)$ be a basis for $\mathbb{R}^3$. Let $V_P$ be the vector space at P. Then $T(\mathbb{R}^3)$
is the collection of all vector spaces $V_P$ for all P.

We can determine the value of an electric field at a point in $\mathbb{R}^3$ by first
specifying the point, as $P_0 = (x_0, y_0, z_0)$, for example. This uniquely determines
the tangent space $\mathcal{T}_{P_0}(\mathbb{R}^3)$. Once we have the vector space, we
can ask what the components of the electric field are in that space. These
components are given by three numbers: $E_x(x_0, y_0, z_0)$, $E_y(x_0, y_0, z_0)$, and
$E_z(x_0, y_0, z_0)$. The argument is the same for any other vector field.

To specify a “point” in $T(\mathbb{R}^3)$, we need three numbers to determine the
location in $\mathbb{R}^3$ and another three numbers to determine the components of a
vector field at that point. Thus, a “point” in $T(\mathbb{R}^3)$ is given by six “coordinates”
$(x, y, z, E_x, E_y, E_z)$, and $T(\mathbb{R}^3)$ is a six-dimensional manifold.


We know how a tangent vector $\mathbf{t}$ at a point $P \in M$ acts on a function
$f \in F^\infty(P)$ to give a real number $\mathbf{t}(f)$. We can extend this, point by point,
for a vector field $\mathbf{X}$ and define a function $\mathbf{X}(f)$ by

\[
[\mathbf{X}(f)](P) \equiv \mathbf{X}_P(f), \quad P \in U,
\tag{28.18}
\]

where U is a subset of M on which both $\mathbf{X}$ and f are defined. The RHS is
well-defined because we know how $\mathbf{X}_P$, the vector at P, acts on functions
at P to give the real number $[\mathbf{X}_P](f)$. On the LHS, we have $\mathbf{X}(f)$, which
maps the point P onto a real number. Thus, $\mathbf{X}(f)$ is indeed a real-valued
function on M. We can therefore define vector fields directly as operators
on $C^\infty$ functions satisfying

\[
\begin{aligned}
\mathbf{X}(\alpha f + \beta g) &= \alpha \mathbf{X}(f) + \beta \mathbf{X}(g), \\
\mathbf{X}(fg) &= [\mathbf{X}(f)]\,g + [\mathbf{X}(g)]\,f.
\end{aligned}
\]

A prototypical vector field is the coordinate vector field $\partial_i$. In general,
$\mathbf{X}(f)$ is not a $C^\infty$ function even if f is. A vector field that produces a $C^\infty$
function $\mathbf{X}(f)$ for every $C^\infty$ function f is called a $C^\infty$ vector field. Such a
vector field has components that are $C^\infty$ functions on M.

The set of tangent vectors $\mathcal{T}_P(M)$ at a point $P \in M$ forms an m-dimensional
vector space. The set of vector fields $\mathfrak{X}(M)$, each of which yields a
vector at every point of the manifold, also constitutes a vector space. However,
this vector space is (uncountably) infinite-dimensional.

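A property of $\mathfrak{X}(M)$ that is absent in $\mathcal{T}_P(M)$ is composition.⁶ This suggests
the possibility of defining a “product” on $\mathfrak{X}(M)$ to turn it into an algebra.
Let $\mathbf{X}$ and $\mathbf{Y}$ be vector fields. For $\mathbf{X} \circ \mathbf{Y}$ to be a vector field, it has to
satisfy the derivation property. But

\[
\begin{aligned}
\mathbf{X} \circ \mathbf{Y}(fg) &= \mathbf{X}(\mathbf{Y}(fg)) = \mathbf{X}(\mathbf{Y}(f)g + f\mathbf{Y}(g)) \\
&= (\mathbf{X}(\mathbf{Y}(f)))g + \mathbf{Y}(f)\mathbf{X}(g) + \mathbf{X}(f)\mathbf{Y}(g) + f(\mathbf{X}(\mathbf{Y}(g))) \\
&\neq (\mathbf{X} \circ \mathbf{Y}(f))g + f(\mathbf{X} \circ \mathbf{Y}(g)).
\end{aligned}
\]

However, the reader may verify that $\mathbf{X} \circ \mathbf{Y} - \mathbf{Y} \circ \mathbf{X}$ does indeed satisfy the
derivation property. Therefore, by defining the binary operation
$\mathfrak{X}(M) \times \mathfrak{X}(M) \to \mathfrak{X}(M)$ as

\[
[\mathbf{X}, \mathbf{Y}] \equiv \mathbf{X} \circ \mathbf{Y} - \mathbf{Y} \circ \mathbf{X},
\]

$\mathfrak{X}(M)$ becomes an algebra, called the Lie algebra of vector fields of M. The
binary operation is called the Lie bracket. Although it was not mentioned at
the time, we have encountered another example of a Lie algebra in Chap. 4,
namely $\mathcal{L}(V)$ under the binary operation of the commutation relation. Lie
brackets have the following two properties:

\[
\begin{aligned}
[\mathbf{X}, \mathbf{Y}] &= -[\mathbf{Y}, \mathbf{X}], \\
[[\mathbf{X}, \mathbf{Y}], \mathbf{Z}] + [[\mathbf{Z}, \mathbf{X}], \mathbf{Y}] + [[\mathbf{Y}, \mathbf{Z}], \mathbf{X}] &= 0.
\end{aligned}
\]

These two relations are the defining properties of all Lie algebras. The last
relation is called the Jacobi identity. $\mathfrak{X}(M)$ with Lie brackets is an example
of an infinite-dimensional Lie algebra; $\mathcal{L}(V)$ with commutators is an example
of a finite-dimensional Lie algebra.

⁶Recall that a typical element of $\mathcal{T}_P(M)$ is a map $\mathbf{t} : F^\infty(P) \to \mathbb{R}$ for which composition
is meaningless.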
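In coordinates, expanding $\mathbf{X} \circ \mathbf{Y} - \mathbf{Y} \circ \mathbf{X}$ on a function shows that the second-derivative terms cancel and that the bracket has components $X^j \partial_j Y^i - Y^j \partial_j X^i$. This component formula is a standard consequence of the definition rather than something derived in the passage above. The sketch below evaluates it with central differences for two hypothetical fields on $\mathbb{R}^2$.

```python
def lie_bracket(X, Y, point, h=1e-5):
    """Components of [X, Y] at `point`: X^j d_j Y^i - Y^j d_j X^i (central differences)."""
    m = len(point)

    def d(field, j):
        # partial derivative of all components of `field` with respect to x^j
        up = list(point)
        dn = list(point)
        up[j] += h
        dn[j] -= h
        return [(a - b) / (2.0 * h) for a, b in zip(field(up), field(dn))]

    Xp, Yp = X(point), Y(point)
    dX = [d(X, j) for j in range(m)]
    dY = [d(Y, j) for j in range(m)]
    return [sum(Xp[j] * dY[j][i] - Yp[j] * dX[j][i] for j in range(m))
            for i in range(m)]

# Hypothetical fields on R^2: X = d_x and Y = x d_y, for which [X, Y] = d_y.
def X(p):
    return [1.0, 0.0]

def Y(p):
    return [0.0, p[0]]

point = [0.3, -1.1]
print(lie_bracket(X, Y, point))     # approximately [0.0, 1.0]
print(lie_bracket(Y, X, point))     # approximately [0.0, -1.0] (antisymmetry)
```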

We shall have occasion to use the following theorem in our treatment of 
Lie groups and algebras in Chap. 29: 


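Theorem 28.4.4 Let M and N be manifolds and $F : M \to N$ a differentiable
map. Assume that $\mathbf{X}_i \in \mathfrak{X}(M)$ is F-related to $\mathbf{Y}_i \in \mathfrak{X}(N)$ for i = 1, 2.
Then $[\mathbf{X}_1, \mathbf{X}_2]$ is F-related to $[\mathbf{Y}_1, \mathbf{Y}_2]$, i.e.,

\[
F_*[\mathbf{X}_1, \mathbf{X}_2] = [F_*\mathbf{X}_1, F_*\mathbf{X}_2].
\]

Proof Let f be an arbitrary function on N. Then

\[
\begin{aligned}
(F_*[\mathbf{X}_1, \mathbf{X}_2]) f &= [\mathbf{X}_1, \mathbf{X}_2](f \circ F)
 = \mathbf{X}_1(\mathbf{X}_2(f \circ F)) - \mathbf{X}_2(\mathbf{X}_1(f \circ F)) \\
&= \mathbf{X}_1([F_*\mathbf{X}_2(f)] \circ F) - \mathbf{X}_2([F_*\mathbf{X}_1(f)] \circ F) \\
&= F_*\mathbf{X}_1(F_*\mathbf{X}_2(f)) - F_*\mathbf{X}_2(F_*\mathbf{X}_1(f)) \\
&= [F_*\mathbf{X}_1, F_*\mathbf{X}_2] f,
\end{aligned}
\]

where we used Eq. (28.9) in the first, second, and third lines, and the result
of Problem 28.8 in the second line.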


It is convenient to visualize vector fields as streamlines. In fact, most of the terminology used in three-dimensional vector analysis, such as flux, divergence, and curl, has its origins in the flow of fluids and the associated velocity vector fields. The streamlines are obtained, in nonturbulent flow, by starting at one point and drawing a curve whose tangent at all points is the velocity vector field. For a smooth flow this curve is unique. There is an exact analogy in manifold theory.


Definition 28.4.5 Let $X \in \mathcal{X}(M)$ be defined on an open subset $U$ of $M$. An integral curve of $X$ in $U$ is a curve $\gamma$ whose range lies in $U$ and for every $t$ in the domain of $\gamma$, the vector tangent to $\gamma$ satisfies $\gamma_{*t} = X(\gamma(t))$. If $\gamma(0) = P$, we say that $\gamma$ starts at $P$.

Let us choose a coordinate system on $M$. Then $X = X^i\partial_i$, where $X^i$ are $C^\infty$ functions on $M$, and, by (28.13), $\gamma_* = \dot\gamma^i\partial_i$. The equation for the integral curve of $X$ will therefore become

\[
\dot\gamma^i\partial_i = X^i(\gamma(t))\partial_i, \quad\text{or}\quad \frac{d\gamma^i}{dt} = X^i(\gamma^1(t), \dots, \gamma^m(t)), \quad i = 1, 2, \dots, m.
\]

Since $\gamma^i$ are simply coordinates of points on $M$, we rewrite the equation above as

\[
\frac{dx^i}{dt} = X^i(x^1(t), \dots, x^m(t)), \quad i = 1, 2, \dots, m. \tag{28.19}
\]

This is a system of first-order differential equations that has a unique (local) solution once the initial value $\gamma(0)$ of the curve, i.e., the coordinates of the starting point $P$, is given. The precise statement for existence and uniqueness of integral curves is contained in the following theorem.


Theorem 28.4.6 Let $X$ be a $C^\infty$ vector field defined on an open subset $U$ of $M$. Suppose $P \in U$, and $c \in \mathbb{R}$. Then there is a positive number $\epsilon$ and a unique integral curve $\gamma$ of $X$ defined on $|t - c| \le \epsilon$ such that $\gamma(c) = P$.


Example 28.4.7 (Examples of integral curves)

(a) Let $M = \mathbb{R}$ with coordinate function $x$. The vector field $X = x\partial_x$ has an integral curve with initial point $x_0$ given by the DE $dx/dt = x(t)$, which has the solution $x(t) = e^t x_0$.

(b) Let $M = \mathbb{R}^n$ with coordinate functions $x^i$. The vector field $X = \sum_i a^i\partial_i$ has an integral curve, with initial point $\mathbf{r}_0$, given by the system of DEs $dx^i/dt = a^i$, which has the solution $x^i(t) = a^i t + x_0^i$, or $\mathbf{r} = \mathbf{a}t + \mathbf{r}_0$. The curve is therefore a straight line parallel to $\mathbf{a}$ going through $\mathbf{r}_0$.

(c) Let $M = \mathbb{R}^n$ with coordinate functions $x^i$. Consider the vector field

\[
X = \sum_{i,j=1}^{n} a^i_j x^j \partial_i.
\]

The integral curve of this vector field, with initial point $\mathbf{r}_0$, is given by the system of DEs $dx^i/dt = \sum_j a^i_j x^j$, which can be written in vector form as $d\mathbf{r}/dt = A\mathbf{r}$ where $A$ is a constant matrix. By differentiating this equation several times, one can convince oneself that $d^k\mathbf{r}/dt^k = A^k\mathbf{r}$. The Taylor expansion of $\mathbf{r}(t)$ then yields

\[
\mathbf{r}(t) = \sum_{k=0}^{\infty} \frac{t^k}{k!} \frac{d^k\mathbf{r}}{dt^k}\bigg|_{t=0} = \sum_{k=0}^{\infty} \frac{t^k}{k!} A^k \mathbf{r}_0 = e^{tA}\mathbf{r}_0.
\]

(d) Let $M = \mathbb{R}^2$ with coordinates $x, y$. The reader may verify that the vector field $X = -y\partial_x + x\partial_y$ has an integral curve through $(x_0, y_0)$ given by

\[
x = x_0\cos t - y_0\sin t, \qquad y = x_0\sin t + y_0\cos t,
\]

i.e., a circle centered at the origin passing through $(x_0, y_0)$.
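The integral curve of part (d) can also be obtained by integrating the system (28.19) numerically. The sketch below uses a classical fourth-order Runge-Kutta step (a numerical method swapped in here for illustration, not used in the text) and compares the result with the closed-form circle; the function names, the starting point, and the step count are assumptions.

```python
import math

def rotation_field(point):
    """Components of X = -y d/dx + x d/dy at the point (x, y)."""
    x, y = point
    return (-y, x)

def rk4_step(field, point, h):
    """One classical Runge-Kutta step for dx^i/dt = X^i(x)."""
    def shifted(p, k, s):
        return tuple(pi + s * ki for pi, ki in zip(p, k))
    k1 = field(point)
    k2 = field(shifted(point, k1, h / 2))
    k3 = field(shifted(point, k2, h / 2))
    k4 = field(shifted(point, k3, h))
    return tuple(p + h / 6 * (a + 2 * b + 2 * c + d)
                 for p, a, b, c, d in zip(point, k1, k2, k3, k4))

def integral_curve(field, start, t, steps=1000):
    """Approximate gamma(t) for the integral curve starting at `start`."""
    point, h = start, t / steps
    for _ in range(steps):
        point = rk4_step(field, point, h)
    return point

x0, y0, t = 1.0, 2.0, 0.75          # hypothetical starting point and time
numeric = integral_curve(rotation_field, (x0, y0), t)
exact = (x0 * math.cos(t) - y0 * math.sin(t),
         x0 * math.sin(t) + y0 * math.cos(t))
print(numeric, exact)
```

The same `integral_curve` function works for any field given as a function from a point to its components, which is all that Eq. (28.19) requires.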


Going back to the velocity vector field analogy, we can think of integral curves as the path of particles flowing with the fluid. If we think of the entire fluid as a manifold $M$, the flow of particles can be thought of as a transformation of $M$. To be precise, let $M$ be an arbitrary manifold, and $X \in \mathcal{X}(M)$. At each point $P$ of $M$, there is a unique local integral curve $\gamma_P$ of $X$ starting at $P$ defined on an open subset $U$ of $M$. The map $F_t : U \to M$ defined by $F_t(P) = \gamma_P(t)$ is a (local) transformation of $M$. The collection of such maps with different $t$'s is called the flow of the vector field $X$. The uniqueness of the integral curve $\gamma_P$ implies that $F_t$ is a local diffeomorphism. In fact, the collection of maps $\{F_t\}_{t \in \mathbb{R}}$ forms a (local) one-parameter group of transformations in the sense that

\[
F_t \circ F_s = F_{t+s}, \qquad F_0 = \mathrm{id}, \qquad (F_t)^{-1} = F_{-t}. \tag{28.20}
\]

One has to keep in mind that $F_t$ at a point $P \in M$ is, in general, defined only locally in $t$, i.e., only for $t$ in some open interval that depends on $P$. For some special, but important, cases this interval can be taken to be the entire $\mathbb{R}$ for all $P$, in which case we speak of a global one-parameter group of transformations, and $X$ is called a complete vector field on $M$.

The symbol $F_t$ used for the flow of the vector field $X$ does not contain its connection to $X$. In order to make this connection, it is common to define

\[
F_t \equiv \exp(tX). \tag{28.21}
\]

This definition, with no significance attached to “exp” at this point, converts Eq. (28.20) into

\[
\begin{aligned}
\exp(tX) \circ \exp(sX) &= \exp[(t+s)X], \\
\exp(0X) &= \mathrm{id}, \\
[\exp(tX)]^{-1} &= \exp(-tX),
\end{aligned}
\tag{28.22}
\]

which notationally justifies the use of “exp”. We shall see in our discussion of Lie groups that this choice of notation is not accidental.
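For a linear vector field such as the one in Example 28.4.7(c), the flow is the matrix exponential $e^{tA}$, so the relations (28.22) can be checked numerically. The sketch below sums the power series directly; the matrix `A`, the truncation at 40 terms, and the helper names are illustrative assumptions.

```python
def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, t, terms=40):
    """Truncated series sum_k (t^k / k!) A^k, adequate for small |t| * |A|."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term_k = term_{k-1} * A * t / k
        term = [[t / k * x for x in row] for row in mat_mul(term, a)]
        result = [[r + x for r, x in zip(rr, tr)]
                  for rr, tr in zip(result, term)]
    return result

def max_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

A = [[1.0, 2.0], [0.0, -1.0]]       # hypothetical constant matrix
t, s = 0.3, 0.5
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]

group_law = max_diff(mat_mul(mat_exp(A, t), mat_exp(A, s)), mat_exp(A, t + s))
inverse = max_diff(mat_mul(mat_exp(A, t), mat_exp(A, -t)), IDENTITY)
print(group_law, inverse)
```

Both printed numbers should be at the level of floating-point rounding.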
Using this notation, we can write

\[
X_P(f) = \frac{d}{dt} f(F_t(P))\bigg|_{t=0} \equiv \frac{d}{dt} f \circ \exp(tX)\bigg|_{t=0}.
\]

One usually leaves out the function $f$ and writes

\[
X_P = \frac{d}{dt} \exp(tX)\bigg|_{t=0}, \tag{28.23}
\]

where it is understood that the LHS acts on some $f$ that must compose on the RHS to the left of the exponential. Similarly, we have

\[
\begin{aligned}
(F_*X)_{F(P)} &= \frac{d}{dt} F(\exp tX)\bigg|_{t=0}, \\
(G_*F_*X)_{G(F(P))} &= \frac{d}{dt} G(F(\exp tX))\bigg|_{t=0} = \frac{d}{dt} G \circ F(\exp tX)\bigg|_{t=0},
\end{aligned}
\tag{28.24}
\]

where $F : M \to N$ and $G : N \to K$ are maps between manifolds.


Example 28.4.8 In this example, we derive a useful formula that gives the value of a function at a neighboring point of $P \in M$ located on the integral curve of $X \in \mathcal{X}(M)$ going through $P$. We first note that since $X_P$ is tangent to $\gamma_P$ at $P = \gamma(0)$, by Proposition 28.2.4 we have

\[
X_P(f) = \frac{d}{dt} f(\gamma_P(t))\bigg|_{t=0} = \frac{d}{dt} f(F_t(P))\bigg|_{t=0}.
\]

Next we use the definition of derivative and the fact that $F_0(P) = P$ to write

\[
\lim_{t \to 0} \frac{1}{t}\left[f(F_t(P)) - f(P)\right] = X_P(f).
\]

Now, if we assume that $t$ is very small, we have

\[
f(F_t(P)) = f(P) + tX_P(f) + \cdots, \tag{28.25}
\]

which is a Taylor series with only the first two terms kept.
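Equation (28.25) can be illustrated with the rotation field of Example 28.4.7(d), whose flow is known in closed form. The sketch below is only a numerical illustration: the test function $f = x^2 y$, the point `P`, and the names are assumptions, and $X(f)$ is worked out by hand for that particular $f$.

```python
import math

def flow(t, p):
    """Exact flow F_t of X = -y d/dx + x d/dy (rotation by angle t)."""
    x, y = p
    return (x * math.cos(t) - y * math.sin(t),
            x * math.sin(t) + y * math.cos(t))

def f(p):
    x, y = p
    return x * x * y

def X_f(p):
    """X(f) = -y df/dx + x df/dy for f = x^2 y, computed by hand."""
    x, y = p
    return -y * (2 * x * y) + x * (x * x)

def first_order_error(t, p):
    """|f(F_t(P)) - [f(P) + t X_P(f)]|, the part dropped in Eq. (28.25)."""
    return abs(f(flow(t, p)) - (f(p) + t * X_f(p)))

P = (1.0, 2.0)                      # hypothetical point
errors = [first_order_error(t, P) for t in (2e-3, 1e-3)]
print(errors, errors[0] / errors[1])
```

Halving $t$ should reduce the error by roughly a factor of four, as expected when the neglected terms start at order $t^2$.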


28.4.2 Tensor Fields 


We have defined vector spaces $\mathcal{T}_P(M)$ at each point of $M$. We have also constructed coordinate bases, $\{\partial_i|_P\}_{i=1}^m$, for these vector spaces. At the end of Sect. 28.2, we showed that the differentials $\{dx^i|_P\}_{i=1}^m$ form a basis that is dual to $\{\partial_i|_P\}_{i=1}^m$. Let us concentrate on this dual space, which we will denote by $\mathcal{T}^*_P(M)$.

Taking the union of all $\mathcal{T}^*_P(M)$ at all points of $M$, we obtain the cotangent bundle of $M$:

\[
T^*(M) = \bigcup_{P \in M} \mathcal{T}^*_P(M). \tag{28.26}
\]

This is the dual space of $T(M)$ at each point of $M$. We can now define the analogue of the vector field for the cotangent bundle.

Definition 28.4.9 A differential one-form $\theta$ on a subset $U$ of a manifold $M$ is a mapping $\theta : U \to T^*(M)$ such that $\theta(P) \equiv \theta_P \in \mathcal{T}^*_P(M)$. The collection of all one-forms on $M$ is denoted by $\mathcal{X}^*(M)$.

If $\theta$ is a one-form and $X$ is a vector field on $M$, then $\theta(X)$ is a real-valued function on $M$ defined naturally by $[\theta(X)](P) \equiv (\theta_P)(X_P)$. The first factor on the RHS is a linear functional at $P$, and the second factor is a vector at $P$. So, the pairing of the two factors produces a real number. A prototypical one-form is the coordinate differential, $dx^i$.

Associated with a differentiable map $\psi : M \to N$, we defined a differential $\psi_*$ that mapped a tangent space of $M$ to a tangent space of $N$. The dual of $\psi_*$ (Definition 2.5.4) is denoted by $\psi^*$ and is called the pullback of $\psi_*$. It takes a one-form on $N$ to a one-form on $M$. In complete analogy to the case of vector fields, $\theta$ can be written in terms of the basis $\{dx^i\}$: $\theta = \theta_i\,dx^i$. Here $\theta_i$, the components of $\theta$, are real-valued functions on $M$.

With the vector spaces $\mathcal{T}_P(M)$ and $\mathcal{T}^*_P(M)$ at our disposal, we can construct various kinds of tensors at each point $P$. The union of all these tensors is a manifold, and a tensor field can be defined as usual. Thus, we have the following definition.


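Definition 28.4.10 Let $\mathcal{T}_P(M)$ and $\mathcal{T}^*_P(M)$ be the tangent and cotangent spaces at $P \in M$. Then the set of tensors of type $(r, s)$ on $\mathcal{T}_P(M)$ is denoted by $\mathcal{T}^r_{s,P}(M)$. The bundle of tensors of type $(r, s)$ over $M$, denoted by $T^r_s(M)$, is

\[
T^r_s(M) = \bigcup_{P \in M} \mathcal{T}^r_{s,P}(M).
\]

A tensor field $\mathbf{T}$ of type $(r, s)$ over a subset $U$ of $M$ is a mapping $\mathbf{T} : U \to T^r_s(M)$ such that $\mathbf{T}(P) \equiv \mathbf{T}_P \equiv \mathbf{T}|_P \in \mathcal{T}^r_{s,P}(M)$.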


In particular, $T^0_0(M)$ is the set of real-valued functions on $M$, $T^1_0(M) = T(M)$, and $T^0_1(M) = T^*(M)$. Furthermore, since $\mathbf{T}$ is a multilinear map, the parentheses are normally reserved for vectors and their duals, and as indicated in Definition 28.4.10, the value of $\mathbf{T}$ at $P \in M$ is written as $\mathbf{T}_P$ or $\mathbf{T}|_P$. The reader may check that the map

\[
\mathbf{T} : \underbrace{\mathcal{X}^*(M) \times \cdots \times \mathcal{X}^*(M)}_{r \text{ times}} \times \underbrace{\mathcal{X}(M) \times \cdots \times \mathcal{X}(M)}_{s \text{ times}} \to T^0_0(M)
\]

defined by

\[
[\mathbf{T}(\omega^1, \dots, \omega^r, v_1, \dots, v_s)](P) = \mathbf{T}_P(\omega^1|_P, \dots, \omega^r|_P, v_1|_P, \dots, v_s|_P)
\]

has the property that

\[
\begin{aligned}
\mathbf{T}(\dots, f\omega^j + g\theta^j, \dots) &= f\,\mathbf{T}(\dots, \omega^j, \dots) + g\,\mathbf{T}(\dots, \theta^j, \dots), \\
\mathbf{T}(\dots, fv_k + gu_k, \dots) &= f\,\mathbf{T}(\dots, v_k, \dots) + g\,\mathbf{T}(\dots, u_k, \dots)
\end{aligned}
\tag{28.27}
\]

for any two functions $f$ and $g$ on $M$. Thus,⁷

Box 28.4.11 A tensor is linear in vector fields and 1-forms, even when the coefficients of linear expansion are functions.

⁷In mathematical jargon, $\mathcal{X}(M)$ and $\mathcal{X}^*(M)$ are called modules over the (ring of) real-valued functions on $M$. Rings are a generalization of the real numbers (field of real numbers) whose elements have all the properties of a field except that they may have no inverse. A module over a field is a vector space.


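The components of $\mathbf{T}$ with respect to coordinates $x^i$ are the $m^{r+s}$ real-valued functions

\[
T^{i_1 \dots i_r}_{j_1 \dots j_s} = \mathbf{T}(dx^{i_1}, \dots, dx^{i_r}, \partial_{j_1}, \dots, \partial_{j_s}).
\]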

If tensor fields are to be of any use, we must be able to differentiate them. 
We shall consider three types of derivatives with different applications. We 
study one of them here, another in the next section, and the third in Chap. 36. 

Derivatives can be defined only for objects that can be added. For func- 
tions of a single (real or complex) variable, this is done almost subcon- 
sciously: We take the difference between the values of the function at two 
nearby points and divide by the length of the interval between the two 
points. We extended this definition to operators in Chap. 4 with practically 
no change. For functions of more than one variable, one chooses a direction 
(a vector) and considers change in the function along that direction. This 
leads to the concept of directional derivative, or partial derivative when 
the vector happens to be along one of the axes. 

In all the above cases, the objects being differentiated reside in the same space: $f(t)$ and $f(t + \Delta t)$ are both real (complex) numbers; $H(t)$ and $H(t + \Delta t)$ both belong to $\mathcal{L}(V)$. When we try to define derivatives of tensor fields, however, we run immediately into trouble: $\mathbf{T}_P$ and $\mathbf{T}_{P'}$ cannot be compared because they belong to two different spaces, one to $\mathcal{T}^r_{s,P}(M)$ and the other to $\mathcal{T}^r_{s,P'}(M)$. To make comparisons, we need first to establish a “connection” between the two spaces. This connection has to be a vector space isomorphism so that there is one and only one vector in the second space that is to be compared with a given vector in the first space. The problem is that there are infinitely many isomorphisms between any given two vector spaces. No “natural” isomorphism exists between $\mathcal{T}^r_{s,P}(M)$ and $\mathcal{T}^r_{s,P'}(M)$; thus the diversity of tensor “derivatives”! We narrow down this diversity by choosing a specific vector at $\mathcal{T}_P(M)$ and seeking a natural way of defining the derivative along that vector by associating a “natural” isomorphism corresponding to the vector. There are a few methods of doing this. We describe one of them here.

First, let us see what happens to tensor fields under a diffeomorphism of $M$ onto itself. Let $F : M \to M$ be such a diffeomorphism. The differential $F_{*P}$ of this diffeomorphism is an isomorphism of $\mathcal{T}_P(M)$ and $\mathcal{T}_{F(P)}(M)$. This isomorphism induces an isomorphism of the vector spaces $\mathcal{T}^r_{s,P}(M)$ and $\mathcal{T}^r_{s,F(P)}(M)$, also denoted by $F_{*P}$, by Eq. (26.10). Let us denote by $F_*$ a map of $T(M)$ onto $T(M)$ whose restriction to $\mathcal{T}_P(M)$ is $F_{*P}$. If $\mathbf{T}$ is a tensor field on $M$, then $F_*(\mathbf{T})$ is also a tensor field, whose value at $F(Q)$ is obtained by letting $F_{*Q}$ act on $\mathbf{T}(Q)$:

\[
[F_*(\mathbf{T})](F(Q)) = F_{*Q}(\mathbf{T}(Q)),
\]

or, letting $P = F(Q)$ or $Q = F^{-1}(P)$,

\[
[F_*(\mathbf{T})](P) = F_{*F^{-1}(P)}(\mathbf{T}(F^{-1}(P))). \tag{28.28}
\]


Now, let $X$ be a vector field and $P \in M$. The flow of $X$ at $P$ defines a local diffeomorphism $F_t : U \to F_t(U)$ with $P \in U$. The differential $F_{t*}$ of this diffeomorphism is an isomorphism of $\mathcal{T}_P(M)$ and $\mathcal{T}_{F_t(P)}(M)$. As discussed above, this isomorphism induces an isomorphism of the vector space $\mathcal{T}^r_{s,P}(M)$ onto itself. The derivative we are after is defined by comparing a tensor field evaluated at $P$ with the image of the same tensor field under the isomorphism $F_{t*}^{-1}$. The following definition makes this procedure more precise.


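Definition 28.4.12 Let $P \in M$, $X \in \mathcal{X}(M)$, and $F_t$ the flow of $X$ defined in a neighborhood of $P$. The Lie derivative of a tensor field $\mathbf{T}$ at $P$ with respect to $X$ is denoted by $(L_X\mathbf{T})_P$ and defined by

\[
(L_X\mathbf{T})_P = \lim_{t \to 0} \frac{1}{t}\left[F_{t*}^{-1}\mathbf{T}_{F_t(P)} - \mathbf{T}_P\right] = \frac{d}{dt} F_{t*}^{-1}\mathbf{T}_{F_t(P)}\bigg|_{t=0}. \tag{28.29}
\]

Let us calculate the derivative in Eq. (28.29) at an arbitrary value of $t$. For this purpose, let $Q \equiv F_t(P)$. Then

\[
\begin{aligned}
\frac{d}{dt} F_{t*}^{-1}\mathbf{T}_{F_t(P)} &= \lim_{\Delta t \to 0} \frac{1}{\Delta t}\left[F_{(t+\Delta t)*}^{-1}\mathbf{T}_{F_{t+\Delta t}(P)} - F_{t*}^{-1}\mathbf{T}_{F_t(P)}\right] \\
&= F_{t*}^{-1} \lim_{\Delta t \to 0} \frac{1}{\Delta t}\left[F_{\Delta t*}^{-1}\mathbf{T}_{F_{\Delta t}(F_t(P))} - \mathbf{T}_{F_t(P)}\right] \\
&= F_{t*}^{-1} \lim_{\Delta t \to 0} \frac{1}{\Delta t}\left[F_{\Delta t*}^{-1}\mathbf{T}_{F_{\Delta t}(Q)} - \mathbf{T}_Q\right] = F_{t*}^{-1}(L_X\mathbf{T})_Q.
\end{aligned}
\]

Since $Q$ is arbitrary, we can remove it from the equation and write, as the generalization of Eq. (28.29),

\[
L_X\mathbf{T} = F_{t*}\frac{d}{dt} F_{t*}^{-1}\mathbf{T}. \tag{28.30}
\]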


An important special case of the definition above is the Lie derivative of a vector field with respect to another. Let $X, Y \in \mathcal{X}(M)$. To evaluate the RHS of (28.29), we apply the first term in the brackets to an arbitrary function $f$,

\[
\begin{aligned}
\left[F_{t*}^{-1}Y_{F_t(P)}\right](f) &= Y_{F_t(P)}(f \circ F_t^{-1}) = Y(f \circ F_t^{-1})\big|_{F_t(P)} \\
&= Y(f \circ F_{-t})\big|_{F_t(P)} = Y(f - tXf)\big|_{F_t(P)} \\
&= (Yf)_{F_t(P)} - tY(X(f))\big|_{F_t(P)} \\
&= (Yf)_P + t[X(Yf)]_P - t\left\{[Y(X(f))]_P + t[X(Y(X(f)))]_P\right\} \\
&= Y_P(f) + tX_P \circ Y_P(f) - tY_P \circ X_P(f) \\
&= Y_P(f) + t[X_P, Y_P](f) = Y_P(f) + t[X, Y]_P(f).
\end{aligned}
\]

The first equality on the first line follows from (28.9), the second equality from the meaning of $Y_{F_t(P)}$; the second equality on the second line and the fourth line follow from (28.25). Finally, the fifth line follows if we ignore the $t^2$ term. Therefore,

\[
\begin{aligned}
(L_XY)_P(f) &= \lim_{t \to 0} \frac{1}{t}\left[F_{t*}^{-1}Y_{F_t(P)} - Y_P\right](f) \\
&= \lim_{t \to 0} \frac{1}{t}\left\{t[X, Y]_P\right\}(f) = [X, Y]_P(f).
\end{aligned}
\]

Since this is true for all $P$ and $f$, we get

\[
L_XY = [X, Y]. \tag{28.31}
\]
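Equation (28.31) can be tested numerically by applying the limit in (28.29) directly. In the sketch below, $X = -y\partial_x + x\partial_y$ (whose flow is a rotation, so its differential is the same rotation matrix) and $Y = x^2\partial_x$ are illustrative choices; the bracket components are computed by hand from $[X, Y]^i = X^j\partial_jY^i - Y^j\partial_jX^i$, and the derivative is approximated by a central difference.

```python
import math

def flow(t, p):
    """Exact flow of X = -y d/dx + x d/dy: rotation by angle t."""
    x, y = p
    c, s = math.cos(t), math.sin(t)
    return (c * x - s * y, s * x + c * y)

def push(t, v):
    """Differential of the flow; the flow is linear, so it is the same rotation."""
    return flow(t, v)

def Y(p):
    """Components of the illustrative field Y = x^2 d/dx."""
    x, y = p
    return (x * x, 0.0)

def transported(t, p):
    """F_{t*}^{-1} Y_{F_t(P)}: evaluate Y at F_t(P), bring it back to P with F_{-t*}."""
    return push(-t, Y(flow(t, p)))

def lie_derivative(p, h=1e-5):
    """Central-difference estimate of d/dt [F_{t*}^{-1} Y_{F_t(P)}] at t = 0."""
    a, b = transported(h, p), transported(-h, p)
    return tuple((ai - bi) / (2 * h) for ai, bi in zip(a, b))

def bracket(p):
    """[X, Y] for the two fields above, worked out by hand."""
    x, y = p
    return (-2 * x * y, -x * x)

P = (1.0, 2.0)                      # hypothetical point
print(lie_derivative(P), bracket(P))
```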


This and other properties of the Lie derivative are summarized in the fol- 
lowing proposition. 


Proposition 28.4.13 Let $\mathbf{T} \in T^r_s(M)$ and $\mathbf{T}'$ be arbitrary tensor fields and $X$ a given vector field. Then

1. $L_X$ satisfies a derivation property in the algebra of tensor fields, i.e., $L_X(\mathbf{T} \otimes \mathbf{T}') = (L_X\mathbf{T}) \otimes \mathbf{T}' + \mathbf{T} \otimes (L_X\mathbf{T}')$.
2. $L_X$ is type-preserving, i.e., $L_X\mathbf{T}$ is a tensor field of type $(r, s)$.
3. $L_X$ commutes with the operation of contraction of tensor fields; in particular, in combination with property 1, we have $L_X\langle\theta, Y\rangle = \langle L_X\theta, Y\rangle + \langle\theta, L_XY\rangle$.
4. $L_Xf = Xf$ for every function $f$.
5. $L_XY = [X, Y]$ for every vector field $Y$.

Proof Except for the last property, which we demonstrated above, the rest follow directly from definitions and simple manipulations. The details are left as exercises.


Although the Lie derivative of a vector field is nicely given in terms of commutators, no such simple relation exists for the Lie derivative of a 1-form. However, if we work in a given coordinate frame, then a useful expression for the Lie derivative of a 1-form can be obtained. Applying $L_X$ to $\langle\theta, Y\rangle$, we obtain

\[
\underbrace{L_X\langle\theta, Y\rangle}_{=X(\langle\theta, Y\rangle)} = \langle L_X\theta, Y\rangle + \langle\theta, L_XY\rangle = \langle L_X\theta, Y\rangle + \langle\theta, [X, Y]\rangle. \tag{28.32}
\]

In particular, if $Y = \partial_i$ and we write $X = X^j\partial_j$, $\theta = \theta_j\,dx^j$, then the LHS becomes $X(\theta_i) = X^j\partial_j\theta_i$, and the RHS can be written as

\[
(L_X\theta)_i + \underbrace{\langle\theta, [X^j\partial_j, \partial_i]\rangle}_{-(\partial_iX^j)\theta_j}.
\]

It follows that

\[
L_X\theta \equiv (L_X\theta)_i\,dx^i = (X^j\partial_j\theta_i + \theta_j\partial_iX^j)\,dx^i. \tag{28.33}
\]
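The component formula (28.33) can be compared with a direct evaluation of the limit in (28.29). The sketch below assumes that, for a 1-form, the transport back to $P$ amounts to the pullback by the flow, so that $(L_X\theta)_P(v)$ is the $t$-derivative at $t = 0$ of $\theta_{F_t(P)}(F_{t*}v)$. The field $X = -y\partial_x + x\partial_y$, the 1-form $\theta = x\,dy$, and the test vector are illustrative choices; for these, (28.33) gives $L_X\theta = x\,dx - y\,dy$ by hand.

```python
import math

def flow(t, p):
    """Exact flow of X = -y d/dx + x d/dy; being linear, it is also its own differential."""
    x, y = p
    c, s = math.cos(t), math.sin(t)
    return (c * x - s * y, s * x + c * y)

def theta(p, w):
    """The illustrative 1-form theta = x dy at the point p, evaluated on the vector w."""
    return p[0] * w[1]

def pulled_back(t, p, v):
    """(F_t^* theta)_P(v) = theta_{F_t(P)}(F_{t*} v)."""
    return theta(flow(t, p), flow(t, v))

def lie_derivative_theta(p, v, h=1e-5):
    """Central-difference estimate of d/dt (F_t^* theta)_P(v) at t = 0."""
    return (pulled_back(h, p, v) - pulled_back(-h, p, v)) / (2 * h)

def formula_28_33(p, v):
    """(X^j d_j theta_i + theta_j d_i X^j) v^i, worked out by hand: x v^x - y v^y."""
    x, y = p
    return x * v[0] - y * v[1]

P, v = (1.0, 2.0), (3.0, -1.0)      # hypothetical point and tangent vector
print(lie_derivative_theta(P, v), formula_28_33(P, v))
```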


We give two other useful properties of the Lie derivative applicable to all tensors. From the Jacobi identity one can readily deduce that

\[
L_{[X,Y]}Z = L_XL_YZ - L_YL_XZ.
\]

Similarly, Eq. (28.32) yields

\[
L_{[X,Y]}\theta = L_XL_Y\theta - L_YL_X\theta.
\]

Putting these two equations together, recalling that a general tensor is a linear combination of tensor products of vectors and 1-forms, and that the Lie derivative obeys the product rule of differentiation, we obtain

\[
L_{[X,Y]}\mathbf{T} = L_XL_Y\mathbf{T} - L_YL_X\mathbf{T} \tag{28.34}
\]

for any tensor field $\mathbf{T}$. Furthermore, Eq. (28.33) and the linearity of the Lie bracket imply that $L_{\alpha X + \beta Y} = \alpha L_X + \beta L_Y$ when acting on vectors and 1-forms. It follows by the same argument as above that

\[
L_{\alpha X + \beta Y}\mathbf{T} = \alpha L_X\mathbf{T} + \beta L_Y\mathbf{T} \quad \forall\,\mathbf{T} \in T^r_s(M). \tag{28.35}
\]


Equation (28.32) gives a rule for calculating the Lie derivative of a 1-form, i.e., it tells us how to evaluate $L_X\theta$ on a vector $Y$. We can generalize this for a $p$-form $\omega$. Write the evaluation of $\omega$ on $p$ vectors as $p$ contractions as in Eq. (26.8):

\[
\omega(X_1, X_2, \dots, X_p) = C^1_1 \cdots C^p_p(\omega \otimes X_1 \otimes X_2 \otimes \cdots \otimes X_p).
\]

Now apply the Lie derivative on both sides and use its derivation property and the fact that it commutes with contractions to get

\[
L_X(\omega(X_1, X_2, \dots, X_p)) = C^1_1 \cdots C^p_p\,L_X(\omega \otimes X_1 \otimes X_2 \otimes \cdots \otimes X_p).
\]

The left-hand side is just $X(\omega(X_1, X_2, \dots, X_p))$. For the right-hand side, we use

\[
L_X(\omega \otimes X_1 \otimes \cdots \otimes X_p) = (L_X\omega) \otimes X_1 \otimes \cdots \otimes X_p + \sum_{i=1}^{p} \omega \otimes X_1 \otimes \cdots \otimes L_XX_i \otimes \cdots \otimes X_p.
\]

Applying the contractions, using $L_XX_i = [X, X_i]$, and putting everything together, we obtain

\[
X(\omega(X_1, X_2, \dots, X_p)) = (L_X\omega)(X_1, \dots, X_p) + \sum_{i=1}^{p} \omega(X_1, \dots, [X, X_i], \dots, X_p),
\]

which finally gives the rule by which $L_X\omega$ acts on $p$ vectors:

\[
(L_X\omega)(X_1, \dots, X_p) = X(\omega(X_1, X_2, \dots, X_p)) - \sum_{i=1}^{p} \omega(X_1, \dots, [X, X_i], \dots, X_p). \tag{28.36}
\]


28.5 Exterior Calculus 


Skew-symmetric tensors are of special importance to applications. We stud- 
ied these tensors in their algebraic format in Chap. 26. Let us now investigate 
them as they reside on manifolds. 


Definition 28.5.1 Let $M$ be a manifold and $Q$ a point of $M$. Let $\Lambda^p_Q(M)$ denote the space of all antisymmetric tensors of rank $p$ over the tangent space at $Q$. Let $\Lambda^p(M)$ be the union of all $\Lambda^p_Q(M)$ for all $Q \in M$. A differential $p$-form $\omega$ is a mapping $\omega : U \to \Lambda^p(M)$ such that $\omega(Q) \in \Lambda^p_Q(M)$ where $U$ is, as usual, a subset of $M$. To emphasize their domain of definition, we sometimes use the notation $\Lambda^p(U)$.

Since $\{dx^i\}_{i=1}^m$ is a basis for $\mathcal{T}^*_Q(M)$ at every $Q \in M$, $\{dx^{i_1} \wedge \cdots \wedge dx^{i_p}\}$ is a basis for the $p$-forms. All the algebraic properties established in Chap. 26 apply to these $p$-forms at every point $Q \in M$.

The concept of a pullback has been mentioned a number of times in con- 
nection with linear maps. The most frequent use of pullbacks takes place in 
conjunction with the p-forms. 


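Definition 28.5.2 Let $M$ and $N$ be manifolds and $\psi : M \to N$ a differentiable map. The pullback map on $p$-forms is the map $\psi^* : \Lambda^p(N) \to \Lambda^p(M)$ defined by

\[
\psi^*\rho(X_1, \dots, X_p) \equiv \rho(\psi_*X_1, \dots, \psi_*X_p) \quad \text{for } \rho \in \Lambda^p(N).
\]

For $p = 0$, i.e., for functions $\omega$ on $N$, $\psi^*\omega \equiv \omega \circ \psi$.

It can be shown that

\[
\psi^*(\omega \wedge \eta) = \psi^*\omega \wedge \psi^*\eta, \qquad (\psi \circ \phi)^* = \phi^* \circ \psi^*. \tag{28.37}
\]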


Since $\omega$ varies from point to point, we can define its derivatives. Recall that $T^0_0(M)$ is the collection of real-valued functions on $M$. Since the dual of $\mathbb{R}$ is $\mathbb{R}$, we conclude that $\Lambda^0(M)$, the collection of zero-forms, is the union of all real-valued functions on $M$. Also recall that if $f$ is a zero-form, then $df$, the differential of $f$, is a one-form. Thus, the differential operator $d$ creates a one-form from a zero-form. The fact that this can be generalized to $p$-forms is the subject of the next theorem (for a proof, see [Abra 88, pp. 111-112]).

Theorem 28.5.3 For each point $Q$ of $M$, there exists a neighborhood $U$ and a unique operator $d : \Lambda^p(U) \to \Lambda^{p+1}(U)$, called the exterior derivative operator, such that for any $\omega \in \Lambda^p(U)$ and $\eta \in \Lambda^q(U)$,

1. $d(\omega + \eta) = d\omega + d\eta$ if $q = p$; otherwise the sum is not defined.
2. $d(\omega \wedge \eta) = (d\omega) \wedge \eta + (-1)^p\omega \wedge (d\eta)$; this is called the antiderivation property of $d$ with respect to the wedge product.
3. $d(d\omega) = 0$ for any differential form $\omega$; stated differently, $d \circ d = 0$.
4. $df = (\partial_if)\,dx^i$ for any real-valued function $f$.
5. $d$ is natural with respect to pullback; that is, $d_M \circ \psi^* = \psi^* \circ d_N$ for any differentiable map $\psi : M \to N$. Here $d_M$ ($d_N$) is the exterior derivative operating on differential forms of $M$ ($N$).


Example 28.5.4 Let $M = \mathbb{R}^3$ and $\omega = a_i\,dx^i$ a 1-form on $M$. The exterior derivative of $\omega$ is

\[
d\omega = (da_i) \wedge dx^i = (\partial_ja_i\,dx^j) \wedge dx^i = \sum_{j < i} (\partial_ja_i - \partial_ia_j)\,dx^j \wedge dx^i.
\]

We see that the components of $d\omega$ are the components of $\nabla \times \mathbf{A}$ where $\mathbf{A} = (a_1, a_2, a_3)$. It follows that the curl of a vector in $\mathbb{R}^3$ is the exterior derivative of the 1-form constructed out of the components of the vector.
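The identification of the components of $d\omega$ with those of the curl can be seen on a concrete example. The sketch below estimates $\partial_ja_i - \partial_ia_j$ by central differences for a sample set of components (an arbitrary illustrative choice) and compares them with the curl computed by hand; all names are assumptions.

```python
def a(p):
    """Components (a_1, a_2, a_3) of an illustrative 1-form on R^3."""
    x, y, z = p
    return (y * z, x * x, x * y * z)

def partial(func, p, axis, h=1e-5):
    """Central-difference derivative of every component of func along one axis."""
    up, dn = list(p), list(p)
    up[axis] += h
    dn[axis] -= h
    return [(u - d) / (2 * h) for u, d in zip(func(up), func(dn))]

def d_omega(p):
    """Matrix c[j][i] = d_j a_i - d_i a_j; entries with j < i multiply dx^j ^ dx^i."""
    da = [partial(a, p, j) for j in range(3)]      # da[j][i] = d_j a_i
    return [[da[j][i] - da[i][j] for i in range(3)] for j in range(3)]

def curl_exact(p):
    """Curl of A = (yz, x^2, xyz), worked out by hand."""
    x, y, z = p
    return (x * z, y - y * z, 2 * x - z)

P = (1.0, 2.0, 3.0)                 # hypothetical point
c, curl = d_omega(P), curl_exact(P)
# dx^dy carries curl_z, dx^dz carries -curl_y, dy^dz carries curl_x.
print(c[0][1], curl[2], c[0][2], -curl[1], c[1][2], curl[0])
```

The sign on the $dx \wedge dz$ component reflects the ordering $j < i$ used in the sum: $dx \wedge dz = -dz \wedge dx$.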


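Example 28.5.5 In relativistic electromagnetic theory the electric and magnetic fields are combined to form the electromagnetic field tensor. This is a skew-symmetric tensor field of rank 2, which can be written as⁸

\[
\begin{aligned}
\mathbf{F} = {} & -E_x\,dt \wedge dx - E_y\,dt \wedge dy - E_z\,dt \wedge dz \\
& + B_z\,dx \wedge dy - B_y\,dx \wedge dz + B_x\,dy \wedge dz,
\end{aligned}
\tag{28.38}
\]

where $t$ is the time coordinate and the units are such that $c$, the velocity of light, is equal to 1.

Let us take the exterior derivative of $\mathbf{F}$. In the process, we use $df = (\partial_if)\,dx^i$, $d(dx^i \wedge dx^j) = 0$, and in $dE_i$ or $dB_i$ we include only the terms that give a nonzero contribution:

\[
\begin{aligned}
d\mathbf{F} = {} & -\left(\frac{\partial E_x}{\partial y}dy + \frac{\partial E_x}{\partial z}dz\right) \wedge dt \wedge dx - \left(\frac{\partial E_y}{\partial x}dx + \frac{\partial E_y}{\partial z}dz\right) \wedge dt \wedge dy \\
& - \left(\frac{\partial E_z}{\partial x}dx + \frac{\partial E_z}{\partial y}dy\right) \wedge dt \wedge dz + \left(\frac{\partial B_z}{\partial t}dt + \frac{\partial B_z}{\partial z}dz\right) \wedge dx \wedge dy \\
& - \left(\frac{\partial B_y}{\partial t}dt + \frac{\partial B_y}{\partial y}dy\right) \wedge dx \wedge dz + \left(\frac{\partial B_x}{\partial t}dt + \frac{\partial B_x}{\partial x}dx\right) \wedge dy \wedge dz.
\end{aligned}
\]

Collecting all similar terms and taking into account changes of sign due to the antisymmetry of the exterior products gives

\[
\begin{aligned}
d\mathbf{F} = {} & \left(-\frac{\partial E_x}{\partial y} + \frac{\partial E_y}{\partial x} + \frac{\partial B_z}{\partial t}\right) dt \wedge dx \wedge dy \\
& + \left(-\frac{\partial E_x}{\partial z} + \frac{\partial E_z}{\partial x} - \frac{\partial B_y}{\partial t}\right) dt \wedge dx \wedge dz \\
& + \left(-\frac{\partial E_y}{\partial z} + \frac{\partial E_z}{\partial y} + \frac{\partial B_x}{\partial t}\right) dt \wedge dy \wedge dz \\
& + \left(\frac{\partial B_x}{\partial x} + \frac{\partial B_y}{\partial y} + \frac{\partial B_z}{\partial z}\right) dx \wedge dy \wedge dz \\
= {} & \left[\left(\nabla \times \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t}\right)_z\right] dt \wedge dx \wedge dy + \left[\left(\nabla \times \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t}\right)_y\right] dt \wedge dz \wedge dx \\
& + \left[\left(\nabla \times \mathbf{E} + \frac{\partial \mathbf{B}}{\partial t}\right)_x\right] dt \wedge dy \wedge dz + (\nabla \cdot \mathbf{B})\,dx \wedge dy \wedge dz.
\end{aligned}
\]

Each component of $d\mathbf{F}$ vanishes because of Maxwell’s equations.

⁸Note how in the wedge product, the first factor has a lower index (is an “earlier” coordinate) than the second factor. If this restriction is to be removed, we need to introduce a factor of $\frac{1}{2}$ for each component (see Example 28.5.12).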


The example above shows that 


Box 28.5.6 The two homogeneous Maxwell’s equations can be writ- 
ten as dF = 0, where F is defined by Eq. (28.38). 
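As a numerical illustration, the four coefficients of $d\mathbf{F}$ can be evaluated for a specific field configuration. The sketch below assumes a plane wave $\mathbf{E} = (0, \cos(x - t), 0)$, $\mathbf{B} = (0, 0, \cos(x - t))$ in units with $c = 1$ (a standard vacuum solution chosen here for illustration) and estimates $\nabla \times \mathbf{E} + \partial\mathbf{B}/\partial t$ and $\nabla \cdot \mathbf{B}$ by central differences; the names and the sample event are assumptions.

```python
import math

def E(q):
    t, x, y, z = q
    return (0.0, math.cos(x - t), 0.0)

def B(q):
    t, x, y, z = q
    return (0.0, 0.0, math.cos(x - t))

def partial(field, q, axis, h=1e-5):
    """Central-difference derivative of each field component along axis 0..3 = t, x, y, z."""
    up, dn = list(q), list(q)
    up[axis] += h
    dn[axis] -= h
    return tuple((u - d) / (2 * h) for u, d in zip(field(up), field(dn)))

def dF_coefficients(q):
    """Return (curl E + dB/dt, div B), the combinations multiplying the basis 3-forms in dF."""
    dB_dt = partial(B, q, 0)
    dE = [partial(E, q, k) for k in (1, 2, 3)]     # dE[k][i] = derivative of E_i along x, y, z
    dBs = [partial(B, q, k) for k in (1, 2, 3)]
    curl_E = (dE[1][2] - dE[2][1], dE[2][0] - dE[0][2], dE[0][1] - dE[1][0])
    faraday = tuple(c + b for c, b in zip(curl_E, dB_dt))
    div_B = dBs[0][0] + dBs[1][1] + dBs[2][2]
    return faraday, div_B

Q = (0.4, 1.3, -0.2, 0.9)           # hypothetical event (t, x, y, z)
faraday, div_B = dF_coefficients(Q)
print(faraday, div_B)
```

All four numbers should vanish up to rounding, consistent with $d\mathbf{F} = 0$ for this configuration.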


The exterior derivative is a very useful concept in the theory of differential forms, as illustrated in the preceding example. However, that is not the only differentiation available to the differential forms. We have already defined the Lie derivative for arbitrary tensors. Since differential forms are (antisymmetrized) linear combinations of covariant tensors, Lie differentiation is defined for them as well. In fact, since differential forms have no contravariant parts, one uses the pullback map $F_t^*$ in the definition of the Lie derivative instead of $F_{t*}^{-1}$:

\[
L_X\omega = (F_t^*)^{-1}\frac{d}{dt} F_t^*\omega. \tag{28.39}
\]


The two derivatives defined so far have the following convenient prop- 
erty, whose proof is left as an exercise for the reader: 


Theorem 28.5.7 The exterior derivative $d$ is natural with respect to $L_X$ (or commutes with $L_X$) for $X \in \mathcal{X}(M)$; that is, $d \circ L_X = L_X \circ d$.

In the last chapter, we defined the interior product $i_\theta$ for $p$-vectors, where $\theta$ is a 1-form. With our shift of emphasis from $p$-vectors to $p$-forms in this chapter, we need to shift the role of vectors and forms.

Definition 28.5.8 Let $X$ be a vector field and $\omega$ a $p$-form on a manifold $M$. Then the interior product $i_X : \Lambda^p(M) \to \Lambda^{p-1}(M)$ is defined as follows:

\[
i_X\omega(X_1, \dots, X_{p-1}) \equiv \omega(X, X_1, \dots, X_{p-1}).
\]

If $\omega \in \Lambda^0(M)$, i.e., if $\omega$ is just a function, we set $i_X\omega = 0$. Another notation commonly used for $i_X$ is $X \lrcorner\, \omega$.


The interior product $i_X$ has the antiderivation property of Theorem 27.0.2:

Theorem 28.5.9 Let $\omega$ be a $p$-form and $\eta$ a $q$-form on a manifold $M$. Then

\[
i_X(\omega \wedge \eta) = (i_X\omega) \wedge \eta + (-1)^p\omega \wedge (i_X\eta).
\]

We have introduced three types of derivation on the algebra of differential 
forms: the exterior derivative, the Lie derivative, and the interior product. 


The following theorem connects all three derivations in a most useful way (see [Abra 88, pp. 115-116]):

Theorem 28.5.10 Let $\omega \in \Lambda^p(M)$, $f \in \Lambda^0(M)$, and $X \in \mathcal{X}(M)$. Let $i_X : \Lambda^p(M) \to \Lambda^{p-1}(M)$, $d : \Lambda^p(M) \to \Lambda^{p+1}(M)$, and $L_X : \Lambda^p(M) \to \Lambda^p(M)$ be the interior product, the exterior derivative, and the Lie derivative, respectively. Then

1. $i_X\,df = L_Xf$.
2. $L_X = i_X \circ d + d \circ i_X$.
3. $L_{fX}\omega = fL_X\omega + df \wedge i_X\omega$.
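Item 2 can be checked by hand on a small example. The following worked sketch takes $X = -y\,\partial_x + x\,\partial_y$ and $\theta = x\,dy$ on $\mathbb{R}^2$ (both chosen here only for illustration), assumes the convention $(dx \wedge dy)(u, v) = u^xv^y - u^yv^x$ for the wedge product, and compares $i_X\,d\theta + d(i_X\theta)$ with the component formula (28.33):

```latex
\begin{aligned}
d\theta &= dx \wedge dy, \qquad i_X\theta = \theta(X) = x \cdot x = x^2, \\
i_X\,d\theta &= (dx \wedge dy)(X, \,\cdot\,) = X^x\,dy - X^y\,dx = -y\,dy - x\,dx, \\
d(i_X\theta) &= 2x\,dx, \\
i_X\,d\theta + d(i_X\theta) &= x\,dx - y\,dy, \\
L_X\theta &= (X^j\partial_j\theta_i + \theta_j\partial_iX^j)\,dx^i
  = (\theta_y\,\partial_xX^y)\,dx + (X^x\,\partial_x\theta_y)\,dy
  = x\,dx - y\,dy.
\end{aligned}
```

The two results agree, as item 2 requires; only the nonvanishing terms of (28.33) are shown in the last line.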


ifx=x/ 0; andw = Wiqin.ipy dx!! Adx!2.A.--Adx'r*', then the reader 
may verify that ixw = X aii, aoe A-++Adx'? In particular, we have the 
useful formula 


ix(dx"! A dx? Av A dx'r+!) 


= XI) OP dx!) Adx? A---A dx! 
Jleesp 


. i . L +1 . . in 
ax (x enter iiyen’ ars) Ja A dx? A+++ dxi?, (28.40) 
1s 


Theorem 28.5.11 For a $p$-form $\omega$, we have

\[
\begin{aligned}
d\omega(X_1, \dots, X_{p+1}) = {} & \sum_{i=1}^{p+1} (-1)^{i-1} X_i\big(\omega(X_1, \dots, \hat{X}_i, \dots, X_{p+1})\big) \\
& + \sum_{1 \le i < j \le p+1} (-1)^{i+j}\,\omega([X_i, X_j], X_1, \dots, \hat{X}_i, \dots, \hat{X}_j, \dots, X_{p+1}),
\end{aligned}
\]

where the circumflex on a symbol means that symbol is to be omitted.

Proof We use mathematical induction on $p$. From item 2 of Theorem 28.5.10, we have

\[
L_X\omega = i_X(d\omega) + d(i_X\omega)
\]

or

\[
(L_X\omega)(X_1, \dots, X_p) = \underbrace{(i_X(d\omega))(X_1, \dots, X_p)}_{=d\omega(X, X_1, \dots, X_p)} + (d(i_X\omega))(X_1, \dots, X_p)
\]

or

\[
d\omega(X, X_1, \dots, X_p) = (L_X\omega)(X_1, \dots, X_p) - (d(i_X\omega))(X_1, \dots, X_p).
\]

For the first term on the right-hand side, we use Eq. (28.36). For the second term, we use the induction hypothesis because $i_X\omega$ is a $(p-1)$-form. A straightforward manipulation then leads to the desired result.


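Example 28.5.12 Let $\mathbf{p} = p_\alpha\,dx^\alpha$ be the momentum one-form and write the electromagnetic field tensor as⁹ $\mathbf{F} = \frac{1}{2}F_{\alpha\beta}\,dx^\alpha \wedge dx^\beta$, where $\alpha$ and $\beta$ run over the values 0, 1, 2, and 3 with 0 being the time index. Let

\[
\frac{d\mathbf{p}}{d\tau} = \left(\frac{dp_\alpha}{d\tau}\right) dx^\alpha
\]

be the derivative of momentum with respect to the proper time, $\tau$. Also, let $\mathbf{u} = u^\alpha\partial_\alpha$ be the velocity four-vector of a charged particle. Then the Lorentz force law can be written simply as $d\mathbf{p}/d\tau = q\mathbf{F}(\mathbf{u}) \equiv -q\,i_{\mathbf{u}}\mathbf{F}$, where $q$ is the electric charge of the particle whose 4-velocity is $\mathbf{u}$. Note that $\mathbf{F}$, a two-form, contracts with $\mathbf{u}$, a vector, to give a one-form on the RHS. Thus, both sides are of the same type. Let us write this equation in component form:

\[
\begin{aligned}
\frac{dp_\alpha}{d\tau}\,dx^\alpha &= -\frac{1}{2}qF_{\alpha\beta}\,i_{\mathbf{u}}(dx^\alpha \wedge dx^\beta) = -\frac{1}{2}qF_{\alpha\beta}\left(u^\gamma\delta^{\alpha\beta}_{\gamma\mu}\,dx^\mu\right) \\
&= -\frac{1}{2}qF_{\alpha\beta}u^\gamma\left(\delta^\alpha_\gamma\delta^\beta_\mu - \delta^\alpha_\mu\delta^\beta_\gamma\right) dx^\mu \\
&= \frac{1}{2}qF_{\alpha\beta}\left(u^\beta\,dx^\alpha - u^\alpha\,dx^\beta\right) \\
&= \frac{1}{2}q\left(F_{\alpha\beta} - F_{\beta\alpha}\right)u^\beta\,dx^\alpha = \left(qF_{\alpha\beta}u^\beta\right) dx^\alpha.
\end{aligned}
\tag{28.41}
\]

Equating the components on both sides, we get $dp_\alpha/d\tau = qF_{\alpha\beta}u^\beta$, which may be familiar to the reader. To make the equation even more familiar, consider the component $\alpha = 1$,

\[
\frac{dp_1}{d\tau} = qF_{1\beta}u^\beta = q\left[F_{10}u^0 + F_{12}u^2 + F_{13}u^3\right], \tag{28.42}
\]

and recall that $u^\alpha = dx^\alpha/d\tau$, where

\[
(d\tau)^2 = (dt)^2 - (dx^1)^2 - (dx^2)^2 - (dx^3)^2 = (dt)^2(1 - v^2)
\]

and $\mathbf{v} = (dx^1/dt, dx^2/dt, dx^3/dt)$ is the 3-velocity of the particle. Since $x^0 = t$, we get

\[
u^0 = \frac{dt}{d\tau} = \frac{1}{\sqrt{1 - v^2}}, \qquad u^i = \frac{dx^i}{d\tau} = \frac{v_i}{\sqrt{1 - v^2}} \quad \text{for } i = 1, 2, 3.
\]

Substituting this in (28.42) and remembering that $F_{10} = -F_{01} = E_1$, $F_{12} = B_3$, and $F_{13} = -F_{31} = -B_2$, we obtain

\[
\frac{dp_1}{dt\sqrt{1 - v^2}} = q\left[E_1\frac{1}{\sqrt{1 - v^2}} + B_3\frac{v_2}{\sqrt{1 - v^2}} - B_2\frac{v_3}{\sqrt{1 - v^2}}\right]
\]

or

\[
\frac{dp_1}{dt} = q\left[E_1 + (v_2B_3 - v_3B_2)\right] = \left[q(\mathbf{E} + \mathbf{v} \times \mathbf{B})\right]_1.
\]

The other components are obtained similarly. Thus, in vector form we have

\[
\frac{d\mathbf{p}}{dt} = q(\mathbf{E} + \mathbf{v} \times \mathbf{B}),
\]

where $\mathbf{p}$ now represents the 3-momentum of the particle. This is the Lorentz force law for electromagnetism in its familiar form. Again, note the simplification offered by the language of forms.

⁹The factor $\frac{1}{2}$ is introduced here to avoid restricting the sum over $\alpha$ and $\beta$.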
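The component computation above can be repeated numerically for all three spatial components at once. The sketch below builds $F_{\alpha\beta}$ from sample field values using $F_{i0} = E_i$, $F_{12} = B_3$, $F_{13} = -B_2$, and $F_{23} = B_1$ (the last one read off from Eq. (28.38)), forms $qF_{i\beta}u^\beta$, divides by $u^0$ to convert $d/d\tau$ into $d/dt$, and compares with $q(\mathbf{E} + \mathbf{v} \times \mathbf{B})$. The numerical values and function names are illustrative assumptions.

```python
import math

def field_tensor(E, B):
    """Lower-index F[a][b] with F_{i0} = E_i, F_12 = B_3, F_13 = -B_2, F_23 = B_1."""
    F = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        F[i + 1][0] = E[i]
        F[0][i + 1] = -E[i]
    F[1][2], F[2][1] = B[2], -B[2]
    F[1][3], F[3][1] = -B[1], B[1]
    F[2][3], F[3][2] = B[0], -B[0]
    return F

def four_velocity(v):
    """u = (1, v) / sqrt(1 - v^2), in units with c = 1."""
    gamma = 1.0 / math.sqrt(1.0 - sum(c * c for c in v))
    return [gamma] + [gamma * c for c in v]

def force_from_forms(q, E, B, v):
    """dp_i/dt = (1/u^0) dp_i/dtau = (q/u^0) F_{i beta} u^beta for i = 1, 2, 3."""
    F, u = field_tensor(E, B), four_velocity(v)
    return tuple(q / u[0] * sum(F[i][b] * u[b] for b in range(4))
                 for i in (1, 2, 3))

def lorentz_force(q, E, B, v):
    """q (E + v x B) in ordinary vector notation."""
    cross = (v[1] * B[2] - v[2] * B[1],
             v[2] * B[0] - v[0] * B[2],
             v[0] * B[1] - v[1] * B[0])
    return tuple(q * (e + c) for e, c in zip(E, cross))

# Hypothetical sample values.
q, E, B, v = 2.0, (0.5, -1.0, 0.25), (1.5, 0.0, -2.0), (0.3, 0.4, -0.1)
print(force_from_forms(q, E, B, v), lorentz_force(q, E, B, v))
```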


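A combination that is very useful is that of the exterior derivative and the Hodge star operator. Recall that the latter is defined by

\[
*(dx^{i_1} \wedge \cdots \wedge dx^{i_p}) = \frac{1}{(m-p)!}\,\epsilon^{i_1 \dots i_p}{}_{j_{p+1} \dots j_m}\,dx^{j_{p+1}} \wedge \cdots \wedge dx^{j_m}, \tag{28.43}
\]

where $m$ is the dimension of the manifold.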


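Example 28.5.13 Let us calculate $*\mathbf{F}$ and $d(*\mathbf{F})$ where $\mathbf{F} = \frac{1}{2}F_{\alpha\beta}\,dx^\alpha \wedge dx^\beta$ is the electromagnetic field tensor. We have

\[
*\mathbf{F} = *\left(\frac{1}{2}F_{\alpha\beta}\,dx^\alpha \wedge dx^\beta\right) = \frac{1}{2}F_{\alpha\beta} * (dx^\alpha \wedge dx^\beta) = \frac{1}{2}F_{\alpha\beta}\left(\frac{1}{2}\epsilon^{\alpha\beta}{}_{\mu\nu}\,dx^\mu \wedge dx^\nu\right)
\]

and

\[
d(*\mathbf{F}) = d\left(\frac{1}{4}F_{\alpha\beta}\,\epsilon^{\alpha\beta}{}_{\mu\nu}\,dx^\mu \wedge dx^\nu\right) = \frac{1}{4}\epsilon^{\alpha\beta}{}_{\mu\nu}F_{\alpha\beta,\gamma}\,dx^\gamma \wedge dx^\mu \wedge dx^\nu,
\]

where $F_{\alpha\beta,\gamma} \equiv \partial F_{\alpha\beta}/\partial x^\gamma$. We can now use the components $F_{i0} = E_i$, $F_{12} = B_3$, $F_{13} = -B_2$, and $F_{23} = B_1$ to write $d(*\mathbf{F})$ in terms of $\mathbf{E}$ and $\mathbf{B}$. After a long but straightforward calculation, we obtain

\[
\begin{aligned}
d(*\mathbf{F}) = {} & \left[\left(\frac{\partial \mathbf{E}}{\partial t} - \nabla \times \mathbf{B}\right)_z\right] dt \wedge dx \wedge dy + \left[\left(\frac{\partial \mathbf{E}}{\partial t} - \nabla \times \mathbf{B}\right)_y\right] dt \wedge dz \wedge dx \\
& + \left[\left(\frac{\partial \mathbf{E}}{\partial t} - \nabla \times \mathbf{B}\right)_x\right] dt \wedge dy \wedge dz + (\nabla \cdot \mathbf{E})\,dx \wedge dy \wedge dz.
\end{aligned}
\tag{28.44}
\]

The inhomogeneous pair of Maxwell’s equations is

\[
\nabla \times \mathbf{B} - \frac{\partial \mathbf{E}}{\partial t} = 4\pi\mathbf{J}, \qquad \nabla \cdot \mathbf{E} = 4\pi\rho, \tag{28.45}
\]

where $\rho$ and $\mathbf{J}$ are charge and current densities, respectively. We can put these two densities together to form a four-current one-form with $\rho$ as the zeroth component: $\mathbf{J} = J_\alpha\,dx^\alpha$. Thus,

\[
\begin{aligned}
*\mathbf{J} &= J_\alpha(*dx^\alpha) = J_\alpha\frac{1}{3!}\,\epsilon^\alpha{}_{\mu\nu\rho}\,dx^\mu \wedge dx^\nu \wedge dx^\rho \\
&= J_0\,dx \wedge dy \wedge dz + J_x\,dt \wedge dy \wedge dz + J_y\,dt \wedge dz \wedge dx + J_z\,dt \wedge dx \wedge dy \\
&= \rho\,dx \wedge dy \wedge dz - J^x\,dt \wedge dy \wedge dz - J^y\,dt \wedge dz \wedge dx - J^z\,dt \wedge dx \wedge dy,
\end{aligned}
\tag{28.46}
\]

where we have used the facts that $\rho = J^0 = J_0$ and $\mathbf{J} = (J^x, J^y, J^z) = -(J_x, J_y, J_z)$.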


Comparing Eqs. (28.44), (28.45), and (28.46), we note that

Box 28.5.14 In the language of forms, the inhomogeneous pair of Maxwell’s equations has the simple appearance $d(*\mathbf{F}) = 4\pi(*\mathbf{J})$.


Problem 28.15 shows that the relation $d^2\omega = 0$ is equivalent, at least in $\mathbb{R}^3$, to the vanishing of the curl of the gradient and the divergence of the curl. It is customary in physics to try to go backwards as well, that is, given that $\nabla \times \mathbf{E} = 0$, to assume that $\mathbf{E} = \nabla f$ for some function $f$. Similarly, we want to believe that $\nabla \cdot \mathbf{B} = 0$ implies that $\mathbf{B} = \nabla \times \mathbf{A}$.

What is the analogue of the above statement for a general $p$-form? A form $\omega$ that satisfies $d\omega = 0$ is called a closed form. An exact form is one that can be written as the exterior derivative of another form. Thus, every exact form is automatically closed. This is the Poincaré lemma. The converse of this lemma is true only if the region of definition of the form is topologically simple, as explained in the following.

Consider a $p$-form $\omega$ defined on a region $U$ of a manifold $M$. If all closed curves in $U$ can be shrunk to a point in $U$ without encountering any points at which $\omega$ is ill-defined, we say that $U$ is contractable to a point. If $\omega$ is not defined for a point $P$ on $M$, then any $U$ that contains $P$ is not contractable to a point. We can now state the converse of the Poincaré lemma (for a proof, see [Bish 80, p. 175]):

Theorem 28.5.15 (Converse of the Poincaré lemma) Let $U$ be a region in a manifold $M$ such that $U$ is contractable to a point. Let $\omega$ be a $p$-form on $U$ such that $d\omega = 0$. Then there exists a $(p-1)$-form $\eta$ on $U$ such that $\omega = d\eta$.


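Example 28.5.16 The electromagnetic field tensor $F = \frac{1}{2} F_{\alpha\beta}\, dx^\alpha \wedge dx^\beta$ is
a two-form that satisfies $dF = 0$. The converse of the Poincaré lemma says
that if $F$ is well behaved in a region $U$ of $\mathbb{R}^4$, then there must exist a
one-form $\eta$ such that $F = d\eta$.

Let us write this one-form in terms of coordinates as $\eta = A_\beta\, dx^\beta$. Then
$d\eta = A_{\beta,\alpha}\, dx^\alpha \wedge dx^\beta$, and we have

$$\frac{1}{2} F_{\alpha\beta}\, dx^\alpha \wedge dx^\beta = A_{\beta,\alpha}\, dx^\alpha \wedge dx^\beta
\quad\Rightarrow\quad
\frac{1}{2}\left(F_{\alpha\beta} - A_{\beta,\alpha} + A_{\alpha,\beta}\right) dx^\alpha \wedge dx^\beta = 0.$$

Since $dx^\alpha \wedge dx^\beta$ are linearly independent and their coefficients are
antisymmetric, each of the latter must vanish. Thus,

$$F_{\alpha\beta} = A_{\beta,\alpha} - A_{\alpha,\beta} = \frac{\partial A_\beta}{\partial x^\alpha} - \frac{\partial A_\alpha}{\partial x^\beta}.$$

The four-vector $A^\beta$ is simply the four-potential of relativistic electromagnetic
theory.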
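As a numerical illustration of the example, the following sketch builds $F_{\alpha\beta} = A_{\beta,\alpha} - A_{\alpha,\beta}$ from a potential and checks that the components $\partial_\alpha F_{\beta\gamma} + \partial_\beta F_{\gamma\alpha} + \partial_\gamma F_{\alpha\beta}$ of $dF$ vanish. The particular potential, the sample point, the step size, and all function names are assumptions made only for this illustration; derivatives are approximated by central differences.

```python
import math
from itertools import combinations

STEP = 1e-3  # finite-difference step; an arbitrary illustrative choice


def potential(x):
    """Hypothetical smooth four-potential A_beta at x = (x0, x1, x2, x3)."""
    x0, x1, x2, x3 = x
    return (math.sin(x1) * x2, x0 * x3 ** 2, math.cos(x0) + x1 * x3, x0 * x1 * x2)


def shifted(x, index, amount):
    """Return a copy of the point x with coordinate `index` moved by `amount`."""
    y = list(x)
    y[index] += amount
    return tuple(y)


def field_tensor(x):
    """F[a][b] = A_{b,a} - A_{a,b}, with derivatives taken by central differences."""
    grad = []  # grad[a][b] approximates dA_b / dx^a
    for a in range(4):
        plus = potential(shifted(x, a, STEP))
        minus = potential(shifted(x, a, -STEP))
        grad.append([(p - m) / (2 * STEP) for p, m in zip(plus, minus)])
    return [[grad[a][b] - grad[b][a] for b in range(4)] for a in range(4)]


def d_field_tensor(x, a, b, c):
    """Component of dF: d_a F_bc + d_b F_ca + d_c F_ab."""
    def partial(i, j, k):  # approximates dF_jk / dx^i
        plus = field_tensor(shifted(x, i, STEP))[j][k]
        minus = field_tensor(shifted(x, i, -STEP))[j][k]
        return (plus - minus) / (2 * STEP)
    return partial(a, b, c) + partial(b, c, a) + partial(c, a, b)


point = (0.3, -0.7, 1.1, 0.5)
F = field_tensor(point)
max_residual = max(abs(d_field_tensor(point, a, b, c))
                   for a, b, c in combinations(range(4), 3))
print("largest component of dF:", max_residual)
```

The residual is at the level of rounding error because difference operators along different axes commute, which mirrors the identity $d^2 = 0$ behind the statement that an exact form is closed.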


Note that the $(p-1)$-form of Theorem 28.5.15 is not unique. In fact, if
$\alpha$ is any $(p-2)$-form, then $\omega$ can be written as

$$\omega = d(\eta + d\alpha)$$

because $d(d\alpha)$ is identical to zero. This freedom of choice in selecting $\eta$ is
called gauge invariance, and its generalization plays an important role in
the physics of fundamental interactions.$^{10}$


$^{10}$Gauge invariance and gauge theories are discussed in detail in Chap. 35.


Historical Notes

Jules Henri Poincaré (1854-1912): The development of mathematics in the nineteenth
century began under the shadow of a giant, Carl Friedrich Gauss; it ended with the
domination by a genius of similar magnitude, Henri Poincaré. Both were universal
mathematicians in the supreme sense, and both made important contributions to astronomy and
mathematical physics. If Poincaré’s discoveries in number theory do not equal those of
Gauss, his achievements in the theory of functions are at least on the same level, even
when one takes into account the theory of elliptic and modular functions, which must
be credited to Gauss and which represents in that field his most important discovery,
although it was not published during his lifetime. If Gauss was the initiator in the theory
of differentiable manifolds, Poincaré played the same role in algebraic topology. Finally,
Poincaré remains the most important figure in the theory of differential equations and the 
mathematician who after Newton did the most remarkable work in celestial mechanics. 
Both Gauss and Poincaré had very few students and liked to work alone; but the similarity 
ends there. Where Gauss was very reluctant to publish his discoveries, Poincaré’s list of 
papers approaches five hundred, which does not include the many books and lecture notes 
he published as a result of his teaching at the Sorbonne. 

Poincaré’s parents both belonged to the upper middle class, and both their families had 
lived in Lorraine for several generations. His paternal grandfather had two sons: Léon, 
Henri’s father, was a physician and a professor of medicine at the University of Nancy; 
Antoine had studied at the École Polytechnique and rose to high rank in the engineering
corps. One of Antoine’s sons, Raymond, was several times prime minister and was
president of the French Republic during World War I; the other son, Lucien, occupied 
high administrative functions in the university. Poincaré’s mathematical ability became 
apparent while he was still a student in the lycée. He won first prizes in the concours
général (a competition among students from all French lycées) and in 1873 entered the
École Polytechnique at the top of his class; his professor at Nancy is said to have
referred to him as a “monster of mathematics.” After graduation, he followed courses in
engineering at the École des Mines and worked briefly as an engineer while writing his
thesis for the doctorate in mathematics which he obtained in 1879. Shortly afterward he 
started teaching at the University of Caen, and in 1881 he became a professor at the Uni- 
versity of Paris, where he taught until his untimely death in 1912. At the early age of 
thirty-three he was elected to the Académie des Sciences and in 1908 to the Académie
Française. He was also the recipient of innumerable prizes and honors both in France and
abroad. 

Before he was thirty years of age, Poincaré became world famous with his epoch-making 
discovery of the “automorphic functions” of one complex variable (or, as he called them, 
the “fuchsian” and “kleinean” functions). Much has been written on the “competition” 
between C.F. Klein and Poincaré in the discovery of automorphic functions. However, 
Poincaré’s ignorance of the mathematical literature when he started his researches is al- 
most unbelievable. He hardly knew anything on the subject beyond Hermite’s work on the 
modular functions; he certainly had never read Riemann, and by his own account had not 
even heard of the “Dirichlet principle,” which he was to use in such imaginative fashion
a few years later. Nevertheless, Poincaré’s idea of associating a fundamental domain to 
any fuchsian group does not seem to have occurred to Klein, nor did the idea of “using” 
non-Euclidean geometry, which is never mentioned in his papers on modular functions 
up to 1880. 

Poincaré was one of the few mathematicians of his time who understood and admired 
the work of Lie and his continuators on “continuous groups,” and in particular the only 
mathematician who in the early 1900s realized the depth and scope of E. Cartan’s papers. 
In 1899 Poincaré proved what is now called the Poincaré-Birkhoff-Witt theorem, which
has become fundamental in the modern theory of Lie algebras. The theory of differential 
equations and its applications to dynamics was clearly at the center of Poincaré’s math- 
ematical thought; from his first (1878) to his last (1912) paper, he attacked the theory 
from all possible angles and very seldom let a year pass without publishing a paper on the 
subject. The most extraordinary production of Poincaré’s, also dating from his prodigious 
period of creativity (1880-1883) (reminding us of Gauss’s Tagebuch of 1797-1801), is 
the qualitative theory of differential equations. It is one of the few examples of a mathe- 
matical theory that sprang apparently out of nowhere and that almost immediately reached 
perfection in the hands of its creator. Everything was new in the first two of the four big 
papers that Poincaré published on the subject between 1880 and 1886. 

For more than twenty years Poincaré lectured at the Sorbonne on mathematical physics; 
he gave himself to that task with his characteristic thoroughness and energy, with the re- 
sult that he became an expert in practically all parts of theoretical physics, and published 
more than seventy papers and books on the most varied subjects, with a predilection for 
the theories of light and of electromagnetic waves. On two occasions he played an impor- 
tant part in the development of the new ideas and discoveries that revolutionized physics 
at the end of the nineteenth century. His remark on the possible connection between X- 
rays and the phenomenon of phosphorescence was the starting point of H. Becquerel’s 
experiments that led him to the discovery of radioactivity. On the other hand, Poincaré
was active from 1899 on in the discussions concerning Lorentz’s theory of the electron;
Poincaré was the first to observe that the Lorentz transformations form a group; and many 
physicists consider that Poincaré shares with Lorentz and Einstein the credit for the in- 
vention of the special theory of relativity. The main leitmotiv of Poincaré’s mathematical 
work is clearly the idea of “continuity”: Whenever he attacks a problem in analysis, we 
almost immediately see him investigating what happens when the conditions of the prob- 
lem are allowed to vary continuously. He was therefore bound to encounter at every turn 
what we now call topological problems. He himself said in 1901, “Every problem I had 
attacked led me to Analysis situs,” particularly the researches on differential equations 
and on the periods of multiple integrals. Starting in 1894 he inaugurated in a remark- 
able series of six papers—written during a period of ten years—the modern methods of 
algebraic topology. 

Whereas Poincaré has been accused of being too conservative in physics, he certainly 
was very open-minded regarding new mathematical ideas. The quotations in his papers 
show that he read extensively, if not systematically, and was aware of all the latest de- 
velopments in practically every branch of mathematics. He was probably the first mathe- 
matician to use Cantor's theory of sets in analysis. Up to a certain point, he also looked 
with favor on the axiomatic trend in mathematics, as it was developing toward the end 
of the nineteenth century, and he praised Hilbert’s Grundlagen der Geometrie. However, 
he obviously had a blind spot regarding the formalization of mathematics, and poked fun 
repeatedly at the efforts of the disciples of Peano and Russell in that direction; but, some- 
what paradoxically, his criticism of the early attempts of Hilbert was probably the starting 
point of some of the most fruitful of the later developments of metamathematics. Poincaré 
stressed that Hilbert’s point of view of defining objects by a system of axioms was admis- 
sible only if one could prove a priori that such a system did not imply contradiction, and 
it is well known that the proof of noncontradiction was the main goal of the theory that 
Hilbert founded after 1920. Poincaré seems to have been convinced that such attempts 
were hopeless, and K. Gödel's theorem proved him right.


28.6 Integration on Manifolds 


We mentioned in Chap. 26 that certain exterior products are interpreted as
volume elements. We now exploit this notion and define integration on manifolds.
Starting with $\mathbb{R}^n$, considered as a manifold, we define the integral of
an $n$-form $\omega$ as follows. Choose a coordinate system $\{x^i\}_{i=1}^n$ in $\mathbb{R}^n$, write
$\omega = f\, dx^1 \wedge \cdots \wedge dx^n$, and define the integral of the $n$-form as

$$\int_{\mathbb{R}^n,\,x} \omega \equiv \int_{\mathbb{R}^n} f(x^1, \dots, x^n)\, dx^1 \cdots dx^n,$$

where to avoid dealing with infinities, one assumes that $f$ vanishes outside
a bounded region. The second symbol in the lower part of the integral sign
indicates the variables of integration. Let us now change the coordinates,
say to $\{y^j\}_{j=1}^n$. Using Eq. (28.17), which gives the transformation rule for
1-forms when changing coordinates, and Eq. (2.32), which defines the
determinant in terms of $n$-forms, we obtain

$$\omega = f\, dx^1 \wedge \cdots \wedge dx^n = f \det\left(\frac{\partial x^i}{\partial y^j}\right) dy^1 \wedge \cdots \wedge dy^n,$$

where $f$ is now understood to be a function of the $y$’s through the $x$’s. So,
in terms of the new coordinates, the integral becomes

$$\int_{\mathbb{R}^n,\,y} \omega = \int_{\mathbb{R}^n} f(y) \det\left(\frac{\partial x^i}{\partial y^j}\right) dy^1 \cdots dy^n.$$

If we had the absolute value of the Jacobian in the integral, the two sides
would be equal. So, all we can say at this point is

$$\int_{\mathbb{R}^n,\,y} \omega = \pm \int_{\mathbb{R}^n,\,x} \omega.$$


We therefore distinguish between two kinds of coordinate transformations: 
If the Jacobian determinant is positive, we say that the coordinate transfor- 
mation is orientation preserving. Otherwise, the transformation is called 
orientation reversing. 
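The distinction can be tried out numerically. The sketch below estimates the Jacobian determinant $\det(\partial x^i/\partial y^j)$ of a coordinate change on $\mathbb{R}^2$ by central differences and classifies the transformation by its sign. The function names, the step size, and the two sample transformations are assumptions chosen for illustration only.

```python
import math


def jacobian_det(transform, point, h=1e-6):
    """Central-difference estimate of det(dx^i/dy^j) for a map (y1, y2) -> (x1, x2)."""
    u, v = point
    xu_plus, xu_minus = transform(u + h, v), transform(u - h, v)
    xv_plus, xv_minus = transform(u, v + h), transform(u, v - h)
    a = (xu_plus[0] - xu_minus[0]) / (2 * h)  # dx^1/dy^1
    b = (xv_plus[0] - xv_minus[0]) / (2 * h)  # dx^1/dy^2
    c = (xu_plus[1] - xu_minus[1]) / (2 * h)  # dx^2/dy^1
    d = (xv_plus[1] - xv_minus[1]) / (2 * h)  # dx^2/dy^2
    return a * d - b * c


def classify(transform, point):
    """Label a transformation by the sign of its Jacobian determinant at a point."""
    if jacobian_det(transform, point) > 0:
        return "orientation preserving"
    return "orientation reversing"


def polar(r, theta):
    """x = r cos(theta), y = r sin(theta); the exact Jacobian determinant is r."""
    return (r * math.cos(theta), r * math.sin(theta))


def swap(u, v):
    """Interchange of the two coordinates; the exact Jacobian determinant is -1."""
    return (v, u)


print(classify(polar, (2.0, 0.4)))
print(classify(swap, (1.0, 3.0)))
```

For the polar map the determinant equals $r > 0$, so the sign in the relation between the two integrals is $+$; for the swap it is $-1$, and the integral of an $n$-form changes sign.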

Our ability to integrate functions on $\mathbb{R}^n$ depends crucially on the fact that
volume elements do not change sign at any point of $\mathbb{R}^n$. If this were not so,
we could find a finite (albeit small) region of space, in the vicinity of the
point at which the volume element changes sign, whose volume would be
zero. This property of $\mathbb{R}^n$ is the content of the following:


Definition 28.6.1 A manifold $M$ of dimension $n$ is called orientable if it
has a nowhere vanishing $n$-form.


Any two nonvanishing $n$-forms $\omega$ and $\omega'$ on an orientable manifold are
related by a nowhere-vanishing function: $\omega' = h\omega$. Clearly, $h$ has to be either
positive or negative everywhere. $\omega$ and $\omega'$ are said to be equivalent if $h$ is
positive. Thus, the nonvanishing $n$-forms on an orientable manifold fall into
two classes, all members of each class being equivalent to one another, and
a member of one class being related to a member of the other class via a
negative function. Each class is called an orientation on $M$.

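Given an orientation, an $n$-form $\omega$, and charts $\{(U_\alpha, \varphi_\alpha)\}$ on $M$, we define

$$\int_M \omega \equiv \sum_\alpha \int_{\mathbb{R}^n} (\varphi_\alpha^{-1})^* \omega|_\alpha, \qquad (28.47)$$

where $(\varphi_\alpha^{-1})^*$ is the pullback of $\varphi_\alpha^{-1}: \mathbb{R}^n \to M$, so that it maps $n$-forms on
$M$ to $n$-forms on $\mathbb{R}^n$; $\omega|_\alpha$ is the restriction of $\omega$ to $U_\alpha$, and the sum over $\alpha$
is assumed to exist. This amounts to saying that the region in $M$ on which
$\omega$ is defined is finite, or that $\omega$ has compact support.

We note that the RHS of Eq. (28.47) is an integration on $\mathbb{R}^n$ that appears
to depend on the choice of coordinate functions. However, it can be shown
that the integral is independent of such choice. In practice, one chooses a
coordinate patch and transfers the integration to $\mathbb{R}^n$, where the process is
familiar.

If we choose a coordinate patch $\{x^i\}_{i=1}^n$ and integrate $dx^1 \wedge \cdots \wedge dx^n$
according to Eq. (28.47), we obtain the “volume” of the manifold $M$. If $M$
is compact, this volume will be finite.$^{11}$


$^{11}$Recall from Chap. 17 that a subset of $\mathbb{R}^n$ is compact iff it is closed and bounded. It is a
good idea to keep this in mind as a paradigm of compact spaces.


Theorem 28.6.2 (Stokes’ theorem) Let $M$ be an oriented $n$-manifold. Let
$\omega$ be an $(n-1)$-form with compact support. Then

$$\int_M d\omega = 0.$$

Proof From Eq. (28.47), we have

$$\int_M d\omega = \sum_\alpha \int_{\mathbb{R}^n} (\varphi_\alpha^{-1})^* (d\omega|_\alpha) = \sum_\alpha \int_{\mathbb{R}^n} d\left((\varphi_\alpha^{-1})^* \omega|_\alpha\right),$$

where in the last equality, we used item 5 of Theorem 28.5.3. Now
$(\varphi_\alpha^{-1})^* \omega|_\alpha$ is an $(n-1)$-form on $\mathbb{R}^n$. If $\beta$ is such a form, it can be
written as

$$\beta = \sum_{i=1}^n \beta_i\, dx^1 \wedge \cdots \wedge \widehat{dx^i} \wedge \cdots \wedge dx^n,$$

where the hat indicates that the factor is omitted, and
$d\beta = \sum_{i=1}^n (-1)^{i-1} \partial_i \beta_i\, dx^1 \wedge \cdots \wedge dx^n$. Therefore,

$$\begin{aligned}
\int_{\mathbb{R}^n} d\beta &= \sum_{i=1}^n (-1)^{i-1} \int_{\mathbb{R}^n} \partial_i \beta_i\, dx^1 \wedge \cdots \wedge dx^n \\
&= \sum_{i=1}^n (-1)^{i-1} \int_{\mathbb{R}^n} \frac{\partial \beta_i}{\partial x^i}\, dx^1 \cdots dx^n \\
&= \sum_{i=1}^n (-1)^{i-1} \int_{\mathbb{R}^{n-1}} \left( \int_{\mathbb{R}} \frac{\partial \beta_i}{\partial x^i}\, dx^i \right) dx^1 \cdots \widehat{dx^i} \cdots dx^n \\
&= \sum_{i=1}^n (-1)^{i-1} \int_{\mathbb{R}^{n-1}} \left( \beta_i \Big|_{x^i = -\infty}^{x^i = \infty} \right) dx^1 \cdots \widehat{dx^i} \cdots dx^n.
\end{aligned}$$

The term in parentheses is zero because $\beta$ has compact support and all its
components must vanish at infinity.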

A manifold may have a boundary $\partial M$, which is an $(n-1)$-dimensional
submanifold of $M$, every point of which has a coordinate neighborhood
in which one of the coordinates is zero. For example, the $xy$-plane, on which
$z = 0$, is the boundary of the lower half-space. As another example, consider
an open set $U$ of an $n$-manifold $M$. Then $U$ is also an $n$-manifold, and its
boundary $\partial U$ is an $(n-1)$-dimensional manifold. If $M$ (and therefore, $U$) is
oriented, then $\partial U$ inherits an orientation from $U$. There is another version of
Stokes’ Theorem for manifolds with boundary, which we state without
proof.


Theorem 28.6.3 Let $U$ be an oriented $n$-manifold with boundary
$\partial U$. Let $\omega$ be an $(n-1)$-form with compact support. Then

$$\int_U d\omega = \int_{\partial U} \omega.$$
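A quick numerical check of this theorem in the simplest case may be helpful. The sketch below takes $U$ to be the unit square in $\mathbb{R}^2$ and the 1-form $\omega = P\,dx + Q\,dy$ with $P = -x^2 y$ and $Q = x y^2$, so that $d\omega = (x^2 + y^2)\, dx \wedge dy$. The choice of form and region, the midpoint rule, and the function names are all assumptions made for illustration.

```python
def midpoint_integral(f, a, b, n=200):
    """Midpoint-rule approximation of the integral of f from a to b (b < a is allowed)."""
    step = (b - a) / n
    return sum(f(a + (k + 0.5) * step) for k in range(n)) * step


def P(x, y):
    """dx-coefficient of the sample 1-form omega = P dx + Q dy."""
    return -x * x * y


def Q(x, y):
    """dy-coefficient of the sample 1-form."""
    return x * y * y


def d_omega_coefficient(x, y):
    """Coefficient of dx ^ dy in d(omega): dQ/dx - dP/dy = y^2 + x^2."""
    return y * y + x * x


# Integral of d(omega) over the unit square, as an iterated integral.
interior = midpoint_integral(
    lambda x: midpoint_integral(lambda y: d_omega_coefficient(x, y), 0.0, 1.0),
    0.0, 1.0)

# Integral of omega over the boundary, traversed counterclockwise.
bottom = midpoint_integral(lambda x: P(x, 0.0), 0.0, 1.0)  # y = 0, x from 0 to 1
right = midpoint_integral(lambda y: Q(1.0, y), 0.0, 1.0)   # x = 1, y from 0 to 1
top = midpoint_integral(lambda x: P(x, 1.0), 1.0, 0.0)     # y = 1, x from 1 to 0
left = midpoint_integral(lambda y: Q(0.0, y), 1.0, 0.0)    # x = 0, y from 1 to 0
boundary = bottom + right + top + left

print(interior, boundary)
```

Both numbers approximate $2/3$. The counterclockwise traversal is the orientation that the boundary inherits from the orientation $dx \wedge dy$ of the square.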


Combining the exterior differential with the Hodge star operator, we get
a useful quantity.


Definition 28.6.4 The codifferential $\delta$ is a map $\delta: \Lambda^p(M) \to \Lambda^{p-1}(M)$
given by

$$\delta\omega = (-1)^{\nu+1} (-1)^{n(p+1)} * d * \omega,$$

where $n$ is the dimension of $M$. If $\omega$ is a 0-form, i.e., a function $f$, then the
definition leads to $\delta f = 0$. Furthermore, since $** = \pm 1$, $\delta^2 = 0$.


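If $M$ has a metric, i.e., a nondegenerate symmetric bilinear form $g$ defined
smoothly on each point of $M$, and $g$ does not vary over $M$, then we
have the following:


Proposition 28.6.5 If $\omega = \frac{1}{p!}\, \omega_{i_1 \dots i_p}\, dx^{i_1} \wedge \cdots \wedge dx^{i_p}$ is a $p$-form on an
$n$-manifold $M$ with constant metric $g$, then

$$\delta\omega = \frac{(-1)^p}{(p-1)!}\, \partial^{i_p} \omega_{i_1 \dots i_p}\, dx^{i_1} \wedge \cdots \wedge dx^{i_{p-1}},$$

where $\partial^{i_p} \omega_{i_1 \dots i_p} \equiv g^{j i_p}\, \partial_j \omega_{i_1 \dots i_p} = \partial_j\left(g^{j i_p} \omega_{i_1 \dots i_p}\right)$.
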
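Proof Start with the definition of $\delta$ and the Hodge star operator:

$$\delta\omega = (-1)^{\nu+1}(-1)^{n(p+1)} * d * \omega
= \frac{(-1)^{\nu+1}(-1)^{n(p+1)}}{p!\,(n-p)!} * d\left(\omega^{i_1 \dots i_p}\, \epsilon_{i_1 \dots i_n}\, dx^{i_{p+1}} \wedge \cdots \wedge dx^{i_n}\right).$$

Now note that $d$ differentiates only the function $\omega^{i_1 \dots i_p}$ because $d^2 = 0$.
Differentiating and applying $*$ afterwards, we get

$$\begin{aligned}
\delta\omega &= \frac{(-1)^{\nu+1}(-1)^{n(p+1)}}{p!\,(n-p)!}\, \epsilon_{i_1 \dots i_n}\, \partial_j \omega^{i_1 \dots i_p} * \left(dx^j \wedge dx^{i_{p+1}} \wedge \cdots \wedge dx^{i_n}\right) \\
&= \frac{(-1)^{\nu+1}(-1)^{n(p+1)}}{p!\,(n-p)!\,(p-1)!}\, \epsilon_{i_1 \dots i_n}\, \partial_j \omega^{i_1 \dots i_p}\,
g^{jk} g^{i_{p+1} j_{p+1}} \cdots g^{i_n j_n}\, \epsilon_{k j_{p+1} \dots j_n j_1 \dots j_{p-1}}\, dx^{j_1} \wedge \cdots \wedge dx^{j_{p-1}}.
\end{aligned}$$

Rearranging the indices of the $\epsilon$,

$$\epsilon_{k j_{p+1} \dots j_n j_1 \dots j_{p-1}} = (-1)^{(n-p+1)(p-1)}\, \epsilon_{j_1 \dots j_{p-1} k j_{p+1} \dots j_n},$$

manipulating the powers of $-1$, noting that $A^i B_i = A_i B^i$, and using the
fact that $g^{ij}$ are constant,$^{12}$ we rewrite the last expression as

$$\begin{aligned}
\delta\omega &= \frac{(-1)^{\nu+1}(-1)^{(p-1)^2}}{p!\,(n-p)!\,(p-1)!}\, \epsilon^{i_1 \dots i_n}\, \epsilon_{j_1 \dots j_{p-1} k\, i_{p+1} \dots i_n}\, \partial^k \omega_{i_1 \dots i_p}\, dx^{j_1} \wedge \cdots \wedge dx^{j_{p-1}} \\
&= \frac{(-1)^{p}}{p!\,(n-p)!\,(p-1)!}\, \delta^{i_1 \dots i_n}_{j_1 \dots j_{p-1} k\, i_{p+1} \dots i_n}\, \partial^k \omega_{i_1 \dots i_p}\, dx^{j_1} \wedge \cdots \wedge dx^{j_{p-1}},
\end{aligned}$$

where we used Eq. (26.44), the fact that $(-1)^{m^2} = (-1)^m$ for any integer $m$,
and that $\delta^i_j = \delta_j^i$. The last expression is now reduced to

$$\begin{aligned}
\delta\omega &= \frac{(-1)^{p}}{p!\,(n-p)!\,(p-1)!}\, (n-p)!\; \delta^{i_1 \dots i_p}_{j_1 \dots j_{p-1} k}\, \partial^k \omega_{i_1 \dots i_p}\, dx^{j_1} \wedge \cdots \wedge dx^{j_{p-1}} \\
&= \frac{(-1)^{p}}{p!\,(p-1)!}\, \delta^{i_1 \dots i_p}_{j_1 \dots j_{p-1} k}\, \partial^k \omega_{i_1 \dots i_p}\, dx^{j_1} \wedge \cdots \wedge dx^{j_{p-1}} \\
&= \frac{(-1)^{p}}{(p-1)!}\, \partial^k \omega_{j_1 \dots j_{p-1} k}\, dx^{j_1} \wedge \cdots \wedge dx^{j_{p-1}},
\end{aligned}$$

where we used Eq. (26.47) in the first equality and Eq. (26.50) in the last.


$^{12}$Reader, see where this fact is used!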


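If $M$ has a metric, then a metric $g$ can be defined on $\Lambda^p(M)$ in exact
analogy with Definition 26.3.13, and we have the following:


Theorem 28.6.6 Let $M$ be an oriented $n$-manifold with volume element $\mu$
and metric $g$. Let $\alpha \in \Lambda^p(M)$ and $\beta \in \Lambda^{p+1}(M)$ such that $\alpha \wedge *\beta$ has
compact support. Then

$$\int_M g(\alpha, \delta\beta)\, \mu = \int_M g(d\alpha, \beta)\, \mu.$$

Proof From Theorem 26.6.4, we have

$$\begin{aligned}
g(\alpha, \delta\beta)\, \mu &= \alpha \wedge *\delta\beta = \alpha \wedge *\left((-1)^{\nu+1}(-1)^{np} * d * \beta\right) \\
&= (-1)^{\nu+1}(-1)^{np}(-1)^{\nu}(-1)^{p(n-p)}\, \alpha \wedge d{*\beta} \\
&= -(-1)^{p}\, \alpha \wedge d{*\beta}
\end{aligned}$$

because $(-1)^{p^2} = (-1)^{-p^2} = (-1)^p$ for any integer $p$. Hence,

$$g(d\alpha, \beta)\, \mu - g(\alpha, \delta\beta)\, \mu = d\alpha \wedge *\beta + (-1)^p\, \alpha \wedge d{*\beta} = d(\alpha \wedge *\beta),$$

and the integral over $M$ of the right-hand side is zero by Stokes’ Theorem.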


28.7. Symplectic Geometry 


Mechanics stimulated a great deal of dialogue between physics and mathematics
in the latter part of the nineteenth century and the beginning of the
twentieth. The branch of mathematics that benefited the most out of this
dialogue is the theory of differentiable manifolds, whose tribute back to mechanics
has been the most beautiful language in which the latter can express
itself, the language of symplectic geometry. All the discussion of symplectic
vector spaces of the last chapter can be carried over to the tangent spaces of
a manifold and patched together by the differentiable structure of the manifold.


Definition 28.7.1 A symplectic form (or a symplectic structure) on a
manifold $M$ is a nondegenerate, closed 2-form $\omega$ on $M$. A symplectic manifold
$(M, \omega)$ is a manifold $M$ together with a symplectic form $\omega$ on $M$. We
define the map $\flat: \mathfrak{X}(M) \to \mathfrak{X}^*(M)$ by

$$\flat(X) \equiv X^\flat \equiv i_X \omega \equiv \omega^\flat(X)$$

and the map $\sharp: \mathfrak{X}^*(M) \to \mathfrak{X}(M)$ as the inverse of $\flat$.


Chapter 26 identified some special basis, the canonical basis, in which the 
symplectic form of a symplectic vector space took on a simple expression. 
The analogue of such a basis exists in a symplectic manifold. The reader 
should keep in mind that this existence is not automatic, because although 
one can find such bases at every point of the manifold, the smooth patching 
up of all such bases to cover the entire manifold is not trivial and is the 
content of the following important theorem, which we state without proof 
(see [Abra 85, p. 175]): 


Theorem 28.7.2 (Darboux) Suppose $\omega$ is a nondegenerate 2-form on a $2n$-dimensional
manifold $M$. Then $d\omega = 0$ if and only if there is a chart $(U, \varphi)$ at each
$P \in M$ such that $\varphi(P) = 0$ and

$$\omega = \sum_{i=1}^n dx^i \wedge dy^i,$$

where $x^1, \dots, x^n, y^1, \dots, y^n$ are coordinates on $U$. Furthermore, on such a
chart, the volume element $\mu_\omega$ is

$$\mu_\omega = dx^1 \wedge \cdots \wedge dx^n \wedge dy^1 \wedge \cdots \wedge dy^n.$$

Definition 28.7.3 The charts guaranteed by Darboux’s theorem are called
symplectic charts, and the coordinates $x^i$, $y^i$ are called canonical coordinates.
If $(M, \omega)$ and $(N, \rho)$ are symplectic manifolds, then a $C^\infty$ map
$f: M \to N$ is called symplectic, or a canonical transformation, if $f^*\rho = \omega$.


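Example 28.7.4 In this example, we derive a formula that gives the action
of $\omega^\flat$ and $\omega^\sharp$ in terms of components of vectors and 1-forms in canonical
coordinates. Let

$$Z = X^i \frac{\partial}{\partial x^i} + Y^i \frac{\partial}{\partial y^i}$$

be a vector field. When $\omega^\flat$ acts on $Z$, it gives a 1-form, which we write as
$\omega^\flat(Z) = U_k\, dx^k + W_k\, dy^k$. To find the unknowns $U_k$ and $W_k$, we let both
sides act on coordinate basis vectors. For the RHS, we get

$$\left(U_k\, dx^k + W_k\, dy^k\right)\left(\frac{\partial}{\partial x^j}\right)
= U_k \underbrace{dx^k\left(\frac{\partial}{\partial x^j}\right)}_{=\delta^k_j}
+ W_k \underbrace{dy^k\left(\frac{\partial}{\partial x^j}\right)}_{=0} = U_j$$

and

$$\left(U_k\, dx^k + W_k\, dy^k\right)\left(\frac{\partial}{\partial y^j}\right) = W_j.$$

For the LHS, we obtain

$$\left[\omega^\flat(Z)\right]\left(\frac{\partial}{\partial x^j}\right)
= \omega\left(X^i \frac{\partial}{\partial x^i} + Y^i \frac{\partial}{\partial y^i}, \frac{\partial}{\partial x^j}\right)
= X^i\, \omega\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\right)
+ Y^i\, \omega\left(\frac{\partial}{\partial y^i}, \frac{\partial}{\partial x^j}\right).$$

But

$$\omega\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\right)
= \sum_{k=1}^n \left(dx^k \wedge dy^k\right)\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\right) = 0$$

and

$$\omega\left(\frac{\partial}{\partial y^i}, \frac{\partial}{\partial x^j}\right)
= \sum_{k=1}^n \left[dx^k\left(\frac{\partial}{\partial y^i}\right) dy^k\left(\frac{\partial}{\partial x^j}\right)
- dx^k\left(\frac{\partial}{\partial x^j}\right) dy^k\left(\frac{\partial}{\partial y^i}\right)\right] = -\delta_{ij}.$$

It follows that $U_j = -Y^j$. Similarly, $W_j = X^j$. Therefore,

$$\omega^\flat\left(X^i \frac{\partial}{\partial x^i} + Y^i \frac{\partial}{\partial y^i}\right) = -Y^j\, dx^j + X^j\, dy^j, \qquad (28.48)$$

where a summation over repeated indices is understood.
If we multiply both sides of this equation by $\omega^\sharp$ on the left and recall that
$\omega^\sharp \omega^\flat = 1$, we obtain the following equation for the action of $\omega^\sharp$:

$$\omega^\sharp\left(X_j\, dx^j + Y_j\, dy^j\right) = Y_j \frac{\partial}{\partial x^j} - X_j \frac{\partial}{\partial y^j}. \qquad (28.49)$$

Equations (28.48) and (28.49) are very useful in Hamiltonian mechanics.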
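Equations (28.48) and (28.49) translate directly into component manipulations. In the sketch below, a vector is stored as a pair of lists $(X, Y)$ of its $\partial/\partial x^i$ and $\partial/\partial y^i$ components, and a 1-form as a pair of lists of its $dx^j$ and $dy^j$ components. This representation and the function names are assumptions made for illustration.

```python
def flat(X, Y):
    """Eq. (28.48): the vector (X, Y) goes to the 1-form -Y_j dx^j + X_j dy^j."""
    return ([-y for y in Y], list(X))


def sharp(a, b):
    """Eq. (28.49): the 1-form a_j dx^j + b_j dy^j goes to b_j d/dx^j - a_j d/dy^j."""
    return (list(b), [-x for x in a])


def omega(Z1, Z2):
    """omega = sum_k dx^k ^ dy^k evaluated on two vectors given as (X, Y) pairs."""
    (X1, Y1), (X2, Y2) = Z1, Z2
    return sum(x1 * y2 - x2 * y1 for x1, y1, x2, y2 in zip(X1, Y1, X2, Y2))


def pair(form, Z):
    """Evaluate a 1-form (a, b) on a vector (X, Y)."""
    (a, b), (X, Y) = form, Z
    return sum(ai * xi for ai, xi in zip(a, X)) + sum(bi * yi for bi, yi in zip(b, Y))


Z = ([1.0, 2.0], [3.0, -4.0])
W = ([0.5, -1.0], [2.0, 7.0])
print(flat(*Z))          # components of the 1-form obtained from Z
print(sharp(*flat(*Z)))  # recovers Z
```

The defining relation $[\omega^\flat(Z)](W) = \omega(Z, W)$ can be checked with `pair(flat(*Z), W) == omega(Z, W)`.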


Our discussion of symplectic transformations of symplectic vector spaces
showed that such maps are necessarily isomorphisms. Applied to the present
situation, this means that if $f: M \to N$ is symplectic, then $f_*: T_P(M) \to
T_{f(P)}(N)$ is an isomorphism. Theorem 28.3.2, the inverse mapping theorem,
now gives the following theorem.


Theorem 28.7.5 If f : M — N is symplectic, then it is a local diffeomor- 
phism. 


Example 28.7.6 Hamiltonian mechanics takes place in the phase space of
a system. The phase space is derived from the configuration space as follows.
Let $(q^1, \dots, q^n)$ be the generalized coordinates of a mechanical system.
They describe an $n$-dimensional manifold $N$. The dynamics of the system
is described by the (time-independent) Lagrangian $L$, which is a function
of $(q^i, \dot{q}^i)$. But $\dot{q}^i$ are the components of a vector at $(q^1, \dots, q^n)$ [see
Eq. (28.13) and replace $y^i$ with $x^i$]. Thus, in the language of manifold theory,
a Lagrangian is a function on the tangent bundle, $L: T(N) \to \mathbb{R}$.

The Hamiltonian is obtained from the Lagrangian by a Legendre transformation:
$H = \sum_{i=1}^n p_i \dot{q}^i - L$. The first term can be thought of as a pairing
of an element of the tangent space with its dual. In fact, if $P$ has coordinates
$(q^1, \dots, q^n)$, then $\dot{q} = \dot{q}^i \partial_i \in T_P(N)$ (with the Einstein summation convention
enforced), and if we pair this with the dual vector $p_i\, dx^i \in T_P^*(N)$, we
obtain the first term in the definition of the Hamiltonian. The effect of the
Legendre transformation is to replace $\dot{q}^i$ by $p_i$ as the second set of independent
variables. This has the effect of replacing $T(N)$ with $T^*(N)$. Thus

Box 28.7.7 The manifold of Hamiltonian dynamics, or the phase
space, is $T^*(N)$, with coordinates $(q^i, p_i)$ on which the Hamiltonian
$H: T^*(N) \to \mathbb{R}$ is defined.


$T^*(N)$ is $2n$-dimensional; so it has the potential of becoming a symplectic
manifold. In fact, it can be shown that$^{13}$ the 2-form suggested by
Darboux’s theorem,

$$\omega = \sum_{i=1}^n dq^i \wedge dp_i, \qquad (28.50)$$

is nondegenerate, and therefore a symplectic form for $T^*(N)$.


The phase space, equipped with a symplectic form, turns into a geometric
arena in which Hamiltonian mechanics unfolds. We saw in the above example
that a Hamiltonian is a function on the phase space. More generally, if
$(M, \omega)$ is a symplectic manifold, a Hamiltonian $H$ is a real-valued function,
$H: M \to \mathbb{R}$. Given a Hamiltonian, one can define a vector field as follows.
Consider $dH \in T^*(M)$. For a symplectic manifold, there is a natural isomorphism
between $T^*(M)$ and $T(M)$, namely, $\omega^\sharp$. The unique vector field
$X_H$ associated with $dH$ is the vector field we are after.


Definition 28.7.8 Let $(M, \omega)$ be a symplectic manifold and $H: M \to \mathbb{R}$ a
real-valued function. The vector field

$$X_H \equiv \omega^\sharp(dH) \equiv (dH)^\sharp$$

is called the Hamiltonian vector field with energy function $H$. The triplet
$(M, \omega, X_H)$ is called a Hamiltonian system.


The significance of the Hamiltonian vector field lies in its integral curve,
which turns out to be the path of evolution of the system in the phase space.
This is shown in the following proposition.


Proposition 28.7.9 If $(q^1, \dots, q^n, p_1, \dots, p_n)$ are canonical coordinates
for $\omega$, so that $\omega = \sum_i dq^i \wedge dp_i$, then, in these coordinates

$$X_H = \frac{\partial H}{\partial p_i} \frac{\partial}{\partial q^i} - \frac{\partial H}{\partial q^i} \frac{\partial}{\partial p_i}
\equiv \left(\frac{\partial H}{\partial p_i}, -\frac{\partial H}{\partial q^i}\right). \qquad (28.51)$$

Therefore, $(q(t), p(t))$ is an integral curve of $X_H$ iff Hamilton’s equations
hold:

$$\frac{dq^i}{dt} = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = -\frac{\partial H}{\partial q^i}, \qquad i = 1, \dots, n. \qquad (28.52)$$

Proof The first part follows from the definition of $X_H$ in terms of $dH$, and from
Eq. (28.49). The second part follows from the definition of integral curve and
Eq. (28.19).


$^{13}$Here, we are assuming that the mechanical system in question is nonsingular, by which
is meant that there are precisely $n$ independent $p_i$’s. There are systems of considerable
importance that happen to be singular. Such systems, among which are included all gauge
theories such as the general theory of relativity, are called constrained systems and are
characterized by the fact that $\omega$ is degenerate. Although constrained systems are of great
interest and currently under intense study, we shall not discuss them in this book.


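We called $H$ the energy function; this is for good reason:


Theorem 28.7.10 Let $(M, \omega, X_H)$ be a Hamiltonian system and $\gamma(t)$ an
integral curve of $X_H$. Then $H(\gamma(t))$ is constant in $t$.


Proof We show that the time-derivative of $H(\gamma(t))$ is zero:

$$\begin{aligned}
\frac{d}{dt} H(\gamma(t)) &= \gamma_{*t}(H) && \text{by Proposition 28.2.4} \\
&= dH(\gamma_{*t}) && \text{by Eq. (28.14)} \\
&= dH\left(X_H(\gamma(t))\right) && \text{by definition of integral curve} \\
&= \left[\omega^\flat\left(X_H(\gamma(t))\right)\right]\left(X_H(\gamma(t))\right) && \text{by definition of } X_H(\gamma(t)) \\
&= \omega\left(X_H(\gamma(t)), X_H(\gamma(t))\right) && \text{by the definition of } \omega^\flat \\
&= 0 && \text{because } \omega \text{ is skew-symmetric.}
\end{aligned}$$

Theorem 28.7.10 is the statement of the conservation of energy.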
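The conservation of energy along an integral curve can be observed numerically. The sketch below follows the integral curve of $X_H = (\partial H/\partial p, -\partial H/\partial q)$ for the one-dimensional harmonic oscillator $H = \frac{1}{2}(p^2 + q^2)$. The choice of Hamiltonian, the classical fourth-order Runge-Kutta stepping, the step size, and the function names are assumptions made for illustration; a numerical integrator conserves $H$ only approximately.

```python
import math


def hamiltonian(q, p):
    """Sample energy function H = (p^2 + q^2) / 2 (harmonic oscillator)."""
    return 0.5 * (p * p + q * q)


def hamiltonian_vector_field(q, p):
    """X_H = (dH/dp, -dH/dq) from Eq. (28.51) for the sample Hamiltonian."""
    return (p, -q)


def rk4_step(q, p, dt):
    """One classical Runge-Kutta step along the vector field X_H."""
    k1q, k1p = hamiltonian_vector_field(q, p)
    k2q, k2p = hamiltonian_vector_field(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = hamiltonian_vector_field(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = hamiltonian_vector_field(q + dt * k3q, p + dt * k3p)
    return (q + dt * (k1q + 2 * k2q + 2 * k3q + k4q) / 6,
            p + dt * (k1p + 2 * k2p + 2 * k3p + k4p) / 6)


def integral_curve(q0, p0, dt=1e-3, steps=5000):
    """Approximate points (q(t), p(t)) of the integral curve through (q0, p0)."""
    points = [(q0, p0)]
    for _ in range(steps):
        points.append(rk4_step(*points[-1], dt))
    return points


curve = integral_curve(1.0, 0.0)
energies = [hamiltonian(q, p) for q, p in curve]
energy_drift = max(energies) - min(energies)
print("energy drift along the curve:", energy_drift)
```

For this Hamiltonian the exact integral curve through $(1, 0)$ is $(\cos t, -\sin t)$, which gives an independent check of the computed points.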


Historical Notes 

Sir William Rowan Hamilton (1805-1865), the fourth of nine children, was mostly 
raised by an uncle, who quickly realized the extraordinary nature of his young nephew. 
By the age of five, Hamilton spoke Latin, Greek, and Hebrew, and by the age of nine had 
added more than a half dozen languages to that list. He was also quite famous for his skill 
at rapid calculation. Hamilton’s introduction to mathematics came at the age of 13, when 
he studied Clairaut’s Algebra, a task made somewhat easier as Hamilton was fluent in 
French by this time. At age 15 he started studying Newton, whose Principia spawned an 
interest in astronomy that would provide a great influence in Hamilton’s early career. 

In 1822, at the age of 18, Hamilton entered Trinity College, Dublin, and in his first year 
he obtained the top mark in classics. He divided his studies equally between classics and 
mathematics and in his second year he received the top award in mathematical physics. 
Hamilton discovered an error in Laplace’s Méchanique céleste, and as a result, he came 
to the attention of John Brinkley, the Astronomer Royal of Ireland, who said: “This young 
man, I do not say will be, but is, the first mathematician of his age.” While in his final 
year as an undergraduate, he presented a memoir entitled Theory of Systems of Rays to 
the Royal Irish Academy in which he planted the seeds of symplectic geometry.

Hamilton’s personal life was marked at first by despondency. Rejected by a college
friend’s sister, he became ill and nearly suicidal. He was rejected a few years later by 
another friend’s sister and wound up marrying a very timid woman prone to ill health. 
Hamilton’s own personality was much more energetic and humorous, and he easily ac- 
quired friends among the literati. His own attempts at poetry, which he himself fancied, 
were generally considered quite poor. No less an authority than Wordsworth attempted 
to convince him that his true calling was mathematics, not poetry. Nevertheless, Hamilton 
maintained close connection with the worlds of literature and philosophy, insisting that 
the ideas to be gleaned from them were integral parts of his life’s work. While Hamilton 
is best known in physics for his work in dynamics, more of his time was spent on studies 
in optics and the theory of quaternions. In optics, he derived a function of the initial and 
final coordinates of a ray and termed it the “characteristic function,” claiming that it con- 
tained “the whole of mathematical optics.” Interestingly, his approach shed no new light 
on the wave/corpuscular debate (being independent of which view was taken), another 
appearance of Hamilton’s quest for ultimate generality. 

In 1833 Hamilton published a study of vectors as ordered pairs. He used algebra to study 
dynamics in On a General Method in Dynamics in 1834. The theory of quaternions, on 
which he spent most of his time, grew from his dissatisfaction with the current state of the 
theoretical foundation of algebra. He was aware of the description of complex numbers as 
points in a plane and wondered if any other geometrical representation was possible or if 
there existed some hypercomplex number that could be represented by three-dimensional 
points in space. If the latter supposition were true, it would entail a natural algebraic 
representation of ordinary space. To his surprise, Hamilton found that in order to create 
a hypercomplex number algebra for which the modulus of a product equaled the product 
of the two moduli, four components were required—hence, quaternions. 

Hamilton felt that this discovery would revolutionize mathematical physics, and he spent 
the rest of his life working on quaternions, including publication of a book entitled Ele- 
ments of Quaternions, which he estimated would be 400 pages long and take two years to 
write. The title suggests that Hamilton modeled his work on Euclid’s Elements and indeed 
this was the case. The book ended up double its intended length and took seven years to 
write. In fact, the final chapter was incomplete when Hamilton died, and the book was 
finally published with a preface by his son, William Edwin Hamilton. While quaternions 
themselves turned out to be of no such monumental importance, their appearance as the 
first noncommutative algebra opened the door for much research in this field, including 
much of vector and matrix analysis. (As a side note, the “del” operator, named later by 
Gibbs, was introduced by Hamilton in his papers on quaternions.) 

In dynamics, Hamilton extended his characteristic function from optics to the classical 
action for a system moving between two points in configuration space. A simple trans- 
formation of this function gives the quantity (the time integral of the Lagrangian) whose 
variation equals zero in what we now call Hamilton’s principle. Jacobi later simplified 
the application of Hamilton’s idea to mechanics, and it is the Hamilton-Jacobi equation
that is most often used in such problems. Hamiltonian dynamics was rescued from what 
could have become historical obscurity with the advent of quantum mechanics, in which 
its close association with ideas in optics found fertile application in the wave mechanics 
of de Broglie and Schrödinger. Hamilton’s later life was unhappy, and he became addicted
to alcohol. He died from a severe attack of gout shortly after receiving the news
that he had been elected the first foreign member of the National Academy of Sciences of 
the USA. 


In the theoretical development of mechanics, canonical transformations 
play a central role. The following proposition shows that the flows of a 
Hamiltonian system are such transformations: 


Proposition 28.7.11 Let $(M, \omega, X_H)$ be a Hamiltonian system, and $F_t$ the
flow of $X_H$. Then for each $t$, $F_t^* \omega = \omega$, i.e., $F_t$ is symplectic.

Proof We have

$$\begin{aligned}
\frac{d}{dt} F_t^* \omega &= F_t^* L_{X_H} \omega && \text{by Eq. (28.39)} \\
&= F_t^* \left(i_{X_H} d\omega + d\, i_{X_H} \omega\right) && \text{by Theorem 28.5.10} \\
&= F_t^* (0 + ddH) && \text{because } d\omega = 0 \text{ and } i_X \omega = \omega^\flat(X) \\
&= 0 && \text{because } d^2 = 0.
\end{aligned}$$

Thus, $F_t^* \omega$ is constant in $t$. But $F_0^* = \mathrm{id}$. Therefore, $F_t^* \omega = \omega$.


The celebrated Liouville’s theorem of mechanics, concerning the preservation
of volume of the phase space, is a consequence of the proposition
above:


Corollary 28.7.12 (Liouville’s theorem) $F_t$ preserves the phase volume
$\mu_\omega$.
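Both statements can be verified by hand in the simplest case. For $H = \frac{1}{2}(p^2 + q^2)$ the flow is the rotation $F_t(q, p) = (q\cos t + p\sin t,\ -q\sin t + p\cos t)$, and $F_t^*\omega = \omega$ becomes the matrix identity $M^{T} J M = J$, where $M$ is the Jacobian of $F_t$ and $J$ is the matrix of $\omega = dq \wedge dp$. The sketch below checks this identity and the unit determinant; the sample Hamiltonian and all names are assumptions made for illustration.

```python
import math

# Matrix of omega = dq ^ dp in the basis (d/dq, d/dp): omega(u, v) = u^T J v.
J = [[0.0, 1.0], [-1.0, 0.0]]


def flow_jacobian(t):
    """Jacobian of F_t(q, p) = (q cos t + p sin t, -q sin t + p cos t)."""
    c, s = math.cos(t), math.sin(t)
    return [[c, s], [-s, c]]


def matmul(A, B):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(A):
    """Transpose of a 2x2 matrix."""
    return [[A[j][i] for j in range(2)] for i in range(2)]


def pulled_back_form(t):
    """Matrix of F_t^* omega, namely M^T J M with M the Jacobian of F_t."""
    M = flow_jacobian(t)
    return matmul(transpose(M), matmul(J, M))


def determinant(A):
    """Determinant of a 2x2 matrix; it measures the change of phase volume."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]


for t in (0.0, 0.7, 2.5):
    print(t, pulled_back_form(t), determinant(flow_jacobian(t)))
```

In two dimensions the symplectic form is itself the volume element, so the two checks coincide; in higher dimensions preservation of $\omega$ implies preservation of $\mu_\omega$ but not conversely.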


Definition 28.7.13 Let (M, @) be asymplectic manifold. Let f, g: M—> R 
with X¢ = (df )F and X, = (dg)* their corresponding Hamiltonian vector 
fields. The Poisson bracket of f and g is the function 


{f, g} =@(X/, Xg) = ix,ix,@ = —ix;ix,@. 


We can immediately obtain the familiar expression for the Poisson 
bracket of two functions. 


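Proposition 28.7.14 In canonical coordinates $(q^1, \dots, q^n, p_1, \dots, p_n)$, we
have

$$\{f, g\} = \sum_{i=1}^n \left(\frac{\partial f}{\partial q^i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q^i}\right).$$

In particular,

$$\{q^i, q^j\} = 0, \qquad \{p_i, p_j\} = 0, \qquad \{q^i, p_j\} = \delta^i_j.$$


Proof From Eq. (28.51), we have

$$
\begin{aligned}
\omega(X_f, X_g) &= \omega\left(\frac{\partial f}{\partial p_i}\frac{\partial}{\partial q^i} - \frac{\partial f}{\partial q^i}\frac{\partial}{\partial p_i},\ \frac{\partial g}{\partial p_j}\frac{\partial}{\partial q^j} - \frac{\partial g}{\partial q^j}\frac{\partial}{\partial p_j}\right) \\
&= \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial p_j}\underbrace{\omega\left(\frac{\partial}{\partial q^i}, \frac{\partial}{\partial q^j}\right)}_{=0} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q^j}\underbrace{\omega\left(\frac{\partial}{\partial q^i}, \frac{\partial}{\partial p_j}\right)}_{=\delta_i^j} \\
&\quad - \frac{\partial f}{\partial q^i}\frac{\partial g}{\partial p_j}\underbrace{\omega\left(\frac{\partial}{\partial p_i}, \frac{\partial}{\partial q^j}\right)}_{=-\delta_i^j} + \frac{\partial f}{\partial q^i}\frac{\partial g}{\partial q^j}\underbrace{\omega\left(\frac{\partial}{\partial p_i}, \frac{\partial}{\partial p_j}\right)}_{=0} \\
&= \sum_{i=1}^n \left(\frac{\partial f}{\partial q^i}\frac{\partial g}{\partial p_i} - \frac{\partial f}{\partial p_i}\frac{\partial g}{\partial q^i}\right),
\end{aligned}
$$

where we have assumed that $\omega = \sum_{k=1}^n dq^k \wedge dp_k$. The other formulas follow
immediately once we substitute $p_i$ or $q^i$ for $f$ or $g$.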
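
The coordinate formula for the Poisson bracket can be checked numerically. The following is a minimal sketch, assuming central differences for the partial derivatives and a two-dimensional harmonic oscillator $H = \frac{1}{2}(p_1^2 + p_2^2) + \frac{1}{2}((q^1)^2 + (q^2)^2)$ as a test Hamiltonian; the helper names (`partial`, `poisson_bracket`, `coordinate`) are hypothetical and introduced only for this illustration.

```python
def partial(func, point, index, eps=1e-5):
    """Central-difference estimate of d(func)/d(point[index])."""
    up, down = list(point), list(point)
    up[index] += eps
    down[index] -= eps
    return (func(up) - func(down)) / (2 * eps)

def poisson_bracket(f, g, point, n):
    """{f, g} at point = (q^1..q^n, p_1..p_n), using the canonical-coordinate formula."""
    total = 0.0
    for i in range(n):
        total += partial(f, point, i) * partial(g, point, n + i)
        total -= partial(f, point, n + i) * partial(g, point, i)
    return total

def coordinate(index):
    """Return the function that picks out one phase-space coordinate."""
    return lambda z: z[index]

n = 2
point = [0.3, -1.2, 0.8, 0.5]  # (q^1, q^2, p_1, p_2), arbitrary sample point

def H(z):
    """Assumed test Hamiltonian: isotropic harmonic oscillator."""
    return 0.5 * (z[2] ** 2 + z[3] ** 2) + 0.5 * (z[0] ** 2 + z[1] ** 2)

# {q^i, p_j} should reproduce the Kronecker delta
canonical = [[poisson_bracket(coordinate(i), coordinate(n + j), point, n)
              for j in range(n)] for i in range(n)]
q1_dot = poisson_bracket(coordinate(0), H, point, n)  # {q^1, H}, close to p_1
p1_dot = poisson_bracket(coordinate(2), H, point, n)  # {p_1, H}, close to -q^1
```

With this sign convention, $\{q^1, H\}$ and $\{p_1, H\}$ reproduce the right-hand sides of Hamilton's equations for the assumed oscillator.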


28.8 Problems 


28.1 Provide the details of the fact that a finite-dimensional vector space V 
is a manifold of dimension dim V. 


28.2 Choose a different curve $\gamma : \mathbb{R} \to \mathbb{R}^2$ whose tangent at $u = 0$ is still
$(a_x, a_y)$ of Example 28.2.2. For instance, you may choose


y(u) = (Su eA, Su - 0). 


Show that this curve gives the same relation between partials and unit vec- 
tors as obtained in that example. Can you find another curve doing the same 
job? 


28.3 For every $t \in \mathcal{T}_P(M)$ and every constant function $c \in F^\infty(P)$, show
that $t(c) = 0$. Hint: Use both parts of Definition 28.2.3 on the two functions
$f = c$ and $g = 1$.


28.4 Find the coordinate vector field 0; of Example 28.2.10. 


28.5 Use the procedure of Example 28.2.10 to find a coordinate frame
for $S^2$ corresponding to the stereographic projection charts (see
Example 28.1.12).


28.6 Let $(x^i)$ and $(y^j)$ be coordinate systems on a subset $U$ of a manifold
$M$. Let $X^i$ and $Y^i$ be the components of a vector field with respect to the
two coordinate systems. Show that $Y^i = X^j\,\partial y^i/\partial x^j$.


28.7 Show that if $\psi : M \to N$ is a local diffeomorphism at $P \in M$, then
$\psi_{*P} : \mathcal{T}_P(M) \to \mathcal{T}_{\psi(P)}(N)$ is a vector space isomorphism.


28.8 Let $X$ be a vector field on $M$ and $\psi : M \to N$ a differentiable map.
Then for any function $f$ on $N$, $[\psi_* X](f)$ is a function on $N$. Show that


$$X(f \circ \psi) = \{[\psi_* X](f)\} \circ \psi.$$


28.9 Verify that the vector field $X = -y\,\partial_x + x\,\partial_y$ has an integral curve
through $(x_0, y_0)$ given by


$$x = x_0 \cos t - y_0 \sin t, \qquad y = x_0 \sin t + y_0 \cos t.$$


28.10 Show that the vector field $X = x^2\,\partial_x + xy\,\partial_y$ has an integral curve
through $(x_0, y_0)$ given by


$$x(t) = \frac{x_0}{1 - x_0 t}, \qquad y(t) = \frac{y_0}{1 - x_0 t}.$$


28.11 Let $X$ and $Y$ be vector fields. Show that $X \circ Y - Y \circ X$ is also a vector
field, i.e., it satisfies the derivation property.


28.12 Prove the remaining parts of Proposition 28.4.13. 


28.13 Suppose that $x^i$ are coordinate functions on a subset of $M$ and $\omega$ and
$X$ are a 1-form and a vector field there. Express $\omega(X)$ in terms of component
functions of $\omega$ and $X$.


28.14 Show that $d \circ L_X = L_X \circ d$. Hint: Use the definition of the Lie
derivative for $p$-forms and the fact that $d$ commutes with the pullback.


28.15 Let $M = \mathbb{R}^3$ and let $f$ be a real-valued function. Let $\omega = a_i\,dx^i$ be a
one-form and $\eta = b_1\,dx^2 \wedge dx^3 + b_2\,dx^3 \wedge dx^1 + b_3\,dx^1 \wedge dx^2$ be a two-form
on $\mathbb{R}^3$. Show that


(a) $df$ gives the gradient of $f$,
(b) $d\eta$ gives the divergence of the vector $\mathbf{B} = (b_1, b_2, b_3)$, and that
(c) $\nabla \times (\nabla f) = 0$ and $\nabla \cdot (\nabla \times \mathbf{A}) = 0$ are consequences of $d^2 = 0$.


28.16 Show that $i_X$ is an antiderivation with respect to the wedge product.


28.17 Given that $F = \frac{1}{2} F_{\alpha\beta}\,dx^\alpha \wedge dx^\beta$, show that $F \wedge (*F) = |\mathbf{B}|^2 - |\mathbf{E}|^2$.


28.18 Use Eq. (28.41) to show that the zeroth component of the relativistic 
Lorentz force law gives the rate of change of energy due to the electric field, 
and that the magnetic field does not change the energy. 


28.19 Derive Eq. (28.44). 


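28.20 Write the equation

$$F_{\alpha\beta} = A_{\beta,\alpha} - A_{\alpha,\beta} \equiv \frac{\partial A_\beta}{\partial x^\alpha} - \frac{\partial A_\alpha}{\partial x^\beta}$$

in terms of $\mathbf{E}$, $\mathbf{B}$, and vector and scalar potentials.


28.21 With $F = \frac{1}{2} F_{\alpha\beta}\,dx^\alpha \wedge dx^\beta$ and $J = J_\gamma\,dx^\gamma$, show that $d * F = 4\pi(*J)$
takes the following form in components:

$$\frac{\partial F^{\alpha\beta}}{\partial x^\beta} = 4\pi J^\alpha,$$

where indices are raised and lowered by $\mathrm{diag}(-1, -1, -1, 1)$.


28.22 Interpret Theorem 28.5.15 for $p = 1$ and $p = 2$ on $\mathbb{R}^3$.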


28.23 Let $f$ be a function on $\mathbb{R}^3$. Calculate $d * df$.


28.24 Show that current conservation is an automatic consequence of
Maxwell’s inhomogeneous equation $d * F = 4\pi(*J)$.


Part IX 
Lie Groups and Their Applications


The theory of differential equations had flourished to such a level by the
1860s that a systematic study of their solutions became possible. Sophus 
Lie, a Norwegian mathematician, undertook such a study using the same 
tool that was developed by Galois and others to study algebraic equations: 
group theory. The groups associated with the study of differential equations, 
now called Lie groups, unlike their algebraic counterparts, are uncountably 
infinite, and, as such, are both intricate and full of far-reaching structures. It 
was beyond the wildest dream of any 19th-century mathematician to imag- 
ine that a concept as abstract as Lie groups would someday find application 
in the study of the heart of matter. Yet, three of the four fundamental inter- 
actions are described by Lie groups, and the fourth one, gravity, is described 
in a language very akin to the other three. 


29.1 Lie Groups and Their Algebras 


Lie groups are infinite groups that have the extra property that their mul- 
tiplication law is differentiable. We have seen that the natural setting for 
differentiation is the structure of a manifold. Thus, Lie groups must have 
manifold properties as well as group properties. 


Definition 29.1.1 A Lie group $G$ is a differentiable manifold endowed
with a group structure such that the group operation $G \times G \to G$ and the
map $G \to G$ given by $g \mapsto g^{-1}$ are differentiable. If the dimension of the
underlying manifold is $r$, we say that $G$ is an $r$-parameter Lie group.


Because of the dual nature of Lie groups, most of their mapping properties
combine those of groups and manifolds. For instance, a Lie group
homomorphism is a group homomorphism that is also $C^\infty$, and a Lie group
isomorphism is a group isomorphism that is also a diffeomorphism.


Example 29.1.2 ($GL(V)$ is a Lie group) As the paradigm of Lie groups,
we consider $GL(V)$, the set of invertible operators on an $n$-dimensional real
vector space $V$, and show that it is indeed a Lie group. The set $\mathcal{L}(V)$ is a
vector space of dimension $n^2$ (Proposition 26.1.1), and therefore, by
Example 28.1.7, a manifold of the same dimension. The map $\det : \mathcal{L}(V) \to \mathbb{R}$ is a
$C^\infty$ map because the determinant, when expressed in terms of a matrix, is a
polynomial. In particular, it is continuous. Now note that


$$GL(V) = \det{}^{-1}(\mathbb{R} - \{0\})$$


and that $\mathbb{R} - \{0\}$ is open. It follows that $GL(V)$ is an open submanifold
of $\mathcal{L}(V)$. Thus, $GL(V)$ is an $n^2$-dimensional manifold. Choosing a basis $B$
for $V$ and representing operators (points) $A$ of $GL(V)$ as matrices $(a_{ij})$ in
that basis provides a coordinate patch for $GL(V)$. We denote this coordinate
patch by $\{x^{ij}\}$, where $x^{ij}(A) = a_{ij}$.

To show that $GL(V)$ is a Lie group, we need to prove that if $A, B \in GL(V)$,
then


$$AB : GL(V) \times GL(V) \to GL(V) \quad\text{and}\quad A^{-1} : GL(V) \to GL(V)$$


are $C^\infty$ maps of manifolds. This is done by showing that the coordinate
representations of these maps are $C^\infty$. These representations are simply the
matrix representations of operators. Since each element of $AB$ is a polynomial
(indeed bilinear) function of the elements of the two matrices, it has derivatives
of all orders. It follows that $AB$ is $C^\infty$.
The case of $A^{-1}$ is only slightly more complicated. We note that


$$A^{-1} = \frac{P(a_{ij})}{\det A}, \qquad P(a_{ij}) = \text{a polynomial in } a_{ij}.$$


Thus, since $\det A$ is also a polynomial in $a_{ij}$, the $k$th derivative of $A^{-1}$ is
of the form $Q(a_{ij})/(\det A)^{k+1}$, where $Q$ is another polynomial. The fact that
$\det A \neq 0$ establishes the $C^\infty$ property of $A^{-1}$.

One can similarly show that if $V$ is a complex vector space, then $GL(V)$
is a manifold of dimension $2n^2$.
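
The form of the inverse used in the argument above, a matrix of polynomials divided by the determinant, can be made concrete in the $2 \times 2$ case. The sketch below is only an illustration under that assumption (the adjugate formula for $n = 2$); the function names are hypothetical, and exact rational arithmetic is used so that no rounding enters.

```python
from fractions import Fraction

def det2(a):
    """Determinant of a 2x2 matrix: a polynomial in the entries a_ij."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

def inverse2(a):
    """A^{-1} = P(a_ij) / det A, where P is the adjugate (entries linear in a_ij)."""
    d = det2(a)
    if d == 0:
        raise ValueError("matrix is not invertible, so it is not a point of GL(2, R)")
    adjugate = [[a[1][1], -a[0][1]], [-a[1][0], a[0][0]]]
    return [[entry / d for entry in row] for row in adjugate]

def matmul(a, b):
    """Matrix product: each entry is a bilinear polynomial in the entries of a and b."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[Fraction(2), Fraction(1)], [Fraction(5), Fraction(3)]]
A_inv = inverse2(A)
identity = matmul(A, A_inv)
```

Both group operations are thus rational functions of the coordinates $a_{ij}$ whose denominators do not vanish on $GL(V)$, which is what makes them smooth.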


Example 29.1.3 ($SL(V)$ is a Lie group) Recall that $SL(V)$ is the subgroup
of $GL(V)$ whose elements have unit determinant. Since $\det : GL(V) \to \mathbb{R}$ is
$C^\infty$, Theorem 28.3.8 and the example after it show that $SL(V) = \det{}^{-1}(1)$ is
a submanifold of $GL(V)$ of dimension $\dim GL(V) - \dim\mathbb{R} = n^2 - 1$. Since
it is already a subgroup, we conclude that $SL(V)$ is also a Lie group
(Problem 29.5). Similarly, when $V$ is a complex vector space, one can show that
$\dim SL(V) = 2n^2 - 2$.


Example 29.1.4 (Other examples of Lie groups) The reader may verify the 
following: 


(a) Any finite-dimensional vector space is a Lie group under vector addi- 
tion. 

(b) The unit circle $S^1$, as a subset of the multiplicative group of nonzero
complex numbers, is a Lie group under multiplication.

(c) The product $G \times H$ of two Lie groups is itself a Lie group with the
product manifold structure and the direct product group structure.

(d) $GL(n, \mathbb{R})$, the set of invertible $n \times n$ matrices, is a Lie group under
matrix multiplication.

(e) Let $G = GL(n, \mathbb{R}) \times \mathbb{R}^n$ be the product manifold. Define the group
operation by $(A, u)(B, v) \equiv (AB, Av + u)$. The reader may verify that
this operation indeed defines a group structure on $G$. In fact, $G$
becomes a Lie group, called the group of affine motions of $\mathbb{R}^n$, for if
we identify $(A, u)$ with the affine motion¹ $x \mapsto Ax + u$ of $\mathbb{R}^n$, then the
group operation in $G$ is composition of affine motions. We shall study
in some detail the Poincaré group, a subgroup of the group of affine
motions, in which the matrices are (pseudo) orthogonal.
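
The claim in item (e), that the product $(A, u)(B, v) = (AB, Av + u)$ is composition of affine motions, can be checked on a small case. The sketch below assumes $n = 2$ and integer entries; the helper names (`compose`, `act`) are hypothetical.

```python
def matvec(A, x):
    """Matrix times vector."""
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]

def matmul(A, B):
    """Matrix times matrix."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def vadd(u, v):
    """Componentwise vector sum."""
    return [a + b for a, b in zip(u, v)]

def compose(g1, g2):
    """Group operation of the affine group: (A, u)(B, v) = (AB, Av + u)."""
    (A, u), (B, v) = g1, g2
    return matmul(A, B), vadd(matvec(A, v), u)

def act(g, x):
    """Affine motion x -> Ax + u associated with g = (A, u)."""
    A, u = g
    return vadd(matvec(A, x), u)

g1 = ([[1, 2], [0, 1]], [3, -1])
g2 = ([[0, -1], [1, 0]], [2, 5])
x = [7, -4]

product_then_act = act(compose(g1, g2), x)   # act with the product g1 g2
act_twice = act(g1, act(g2, x))              # act with g2 first, then g1
```

The two results agree because $A(Bx + v) + u = (AB)x + (Av + u)$.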


In calculations, one translates all group operations to the corresponding 
operations of charts. This is particularly useful when the group multiplica- 
tion can be defined only locally. One then speaks of an r-parameter local 
Lie group. To be precise, one considers a neighborhood $U$ of the origin
of $\mathbb{R}^r$ and defines an associative “multiplication” $m : U \times U \to \mathbb{R}^r$ and an
inversion $i : U_0 \to U$, where $U_0$ is a subset of $U$. We therefore write the
multiplication as


$$m(a, b) = c, \qquad a, b, c \in \mathbb{R}^r,$$


where $a = (a^1, a^2, \dots, a^r)$, etc. are coordinates of elements of $G$. The
coordinates of the identity element of $G$ are taken to be all zero. Thus,
$m(a, 0) = a$ and $m(a, i(a)) = 0$. In component forms,


$$c^k = m^k(a, b), \qquad m^k(a, i(a)) = 0, \qquad k = 1, 2, \dots, r. \tag{29.1}$$


The fact that $G$ is a manifold implies that all functions in Eq. (29.1) are
infinitely differentiable.


Example 29.1.5 As an example of a local 1-parameter Lie group, consider
the multiplication rule $m : U \times U \to \mathbb{R}$ where $U = \{x \in \mathbb{R} \mid |x| < 1\}$ and


$$m(x, y) = \frac{2xy - x - y}{xy - 1}, \qquad x, y \in U.$$


The reader can check that $m(x, m(y, z)) = m(m(x, y), z)$, so that the multiplication
is associative. Moreover, $m(0, x) = m(x, 0) = x$ for all $x \in U$, and
$i(x) = x/(2x - 1)$, defined for $U_0 = \{x \in \mathbb{R} \mid |x| < \frac{1}{2}\}$.
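
The properties the reader is asked to check in Example 29.1.5 can be spot-checked with exact rational arithmetic. This is only a sketch at a few sample points chosen for illustration (not a proof), and the name `inv` for the inversion $i$ is an assumption of this example.

```python
from fractions import Fraction

def m(x, y):
    """Local multiplication law m(x, y) = (2xy - x - y) / (xy - 1)."""
    return (2 * x * y - x - y) / (x * y - 1)

def inv(x):
    """Inversion i(x) = x / (2x - 1)."""
    return x / (2 * x - 1)

x, y, z = Fraction(1, 3), Fraction(-1, 4), Fraction(2, 5)
w = Fraction(1, 5)

assoc_left = m(x, m(y, z))     # m(x, m(y, z))
assoc_right = m(m(x, y), z)    # m(m(x, y), z)
identity_left = m(0, x)        # should return x
identity_right = m(x, 0)       # should return x
inverse_check = m(w, inv(w))   # should return the identity coordinate 0
```

Note that $i(x)$ can leave $U$ even when $x$ is in $U_0$ (for instance $i(2/5) = -2$), which is one way to see why this is only a local group.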


29.1.1 Group Action 


As mentioned in our discussion of finite groups, the action of a group on a 
set is more easily conceived than abstract groups. In the case of Lie groups, 
the natural action is not on an arbitrary set, but on a manifold. 


Definition 29.1.6 Let $M$ be a manifold. A local group of transformations
acting on $M$ is a (local) Lie group $G$, an (open) subset $U$ with the property
$\{e\} \times M \subset U \subset G \times M$, and a map $\Psi : U \to M$ satisfying the following
conditions:


1. If $(g, P) \in U$, $(h, \Psi(g, P)) \in U$, and $(hg, P) \in U$, then
$\Psi(h, \Psi(g, P)) = \Psi(hg, P)$.
2. $\Psi(e, P) = P$ for all $P \in M$.
3. If $(g, P) \in U$, then $(g^{-1}, \Psi(g, P)) \in U$ and $\Psi(g^{-1}, \Psi(g, P)) = P$.


Fig. 29.1 For small regions of M, we may be able to include a large portion of G.
However, if we want to include all of M, as we should, then only a small neighborhood
of the identity may be available


¹These consist of a linear transformation followed by a translation.


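Normally, we shall denote $\Psi(g, P)$ by $g \cdot P$, or $gP$. Then the conditions
of the definition above take the simple form


$$
\begin{aligned}
g \cdot (h \cdot P) &= (gh) \cdot P, && g, h \in G,\ P \in M, \\
e \cdot P &= P && \text{for all } P \in M, \\
g^{-1} \cdot (g \cdot P) &= P, && g \in G,\ P \in M,
\end{aligned}
\tag{29.2}
$$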


whenever g - P is defined. Note that the word “local” refers to G and not 
M, i.e., we may have to choose a very small neighborhood of the identity 
before all the elements of that neighborhood can act on all points of M (see 
Fig. 29.1). 

All the properties of a group action described in Chap. 23 can be applied
here as well. So, one talks about the orbit of $G$ as the collection of points in
$M$ obtained from one another by the action of $G$; the stabilizer $G_x$ of a point
$x \in M$ as the collection of all group elements leaving $x$ fixed; transitive
action of $G$ on $M$ when there is only one orbit; free action of $G$ on $M$ when
$G_x = \{e\}$ for all $x \in M$; and effective action of $G$ on $M$ when $g \cdot x = x$ for
all $x \in M$ implies that $g = e$. The only extra condition one has to be aware
of is that the group action is not defined for all elements of $G$, and that a
sufficiently small neighborhood of the identity needs to be chosen. Since
“belonging to the same orbit” is an equivalence relation on $M$, the set of
orbits of $M$ is denoted by $M/G$.

An important consequence of the free action of a group is the following:


Theorem 29.1.7 If $G$ acts freely on $M$, then $G$ is diffeomorphic to $Gx$ for
any $x \in M$.




Proof We assume a left action. The proof for the right action is identical to
this proof. Consider the map $\phi : Gx \to G$ given by $\phi(y) = g$ for $y = gx$.
For this map to make sense, $g$ must be determined uniquely from $y$. If $g_2 x =
y = g_1 x$, then $x = g_2^{-1} g_1 x$, and because the action is free, we conclude that
$g_2^{-1} g_1 = e$ and $g_1 = g_2$, so that indeed $g$ is determined uniquely by $y$. Now,
we have to show that $\phi$ is a bijection:


Surjectivity: If $g \in G$, then clearly $gx \in Gx$ and $\phi(gx) = g$.
Injectivity: If $\phi(y_1) = \phi(y_2)$, with $y_1 = g_1 x$ and $y_2 = g_2 x$, then $g_1 = g_2$,
and hence $y_1 = y_2$.


In the old literature, the group action is described in terms of coordinates. 
Although for calculations this is desirable, it can be very clumsy for formal 
discussions, as we shall see later. Let $a = (a^1, \dots, a^r)$ be a coordinate
system on $G$ and $x = (x^1, \dots, x^n)$ a coordinate system on $M$. Then the group
action $\Psi : G \times M \to M$ becomes a set of $n$ functions described by


$$x' = \Psi(a, x), \qquad x'' = \Psi(b, x') = \Psi(m(b, a), x), \tag{29.3}$$


where $m$ is the multiplication law of the Lie group written in terms of
coordinates as given in Eq. (29.1). It is assumed that $\Psi$ is infinitely differentiable.


Box 29.1.8 Equation (29.3) can be used to unravel the multiplication 
law for the Lie group when the latter is given in terms of transforma- 
tions. 


Example 29.1.9 (Examples of groups of transformation) 


(a) The two-dimensional rotation group acts on the $xy$-plane as

$$\Phi(\theta, \mathbf{r}) = (x\cos\theta - y\sin\theta,\ x\sin\theta + y\cos\theta).$$

If we write $\mathbf{r}' = \Phi(\theta_1, \mathbf{r})$ and $\mathbf{r}'' = \Phi(\theta_2, \mathbf{r}')$, then a simple calculation
shows that

$$\mathbf{r}'' = \big(x\cos(\theta_1 + \theta_2) - y\sin(\theta_1 + \theta_2),\ x\sin(\theta_1 + \theta_2) + y\cos(\theta_1 + \theta_2)\big).$$

With $\mathbf{r}'' = \Phi(m(\theta_1, \theta_2), \mathbf{r})$, we recognize the “multiplication” law as
$m(\theta_1, \theta_2) = \theta_1 + \theta_2$. The orbits are circles centered at the origin.

(b) Let $M = \mathbb{R}^n$, $\mathbf{a}$ a fixed vector in $\mathbb{R}^n$, and $G = \mathbb{R}$. Define $\Psi : \mathbb{R} \times \mathbb{R}^n \to
\mathbb{R}^n$ by

$$\Psi(t, \mathbf{x}) = \mathbf{x} + t\mathbf{a}, \qquad \mathbf{x} \in \mathbb{R}^n,\ t \in \mathbb{R}.$$

This group action is globally defined. The orbits are straight lines
parallel to $\mathbf{a}$. The group is the set of translations in the direction $\mathbf{a}$ in $\mathbb{R}^n$.
The reader may verify that the “multiplication” law is addition of $t$’s.

(c) Let $G = \mathbb{R}^+$ be the multiplicative group of positive real numbers.
Fix real numbers $\alpha_1, \alpha_2, \dots, \alpha_n$, not all zero. Define the action
of $G$ on $\mathbb{R}^n$ by

$$\Psi(\lambda, \mathbf{x}) \equiv \lambda \cdot \mathbf{x} = (\lambda^{\alpha_1} x_1, \dots, \lambda^{\alpha_n} x_n), \qquad \lambda \in \mathbb{R}^+,\ \mathbf{x} = (x_1, \dots, x_n) \in \mathbb{R}^n.$$

The orbits are obtained by choosing a point in $\mathbb{R}^n$ and applying $G$ to
it for different $\lambda$’s. The result is a curve in $\mathbb{R}^n$. For example, if $n = 2$,
$\alpha_1 = 1$, and $\alpha_2 = 2$, we get, as the orbit containing $\mathbf{x}_0$, the curve

$$\lambda \cdot \mathbf{x}_0 = (\lambda x_0, \lambda^2 y_0) \quad\Rightarrow\quad y = \frac{y_0}{x_0^2}\,x^2,$$

which is a parabola going through the origin and the point $(x_0, y_0)$.
Note that the orbit containing the origin has only one point. This group
is called the group of scale transformations. The multiplication law
is ordinary multiplication of (positive) real numbers.

(d) Let $G = \mathbb{R}^4$ act on $M = \mathbb{R}$ by

$$\Phi(\mathbf{a}, x) = \frac{a_1 x + a_2}{a_3 x + a_4}, \qquad \mathbf{a} = (a_1, a_2, a_3, a_4),\ a_1 a_4 - a_2 a_3 \neq 0.$$

The reader may verify that this is indeed the action of a group (catch
where the condition $a_1 a_4 - a_2 a_3 \neq 0$ is used!), and if $x' = \Phi(\mathbf{b}, x)$
and $x'' = \Phi(\mathbf{a}, x')$, then

$$x'' = \frac{(a_1 b_1 + a_2 b_3)x + a_1 b_2 + a_2 b_4}{(a_3 b_1 + a_4 b_3)x + a_3 b_2 + a_4 b_4},$$

so that the multiplication rule is

$$m(\mathbf{a}, \mathbf{b}) = (a_1 b_1 + a_2 b_3,\ a_1 b_2 + a_2 b_4,\ a_3 b_1 + a_4 b_3,\ a_3 b_2 + a_4 b_4).$$

This group is called the one-dimensional projective group.
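
The multiplication rule of the projective group in item (d) can be recovered, as Box 29.1.8 suggests, by composing two transformations. The following sketch checks this at one sample point with exact rationals; the parameter values are arbitrary choices for illustration, and the function names are hypothetical.

```python
from fractions import Fraction

def phi(a, x):
    """Projective action: x -> (a1 x + a2) / (a3 x + a4)."""
    a1, a2, a3, a4 = a
    return (a1 * x + a2) / (a3 * x + a4)

def m(a, b):
    """Multiplication rule read off from the composition of two transformations."""
    a1, a2, a3, a4 = a
    b1, b2, b3, b4 = b
    return (a1 * b1 + a2 * b3, a1 * b2 + a2 * b4,
            a3 * b1 + a4 * b3, a3 * b2 + a4 * b4)

a = (2, 1, 1, 3)    # a1*a4 - a2*a3 = 5, nonzero
b = (1, -1, 2, 5)   # b1*b4 - b2*b3 = 7, nonzero
x = Fraction(3, 7)

two_steps = phi(a, phi(b, x))   # x'' obtained by applying b, then a
one_step = phi(m(a, b), x)      # x'' obtained from the product parameters
```

The rule `m` is the product of the $2 \times 2$ matrices with rows $(a_1, a_2)$, $(a_3, a_4)$ and $(b_1, b_2)$, $(b_3, b_4)$, which is where the determinant condition enters.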


29.1.2 Lie Algebra of a Lie Group 


The group property of a Lie group G provides a natural diffeomorphism on 
G that determines a substantial part of its structure. 


Definition 29.1.10 Let $G$ be a Lie group and $g \in G$. The left translation
by $g$ is a diffeomorphism $L_g : G \to G$ defined by


$$L_g(h) = gh \qquad \forall h \in G.$$


A vector field $\xi$ on $G$ is called left-invariant if for each $g \in G$, $\xi$ is
$L_g$-related to itself; i.e.,²


$$L_{g*} \circ \xi = \xi \circ L_g, \quad\text{or}\quad L_{g*}(\xi(h)) = \xi(gh) \qquad \forall g, h \in G.$$


The set of left-invariant vector fields on $G$ is denoted by $\mathfrak{g}$. A 1-form whose
pairing with a left-invariant vector field gives a constant function on $G$ is
called a left-invariant 1-form.

The right translation by $g$, $R_g : G \to G$, and right-invariant vector
fields and 1-forms are defined similarly.


²When there is no danger of confusion, we shall use $\xi(h)$ for $\xi|_h$.


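The reader may easily check that right and left translations commute:

$$R_g \circ L_h = L_h \circ R_g \qquad \forall g, h \in G. \tag{29.4}$$


It is convenient to have a coordinate representation of $L_{g*}$. The coordinate
representation of $L_g$ is simply the multiplication law $L_g(h) = m(g, h)$,
where we have used the same symbol for coordinates as for group elements.
Equation (28.11) can now be used to write the coordinate representation
of $L_{g*}$:


$$
L_{g*} \to
\begin{pmatrix}
\partial m^1/\partial h^1 & \partial m^1/\partial h^2 & \cdots & \partial m^1/\partial h^r \\
\partial m^2/\partial h^1 & \partial m^2/\partial h^2 & \cdots & \partial m^2/\partial h^r \\
\vdots & \vdots & & \vdots \\
\partial m^r/\partial h^1 & \partial m^r/\partial h^2 & \cdots & \partial m^r/\partial h^r
\end{pmatrix},
\tag{29.5}
$$


where all the derivatives in the matrix are evaluated at $(g, h)$.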
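
For a 1-parameter group, the matrix of Eq. (29.5) reduces to the single number $\partial m/\partial h$. As a sketch, take the local group of Example 29.1.5 and evaluate this derivative at the identity $h = 0$, which gives the component of the left-invariant vector field generated by the unit tangent vector at the identity. The closed form $(g - 1)^2$ used for comparison is a hand computation made for this illustration, not a formula quoted from the text, and the function names are hypothetical.

```python
def m(x, y):
    """Multiplication law of Example 29.1.5."""
    return (2 * x * y - x - y) / (x * y - 1)

def left_translation_derivative(g, h, eps=1e-6):
    """The 1x1 matrix dm/dh of Eq. (29.5) at (g, h), by central differences."""
    return (m(g, h + eps) - m(g, h - eps)) / (2 * eps)

g = 0.25
numeric = left_translation_derivative(g, 0.0)   # L_{g*} applied to d/dx at the identity
closed_form = (g - 1) ** 2                      # hand-computed value of dm/dh at h = 0
```

Under these assumptions, the left-invariant vector field on this local group is $\xi(x) = (x - 1)^2\,\partial_x$.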

We have already mentioned in Chap. 28 that for a general manifold $M$,
$\mathfrak{X}(M)$ is an infinite-dimensional Lie algebra under the Lie bracket
“multiplication”. In general, $\mathfrak{X}(M)$ has no finite-dimensional subalgebra. However,
Lie groups are an exception:


Proposition 29.1.11 Let $G$ be a Lie group and $\mathfrak{g}$ the set of its
left-invariant vector fields. Then $\mathfrak{g}$ is a real vector space, and the map
$\phi : \mathfrak{g} \to \mathcal{T}_e(G)$, defined by $\phi(\xi) = \xi(e)$, is a linear isomorphism. Therefore,
$\dim\mathfrak{g} = \dim\mathcal{T}_e(G) = \dim G$. Furthermore, $\mathfrak{g}$ is closed under Lie brackets;
i.e., $\mathfrak{g}$ is a Lie algebra.


Proof It is clear that $\mathfrak{g}$ is a real vector space. If $\phi(\xi) = \phi(\eta)$ for $\xi, \eta \in \mathfrak{g}$,
then


$$\xi(g) = L_{g*}(\xi(e)) = L_{g*}(\eta(e)) = \eta(g) \quad \forall g \in G \quad\Rightarrow\quad \xi = \eta.$$


This shows that $\phi$ is injective. To show that $\phi$ is surjective, suppose that
$v \in \mathcal{T}_e(G)$ and define the vector field $\xi$ on $G$ by $\xi(g) = L_{g*}(v)$ for all $g \in G$.
Then $\phi(\xi) = v$ and $\xi \in \mathfrak{g}$, because


$$L_{g*} \circ \xi(h) = L_{g*} \circ L_{h*}(v) = L_{gh*}(v) = \xi(gh) = \xi(L_g h) = \xi \circ L_g(h).$$


This proves the first part of the proposition. The second part follows
immediately from the definition of a left-invariant vector field and
Theorem 28.4.4.


The flow of $\xi$ at $g \in G$ can be shown to be


$$F_t = g\exp(t\xi) \equiv R_{\exp(t\xi)}\,g. \tag{29.6}$$


Indeed, let $X_\xi$ be the vector field associated with this flow. The action of this
vector field on a function $f$ is


$$X_\xi|_g(f) = \frac{d}{dt} f\big(g\exp(t\xi)\big)\Big|_{t=0}.$$


Therefore,


$$
\begin{aligned}
\big(L_{h*} X_\xi|_g\big)(f) &= X_\xi|_g(f \circ L_h) = \frac{d}{dt} f \circ L_h\big(g\exp(t\xi)\big)\Big|_{t=0} \\
&= \frac{d}{dt} f\big(hg\exp(t\xi)\big)\Big|_{t=0} = X_\xi(hg)(f) = [X_\xi \circ L_h(g)](f).
\end{aligned}
$$


Since this is true for all $f$ and $g$, and $X_\xi|_e = \xi(e)$, we conclude that $X_\xi$ is
the unique left-invariant vector field corresponding to $\xi(e)$.


Definition 29.1.12 The Lie algebra of the Lie group $G$ is the Lie
algebra $\mathfrak{g}$ of left-invariant vector fields on $G$. Sometimes we think of
$\xi$ as a vector in $\mathcal{T}_e(G)$. In that case, we denote by $X_\xi$ the left-invariant
vector field whose value at the identity is $\xi$.


The isomorphism of $\mathfrak{g}$ with $\mathcal{T}_e(G)$ induces a Lie bracket on $\mathcal{T}_e(G)$ and
turns it into a Lie algebra. In many cases of physical interest, it is this
interpretation of the Lie algebra of $G$ that is most useful.

If two groups stand in some algebraic relation to one another, their Lie
algebras will inherit such relations. More precisely, let $G$ and $H$ be Lie
groups with Lie algebras $\mathfrak{g}$ and $\mathfrak{h}$, respectively. Suppose $\phi : G \to H$ is a Lie
group homomorphism. Then identifying $\mathfrak{g}$ with $\mathcal{T}_e(G)$ and $\mathfrak{h}$ with $\mathcal{T}_e(H)$,
and using Theorem 28.4.4, we conclude that $\phi_* : \mathfrak{g} \to \mathfrak{h}$ is a Lie algebra
homomorphism, i.e., it preserves the Lie brackets:


$$\phi_*[\xi, \eta] = [\phi_*\xi, \phi_*\eta] \qquad \forall \xi, \eta \in \mathfrak{g}. \tag{29.7}$$


Theorem 29.1.13 If $\phi : G \to H$ is a Lie group homomorphism, then $\phi_* :
\mathfrak{g} \to \mathfrak{h}$ is a Lie algebra homomorphism.


In particular, if $\phi$ is a Lie group isomorphism, then $\phi_*$ is a Lie algebra
isomorphism.


Example 29.1.14 Let $V$ be a complex vector space with its general linear
group $GL(V)$, a $2n^2$-dimensional Lie group. Recall that $GL(V)$ is an open
submanifold of $\mathcal{L}(V)$. By Eq. (28.4), $\mathcal{T}_e(GL(V)) \cong \mathcal{T}_e(\mathcal{L}(V))$, where $e$ is the
unit operator. Now note that on the one hand, we can identify $\mathcal{T}_e(\mathcal{L}(V))$ with
$\mathcal{L}(V)$ [see the box after Eq. (28.4)]. On the other hand, $\mathcal{T}_e(GL(V))$ can be
identified with $\mathfrak{gl}(V)$, the Lie algebra of $GL(V)$. Therefore, $\mathfrak{gl}(V) \cong \mathcal{L}(V)$.
We use the notation $A(t)$ for a curve in $GL(V)$ and $\dot{A}$ for the vector tangent
to the curve.

It is instructive to construct the coordinate representation of vector fields
on $GL(V)$. Let $f : GL(V) \to \mathbb{R}$ be a function and $\dot{A}$ a vector field. Then, we
have


$$\dot{A}(f) = \frac{d}{dt} f\big(A(t)\big) = \frac{da_{ij}}{dt}\frac{\partial f}{\partial x^{ij}},$$

or, since $f$ is arbitrary,

$$\dot{A} = \frac{da_{ij}}{dt}\frac{\partial}{\partial x^{ij}} \equiv \frac{dA}{dt},$$


where summation over repeated indices is understood and we introduced
$dA/dt$ as an abbreviation for $\dot{a}_{ij}(\partial/\partial x^{ij})$. However, the one-to-one
correspondence between matrices and operators makes this more than just an
abbreviation. Indeed, we can interpret $dA/dt$ as the derivative of $A$ and
perform such differentiation whenever it is possible. The equation above states
that


Box 29.1.15 To obtain the matrix elements (coordinates) of the
operator $\dot{A}$, one differentiates the $t$-dependent elements of the (matrix
representation of the) operator $A(t)$.


Of particular interest are the left-invariant vector fields, or equivalently,
the vectors belonging to $\mathcal{T}_e(GL(V))$. This amounts to substituting $t = 0$ in
the formulas above. Thus, if $\dot{A} \in \mathcal{T}_e(GL(V))$,


$$\dot{A} = \dot{a}_{ij}(0)\frac{\partial}{\partial x^{ij}} \equiv \frac{dA}{dt}\Big|_{t=0}. \tag{29.8}$$


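For the product of two operators, we get


$$
\begin{aligned}
(AB)^{\cdot} &= \frac{d}{dt}\big(A(t)B(t)\big)\Big|_{t=0} = \frac{d}{dt}\big(a_{ik}b_{kj}\big)\Big|_{t=0}\frac{\partial}{\partial x^{ij}} \\
&= \Big(\dot{a}_{ik}(0)\underbrace{b_{kj}(0)}_{=\delta_{kj}} + \underbrace{a_{ik}(0)}_{=\delta_{ik}}\dot{b}_{kj}(0)\Big)\frac{\partial}{\partial x^{ij}} \\
&= \dot{a}_{ij}(0)\frac{\partial}{\partial x^{ij}} + \dot{b}_{ij}(0)\frac{\partial}{\partial x^{ij}} = \frac{dA}{dt}(0) + \frac{dB}{dt}(0).
\end{aligned}
\tag{29.9}
$$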

Many of the Lie groups used in physics are subgroups of GL(V). A char- 
acterization of the Lie algebras of these subgroups is essential for under- 
standing the subgroups themselves and applying them to physical situations. 
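These subgroups are typically defined in terms of maps $\phi : GL(V) \to M$ for
which $M$ is a manifold and $\phi_*$ is surjective. To construct the Lie algebra of
subgroups of $GL(V)$, we need to concentrate on the map $\phi_{*e}$ as defined on
$\mathcal{T}_e(GL(V))$.

An important map is $\det : GL(V) \to \mathbb{C}$ for a complex vector space $V$. We
are interested in evaluating the map $\det_* : \mathcal{T}_e(GL(V)) \to \mathcal{T}_1(\mathbb{C})$ in which we
consider $\mathbb{C} = \mathbb{R}^2$ to be a manifold. For an operator $\dot{A} \in \mathcal{T}_e(GL(V)) = \mathfrak{gl}(V)$
and a complex-valued function $f$, we have


$$
\begin{aligned}
\det{}_*(\dot{A}) f &= \frac{d}{dt} f\big(\det A(t)\big)\Big|_{t=0} = \frac{dx}{dt}\frac{\partial f}{\partial x} + \frac{dy}{dt}\frac{\partial f}{\partial y} \\
&= \frac{d}{dt}\operatorname{Re}\det A(t)\Big|_{t=0}\frac{\partial f}{\partial x} + \frac{d}{dt}\operatorname{Im}\det A(t)\Big|_{t=0}\frac{\partial f}{\partial y} \\
&= \operatorname{Re}\operatorname{tr}\dot{A}\,\frac{\partial f}{\partial x} + \operatorname{Im}\operatorname{tr}\dot{A}\,\frac{\partial f}{\partial y},
\end{aligned}
$$


where we used Eq. (5.34). Since $f$ is arbitrary and $\{\partial/\partial x, \partial/\partial y\}$ can be
identified with $\{1, i\}$, we have


$$\det{}_*(\dot{A}) = \operatorname{tr}\dot{A}. \tag{29.10}$$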
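
Equation (29.10) says that the derivative of the determinant along a curve through the identity is the trace of the tangent vector. A minimal numerical sketch, assuming a real $3 \times 3$ case and the straight-line curve $A(t) = \mathbf{1} + t\dot{A}$ (one convenient curve with the required tangent, chosen for this example), is:

```python
def det3(a):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))

def curve(a_dot, t):
    """Assumed curve A(t) = 1 + t * a_dot through the identity, tangent a_dot at t = 0."""
    return [[(1.0 if i == j else 0.0) + t * a_dot[i][j] for j in range(3)]
            for i in range(3)]

a_dot = [[0.5, 2.0, -1.0],
         [3.0, -1.5, 0.25],
         [0.0, 4.0, 2.0]]

eps = 1e-5
# Central-difference estimate of d/dt det A(t) at t = 0
derivative = (det3(curve(a_dot, eps)) - det3(curve(a_dot, -eps))) / (2 * eps)
trace = sum(a_dot[i][i] for i in range(3))
```

Any other curve with the same tangent at the identity would give the same derivative, since only first-order terms in $t$ contribute.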


Example 29.1.16 (Lie algebra of SL(V)) The special linear group SL(V) is 
characterized by the fact that all its elements have unit determinant. 


Box 29.1.17 The Lie algebra $\mathfrak{sl}(V)$ of the special linear group is the
set of all traceless operators.


This is because if we use (29.10) and (28.12) and the fact that $SL(V) =
\det{}^{-1}(1)$, we can conclude that $\det_*(\dot{A}) = \operatorname{tr}\dot{A} = 0$ for all $\dot{A} \in \mathfrak{sl}(V)$.


Example 29.1.18 (Lie algebras of unitary and related groups) Let us first
show that the set of unitary operators on $V$, denoted by $U(V)$, is a Lie
subgroup of $GL(V)$, called the unitary group of $V$. Consider the map
$\psi : GL(V) \to H$, where $H$ is the set of hermitian operators considered as a
vector space (therefore, a manifold) over the reals, defined by $\psi(A) = AA^\dagger$.
Using Eq. (29.9), the reader may verify that $\psi_*$ is surjective and


$$\psi_*(\dot{A}) = \dot{A} + \dot{A}^\dagger. \tag{29.11}$$


It follows from Theorem 28.3.8 that $U(V) = \psi^{-1}(\mathbf{1})$ is a subgroup of
$GL(V)$. Using Eq. (28.12), we conclude that $\psi_*(\dot{A}) = \dot{A} + \dot{A}^\dagger = 0$ for all
$\dot{A} \in \mathfrak{u}(V)$, i.e.,


Box 29.1.19 The Lie algebra $\mathfrak{u}(V)$ of the unitary group is the set of
all anti-hermitian operators.


When the vector space is $\mathbb{C}^n$, we write $U(n)$ instead of $U(\mathbb{C}^n)$. By
counting the number of independent real parameters of a matrix representing a
hermitian operator, we can conclude that $\dim H = n^2$. It follows from
Theorem 28.3.8 that $\dim U(V) = n^2$.

The intersection of $SL(V)$ and $U(V)$, denoted by $SU(V)$, is called the
special unitary group. When the vector space is $\mathbb{C}^n$, we write $SU(n)$
instead of $SU(\mathbb{C}^n)$. The previous two results yield


Box 29.1.20 The Lie algebra $\mathfrak{su}(V)$ consists of anti-hermitian traceless
operators. If $\dim V = n$, then $\dim\mathfrak{su}(V) = n^2 - 1$. We write $\mathfrak{su}(n)$
for $\mathfrak{su}(V)$ if $V = \mathbb{C}^n$.


The reader is asked to check that $\dim\mathfrak{su}(V) = n^2 - 1$.

If we restrict ourselves to real vector spaces, then unitary and special 
unitary groups become the orthogonal group O(V) and special orthogo- 
nal group SO(V), respectively. Their algebras consist of antisymmetric and 
traceless antisymmetric operators, respectively. When V = R”, we use the 
notation O(n) and SO(n). 
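
The characterization of these Lie algebras can be illustrated numerically: exponentiating an anti-hermitian traceless matrix should give a unitary matrix of unit determinant. The sketch below assumes the matrix exponential defined by its power series (truncated at 40 terms) and a sample element of $\mathfrak{su}(2)$ chosen arbitrarily; the helper names are hypothetical.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(a, terms=40):
    """Truncated power series sum_k a^k / k! (adequate for small matrices of modest norm)."""
    n = len(a)
    result = [[complex(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[entry / k for entry in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def dagger(a):
    """Hermitian conjugate."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

# Sample anti-hermitian, traceless matrix: X + X^dagger = 0 and tr X = 0
X = [[0.3j, 0.7 + 0.2j],
     [-0.7 + 0.2j, -0.3j]]

U = expm(X)
UUdag = matmul(U, dagger(U))
unitarity_error = max(abs(UUdag[i][j] - (1 if i == j else 0))
                      for i in range(2) for j in range(2))
det_U = U[0][0] * U[1][1] - U[0][1] * U[1][0]
```

Dropping the tracelessness condition would still give a unitary matrix, but its determinant would then be a general phase $e^{\operatorname{tr} X}$ rather than 1.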


Let $X$ be a left-invariant vector field on $G$. We know from our discussion of flows
that $X$ has a flow $F_t = \exp(tX)$ at every point $g$ of $G$ with $-\epsilon < t < \epsilon$.
Now, since $F_t(g)$ is in $G$, it follows from the group property of $G$ that
$(F_t)^n(g) = F_{nt}(g) \in G$ for all $n$. This shows that the flow of every left-invariant
vector field on a Lie group is defined for all $t \in \mathbb{R}$, i.e., all left-invariant vector
fields on a Lie group are complete. Now consider $\mathfrak{g}$ as a vector space and manifold and
define a map $\exp : \mathfrak{g} \to G$ that is simply the flow through the identity evaluated at $t = 1$.
It can be shown that the following result holds ([Warn 83, pp. 103-104]):


Theorem 29.1.21 $\exp : \mathfrak{g} \to G$, called the exponential map, is a
diffeomorphism of a neighborhood of the origin of $\mathfrak{g}$ with a neighborhood of the
identity element of $G$.


This theorem states that in a neighborhood of the identity element, a Lie 
group, as a manifold, “looks like” its tangent space there. In particular, 


Box 29.1.22 Two Lie groups that have identical Lie algebras are lo- 
cally diffeomorphic. 


Example 29.1.23 (Why exp is called the exponential map) Let $V$ be a
finite-dimensional vector space and $A \in \mathfrak{gl}(V)$. Define, as in Chap. 4,


$$e^{tA} \equiv \sum_{k=0}^\infty \frac{t^k A^k}{k!} = \mathbf{1} + tA + \cdots$$

and note that

$$\frac{d}{dt}e^{tA} = Ae^{tA} \quad\Rightarrow\quad \frac{d}{dt}e^{tA}\Big|_{t=0} = A.$$

Furthermore,

$$
\begin{aligned}
e^{tA}e^{sA} &= \sum_{k=0}^\infty \frac{t^k A^k}{k!}\sum_{n=0}^\infty \frac{s^n A^n}{n!} = \sum_{k=0}^\infty\sum_{n=0}^\infty \frac{t^k s^n}{k!\,n!}A^{k+n} \\
&= \sum_{m=0}^\infty A^m \underbrace{\sum_{n=0}^m \frac{t^{m-n}s^n}{(m-n)!\,n!}}_{=(t+s)^m/m!} = e^{(t+s)A}.
\end{aligned}
$$


It follows that $e^{tA}$ has all the properties expected of the flow of the vector
field $A$.
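
The one-parameter group property $e^{tA}e^{sA} = e^{(t+s)A}$ can be checked on a concrete generator. The sketch below assumes the $2 \times 2$ antisymmetric matrix that generates plane rotations and a truncated power series for the exponential; both choices, and the helper names, are made for illustration only.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm(a, t, terms=40):
    """Truncated power series for exp(t * a)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[t * entry / k for entry in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result

A = [[0.0, -1.0], [1.0, 0.0]]   # assumed generator: infinitesimal rotation of the plane
t, s = 0.4, 1.1

product = matmul(expm(A, t), expm(A, s))   # e^{tA} e^{sA}
direct = expm(A, t + s)                    # e^{(t+s)A}
rotation = [[math.cos(t + s), -math.sin(t + s)],
            [math.sin(t + s), math.cos(t + s)]]

group_error = max(abs(product[i][j] - direct[i][j]) for i in range(2) for j in range(2))
rotation_error = max(abs(direct[i][j] - rotation[i][j]) for i in range(2) for j in range(2))
```

For this generator the exponential reproduces the rotation matrices of the two-dimensional rotation group, whose multiplication law is addition of angles.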


The exponential map has some important properties that we shall have 
occasion to use later. The first of these properties is the content of the fol- 
lowing proposition, whose proof is left as an exercise for the reader. 


Proposition 29.1.24 Let $\phi : H \to G$ be a Lie group homomorphism. Then,
for all $\eta \in \mathfrak{h}$, we have $\phi(\exp_H \eta) = \exp_G(\phi_*\eta)$.


For every $g \in G$, let $I_g \equiv R_{g^{-1}} \circ L_g$. The reader may readily verify
that $I_g$, which takes $x \in G$ to $gxg^{-1} \in G$, is an isomorphism of $G$, i.e.,
$I_g(xy) = I_g(x)I_g(y)$ and $I_g$ is bijective. It is called the inner
automorphism associated with $g$.


Definition 29.1.25 The Lie algebra isomorphism $I_{g*} = R_{g^{-1}*} \circ L_{g*} : \mathfrak{g} \to \mathfrak{g}$
is denoted by $\mathrm{Ad}_g$ and is called the adjoint map associated with $g$.


Since $\mathfrak{g}$ is a vector space, the adjoint map can be used to construct a
representation of $G$.


Definition 29.1.26 The adjoint representation of a Lie group $G$ is $\mathrm{Ad} :
G \to GL(\mathfrak{g})$ given by $\mathrm{Ad}(g) = \mathrm{Ad}_g \equiv I_{g*}$.


Using Proposition 29.1.24, we have the following corollary.


Corollary 29.1.27 $\exp(\mathrm{Ad}_g\xi) = I_g\exp\xi \equiv g\,(\exp\xi)\,g^{-1}$ for all $\xi \in \mathfrak{g}$ and
$g \in G$.


Let $\{\xi_i\}$ be a basis for the (finite-dimensional) Lie algebra of the Lie
group $G$. The Lie bracket of two basis vectors, being itself a left-invariant
vector field, can be written as a linear combination of $\{\xi_i\}$:


$$[\xi_i, \xi_j] = \sum_{k=1}^n c^k_{ij}\,\xi_k.$$


On a general manifold, $c^k_{ij}$ will depend on the point at which the fields are
being evaluated. However, on Lie groups, they are independent of the point,
as the following manipulation shows:


$$
\begin{aligned}
[\xi_i(g), \xi_j(g)] &= [L_{g*}\xi_i(e), L_{g*}\xi_j(e)] = L_{g*}[\xi_i(e), \xi_j(e)] \\
&= L_{g*}\sum_{k=1}^n c^k_{ij}(e)\,\xi_k(e) = \sum_{k=1}^n c^k_{ij}(e)\,L_{g*}\xi_k(e) \\
&= \sum_{k=1}^n c^k_{ij}(e)\,\xi_k(g).
\end{aligned}
$$


Therefore, the value of $c^k_{ij}$ at any point $g \in G$ is the same as its value at
the identity, i.e., $c^k_{ij}$ is a constant. This statement is called Lie’s second
theorem.


Definition 29.1.28 Let $\{\xi_i\}_{i=1}^n$ be a basis for the Lie algebra $\mathfrak{g}$ of the Lie
group $G$. Then


$$[\xi_i(g), \xi_j(g)] = \sum_{k=1}^n c^k_{ij}\,\xi_k(g), \tag{29.12}$$


where $c^k_{ij}$, which are independent of $g$, are called the structure constants
of $G$.


The structure constants satisfy certain relations that are immediate
consequences of the commutation relations. The antisymmetry of the Lie bracket
and the Jacobi identity lead directly to


$$c^k_{ij} = -c^k_{ji}, \qquad c^k_{ij}c^m_{kl} + c^k_{jl}c^m_{ki} + c^k_{li}c^m_{kj} = 0, \tag{29.13}$$


with summation over the repeated index $k$ understood.
The fact that $\{c^k_{ij}\}$ obey Eq. (29.13) is the content of Lie’s third theorem.
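
As an illustration of structure constants and the relations they satisfy, the sketch below uses the rotation algebra $\mathfrak{so}(3)$ with the assumed basis $(L_i)_{jk} = -\epsilon_{ijk}$, for which $c^k_{ij} = \epsilon_{ijk}$. The basis choice and all names are assumptions of this example; indices run over 0, 1, 2 in the code.

```python
from itertools import product

def levi_civita(i, j, k):
    """Totally antisymmetric symbol on the indices 0, 1, 2."""
    return (i - j) * (j - k) * (k - i) // 2

def matmul(a, b):
    """Product of two 3x3 matrices."""
    return [[sum(a[r][s] * b[s][c] for s in range(3)) for c in range(3)]
            for r in range(3)]

def bracket(a, b):
    """Commutator [a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[r][s] - ba[r][s] for s in range(3)] for r in range(3)]

# Assumed basis of so(3): (L_i)_{jk} = -epsilon_{ijk}
L = [[[-levi_civita(i, j, k) for k in range(3)] for j in range(3)] for i in range(3)]

# Candidate structure constants, stored as c[k][i][j] = c^k_{ij} = epsilon_{ijk}
c = [[[levi_civita(i, j, k) for j in range(3)] for i in range(3)] for k in range(3)]

def combination(i, j):
    """The matrix sum_k c^k_{ij} L_k."""
    return [[sum(c[k][i][j] * L[k][r][s] for k in range(3)) for s in range(3)]
            for r in range(3)]

# [L_i, L_j] = sum_k c^k_{ij} L_k for every pair of basis elements
brackets_match = all(bracket(L[i], L[j]) == combination(i, j)
                     for i, j in product(range(3), repeat=2))
# Antisymmetry in the lower indices
antisymmetric = all(c[k][i][j] == -c[k][j][i]
                    for i, j, k in product(range(3), repeat=3))
# Jacobi identity written in terms of structure constants
jacobi = all(
    sum(c[k][i][j] * c[m][k][l] + c[k][j][l] * c[m][k][i] + c[k][l][i] * c[m][k][j]
        for k in range(3)) == 0
    for i, j, l, m in product(range(3), repeat=4))
```

All three checks use integer arithmetic, so they hold exactly rather than up to rounding.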


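29.1.3 Invariant Forms


If $\omega|_e$ is a 1-form on $\mathcal{T}_e(G)$, then $\omega \in \Lambda^1(G)$, given by $\omega|_g \equiv L^*_{g^{-1}}\omega|_e$, is a
left-invariant 1-form:

$$
\begin{aligned}
\omega|_g(X|_g) &= L^*_{g^{-1}}\omega|_e(X|_g) = \omega|_e(L_{g^{-1}*}X|_g) \\
&= \omega|_e(X|_{g^{-1}g}) = \omega|_e(X|_e)
\end{aligned}
\tag{29.14}
$$


independent of $g$. A differential form $\omega$ on $G$ is called left-invariant if
$L_g^*(\omega) = \omega$ for every $g \in G$. If $\omega$ is a left-invariant $p$-form and $\{\xi_i\}_{i=1}^p$ a set
of left-invariant vector fields, then, as in (29.14), the function $\omega(\xi_1, \dots, \xi_p)$
is constant on $G$. The exterior derivative of a left-invariant $p$-form satisfies


$$d\omega(\xi_1, \dots, \xi_{p+1}) = \sum_{1 \le i < j \le p+1} (-1)^{i+j}\,\omega\big([\xi_i, \xi_j], \xi_1, \dots, \hat{\xi}_i, \dots, \hat{\xi}_j, \dots, \xi_{p+1}\big),$$


where we used Theorem 28.5.11 and the fact that vector fields give zero
when acting on a constant function. For a left-invariant 1-form, this yields


$$d\omega(\xi, \eta) = -\omega([\xi, \eta]). \tag{29.15}$$


Definition 29.1.29 The canonical 1-form $\theta$ on a Lie group $G$ is the
left-invariant $\mathfrak{g}$-valued 1-form uniquely determined by $\theta(\xi) = \xi$ for all $\xi \in \mathfrak{g}$.


Let $\{\xi_i\}_{i=1}^n$ be a basis for $\mathfrak{g}$. Then, $\theta = \sum_{i=1}^n \theta^i\xi_i$, where $\theta^i$ are
real-valued 1-forms. Now the definition of $\theta$ gives


$$\xi_j = \theta(\xi_j) = \sum_{i=1}^n \theta^i(\xi_j)\,\xi_i.$$


This implies that $\theta^i(\xi_j) = \delta^i_j$, i.e., that $\{\theta^i\}_{i=1}^n$ is the dual basis of $\{\xi_i\}_{i=1}^n$.

We now want to express the exterior derivative of $\theta^i$ in the basis $\{\theta^j\}$.
Since it is a 2-form, $d\theta^i = \alpha^i_{lm}\,\theta^l \wedge \theta^m$, for some $\alpha^i_{lm} \in \mathbb{R}$ to be determined.
Since it is invariant, it must satisfy Eq. (29.15). So, we must have


$$\alpha^i_{lm}\,\theta^l \wedge \theta^m(\xi_j, \xi_k) = -\theta^i([\xi_j, \xi_k]).$$

For the right-hand side, using Eq. (29.12), we have

$$\theta^i([\xi_j, \xi_k]) = c^l_{jk}\,\theta^i(\xi_l) = c^l_{jk}\,\delta^i_l = c^i_{jk}.$$

For the left-hand side, we obtain


$$
\begin{aligned}
\alpha^i_{lm}\,\theta^l \wedge \theta^m(\xi_j, \xi_k) &= \alpha^i_{lm}\big[\theta^l(\xi_j)\theta^m(\xi_k) - \theta^l(\xi_k)\theta^m(\xi_j)\big] \\
&= \alpha^i_{lm}\big[\delta^l_j\delta^m_k - \delta^l_k\delta^m_j\big] = 2\alpha^i_{jk},
\end{aligned}
$$


because $\alpha^i_{jk}$ is antisymmetric in its lower indices. The last two equations
imply that $\alpha^i_{jk} = -\frac{1}{2}c^i_{jk}$. We thus arrive at the Maurer-Cartan equation:


$$d\theta^i = -\frac{1}{2}c^i_{jk}\,\theta^j \wedge \theta^k. \tag{29.16}$$


Multiplying both sides by $\xi_i$ and summing over $i$, the left-hand side
becomes $d\theta$. The right-hand side gives


$$-\frac{1}{2}c^i_{jk}\,\theta^j \wedge \theta^k\,\xi_i = -\frac{1}{2}\theta^j \wedge \theta^k\,[\xi_j, \xi_k] \equiv -\frac{1}{2}[\theta, \theta],$$


where the last expression is defined by the middle expression. Thus the
Maurer-Cartan equation can also be written as


$$d\theta = -\frac{1}{2}[\theta, \theta]. \tag{29.17}$$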

29.1.4 Infinitesimal Action 


The action $\Phi: G \times M \to M$ of a Lie group on a manifold $M$ induces a
homomorphism of its algebra with $\mathcal{X}(M)$. If $\xi \in \mathfrak{g}$, then $\exp(t\xi) \in G$ can act on
$M$ at a point $P$ to produce a curve $\gamma(t) = \exp(t\xi)\cdot P$ going through $P$. The
tangent to this curve at $P$ is defined to be the image of this homomorphism.

Definition 29.1.30 Let $\Phi: G \times M \to M$ be an action. If $\xi \in \mathfrak{g}$, then
$\Phi(\exp t\xi, P)$ is a flow on $M$. The corresponding vector field on $M$ given
by

$$\xi_M|_P \equiv \xi_M(P) = \frac{d}{dt}\Phi(\exp t\xi, P)\Big|_{t=0}$$

is called the infinitesimal generator of the action induced by $\xi$.


In particular, 


Box 29.1.31 If M happens to be a vector space, and the action a rep- 
resentation as given in Box 24.1.3, then the infinitesimal generators 
constitute a representation of the Lie algebra of the group. 


Example 29.1.32 One can think of left translation on a Lie group $G$ as an
action of $G$ on itself. Let $\Phi: G \times G \to G$ be given by $\Phi(g,h) = L_g(h)$.
Then Definition 29.1.30 gives

$$\xi_G(g) = \frac{d}{dt}\Phi(\exp t\xi, g)\Big|_{t=0} = \frac{d}{dt}(\exp t\xi)\,g\,\Big|_{t=0} = \frac{d}{dt}R_g(\exp t\xi)\Big|_{t=0} = R_{g*}\xi$$

by the first equation in (28.24). It follows that $\xi_G$ is right-invariant. Indeed,

$$\xi_G\circ R_h(g) = \xi_G(gh) = (R_{gh})_*\xi = (R_h\circ R_g)_*\xi = R_{h*}\circ R_{g*}\xi = R_{h*}\circ\xi_G(g).$$

Since this holds for all $g \in G$, it follows that $\xi_G\circ R_h = R_{h*}\circ\xi_G$, demonstrating
that $\xi_G$ is right-invariant.


The adjoint map of Definition 29.1.25 induces a natural action on the Lie
algebra $\mathfrak{g}$ with some important properties that we now explore. Define the
adjoint action $\Phi: G \times \mathfrak{g} \to \mathfrak{g}$ of $G$ on $\mathfrak{g} = \mathcal{T}_e(G)$ by $\Phi(g,\xi) = \mathrm{Ad}_g(\xi)$.


Theorem 29.1.33 The infinitesimal generator $\xi_{\mathfrak{g}}$ of the adjoint action is
$\mathrm{ad}_\xi$, where $\mathrm{ad}_\xi(\eta) \equiv [\xi,\eta]$.


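Proof In fact,

$$\begin{aligned}
\xi_{\mathfrak{g}}(\eta) &= \frac{d}{dt}\Phi(\exp t\xi,\eta)\Big|_{t=0} = \frac{d}{dt}\mathrm{Ad}_{\exp t\xi}(\eta)\Big|_{t=0}\\
&= \frac{d}{dt}R^{-1}_{\exp t\xi\,*}\circ L_{\exp t\xi\,*}(\eta)\Big|_{t=0} = \frac{d}{dt}F^{-1}_{t*}\,\eta(\exp t\xi)\Big|_{t=0}\\
&= L_\xi(\eta) = [\xi,\eta] = \mathrm{ad}_\xi(\eta),
\end{aligned} \tag{29.18}$$

where we used Eq. (29.6) as well as the definition of Lie derivative,
Eq. (28.31).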


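If $\Phi: G \times M \to M$ is an action, then $\Phi_g: M \to M$, defined by $\Phi_g(P) =
\Phi(g,P)$, is a diffeomorphism of $M$. Consequently, $\Phi_{g*}: \mathcal{T}_P(M) \to
\mathcal{T}_{g\cdot P}(M)$ is an isomorphism for every $P \in M$ whose inverse is $\Phi_{g*}^{-1} =
\Phi_{g^{-1}*}$.

Proposition 29.1.34 Let $\Phi: G \times M \to M$ be an action. Then for every
$g \in G$ and $\xi, \eta \in \mathfrak{g}$, we have

$$(\mathrm{Ad}_g\xi)_M = \Phi_{g*}\xi_M \quad\text{and}\quad [\xi_M,\eta_M] = -[\xi,\eta]_M.$$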


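Proof Let $P$ be any point in $M$. Then,

$$\begin{aligned}
(\mathrm{Ad}_g\xi)_M(P) &= \frac{d}{dt}\Phi(\exp t\,\mathrm{Ad}_g\xi, P)\Big|_{t=0} && \text{(by Definition 29.1.30)}\\
&= \frac{d}{dt}\Phi\bigl(g(\exp t\xi)g^{-1}, P\bigr)\Big|_{t=0} && \text{(by Corollary 29.1.27)}\\
&= \frac{d}{dt}\Phi\bigl(g(\exp t\xi), \Phi_{g^{-1}}(P)\bigr)\Big|_{t=0} && \text{(by definition of action)}\\
&= \frac{d}{dt}\Phi_g\circ\Phi\bigl(\exp t\xi, \Phi_{g^{-1}}(P)\bigr)\Big|_{t=0} && \text{(by definition of } \Phi_g)\\
&= \Phi_{g*}\big|_{\Phi_{g^{-1}}(P)}\,\frac{d}{dt}\Phi\bigl(\exp t\xi, \Phi_{g^{-1}}(P)\bigr)\Big|_{t=0} && \text{[by (28.24)]}\\
&= \Phi_{g*}\big|_{\Phi_{g^{-1}}(P)}\,\xi_M\bigl(\Phi_{g^{-1}}(P)\bigr) = (\Phi_{g*}\xi_M)(P). && \text{[by (28.28)]}
\end{aligned}$$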


The second part of the proposition follows by replacing $g$ with $\exp t\eta$, so
that

$$(\mathrm{Ad}_{\exp t\eta}\,\xi)_M = \Phi_{\exp t\eta\,*}\,\xi_M = \Phi^{-1}_{\exp t(-\eta)\,*}\,\xi_M.$$

Differentiate both sides with respect to $t$ and note that the LHS gives
$[\eta,\xi]_M$. The derivative of the RHS is the Lie derivative of $\xi_M$ with respect
to $-\eta_M$, which is $-[\eta_M,\xi_M]$.


Proposition 29.1.34 calculated the infinitesimal action of $\Phi_g$ for a fixed
$g \in G$. This can be considered as a kind of partial derivative. Proposition 28.3.7
shows us how to find the total derivative for a general action $\Phi$.
For $\xi \in \mathcal{T}_g(G)$ and $X \in \mathcal{T}_P(M)$, Proposition 28.3.7 yields

$$\Phi_*(\xi, X) = \Phi_{g*}(X) + \Phi_{P*}(\xi) \equiv g\cdot X + \xi\cdot P, \tag{29.19}$$

where the last identity defines the symbols on its left. Note that if $\xi_0 = \xi(e)$,
then $\xi = L_{g*}\xi_0$ in the equation above. We will have some occasions to use
Eq. (29.19) later.




As mentioned earlier, a Lie group action is usually described in terms of 
the parameters of the group, which are simply coordinate functions on the 
group G, as well as coordinate functions on the manifold M. The infinitesi- 
mal generators, being vector fields on M, will then be expressed as a linear 
combination of coordinate frames. 

In the older literature, no mention of the manifold structure is made. 
A Lie group is defined in terms of multiplication functions and other functions
that represent the action of the group on the manifold. Thus, an
$r$-parameter Lie group $G$ is a collection of two sets of functions, $m_\rho:
\mathbb{R}^r \times \mathbb{R}^r \to \mathbb{R}$, $\rho = 1,2,\dots,r$, representing the group multiplication, and
$\phi_i: \mathbb{R}^r \times \mathbb{R}^n \to \mathbb{R}$, $i = 1,2,\dots,n$, representing the action of $G$ on the
$n$-dimensional manifold $M$. We sketch the procedure below, leaving most
of the calculations as exercises for the reader. As we develop the theory, the 
reader is urged to compare this “coordinate-dependent” procedure with the 
“geometric” procedure—which does not use coordinates—described so far. 

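The action of the group is described by the coordinate transformations³

$$\begin{aligned}
x'_i &= \phi_i(a_1,\dots,a_r;\,x_1,\dots,x_n), \qquad i = 1,\dots,n,\\
x_i &= \phi_i(0,\dots,0;\,x_1,\dots,x_n),
\end{aligned} \tag{29.20}$$

as well as the group multiplication properties

$$\begin{aligned}
c_\rho &= m_\rho(a_1,\dots,a_r;\,b_1,\dots,b_r), \qquad \rho = 1,\dots,r,\\
a_\rho &= m_\rho(0,\dots,0;\,a_1,\dots,a_r) = m_\rho(a_1,\dots,a_r;\,0,\dots,0),\\
m_\rho\bigl(a;\,m(b;c)\bigr) &= m_\rho\bigl(m(a;b);\,c\bigr).
\end{aligned} \tag{29.21}$$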
Equation (29.20) is to be interpreted as a rule that takes the second set of 
arguments and transforms them via the first set into the LHS. Now suppose 


that we translate from $x'_i$ to a neighboring point $x'_i + dx'_i$ via a set of group
parameters $\{\delta a_\rho\}_{\rho=1}^r$. We can also get to $x'_i + dx'_i$ from $x_i$ via a new set of
parameters,⁴ which have to be slightly different from $\{a_\rho\}_{\rho=1}^r$, say $\{a_\rho +
da_\rho\}_{\rho=1}^r$. We then have

$$\begin{aligned}
x'_i + dx'_i &= \phi_i(\delta a_1,\dots,\delta a_r;\,x'_1,\dots,x'_n),\\
x'_i + dx'_i &= \phi_i(a_1 + da_1,\dots,a_r + da_r;\,x_1,\dots,x_n),\\
a_\rho + da_\rho &= m_\rho(\delta a_1,\dots,\delta a_r;\,a_1,\dots,a_r),
\end{aligned} \tag{29.22}$$

and, with summation over repeated indices understood,

$$\begin{aligned}
dx'_i &= \frac{\partial\phi_i(a;x')}{\partial a_\kappa}\bigg|_{a=0}\delta a_\kappa \equiv u_{i\kappa}(x')\,\delta a_\kappa,\\
da_\rho &= \frac{\partial m_\rho(b;a)}{\partial b_\kappa}\bigg|_{b=0}\delta a_\kappa \equiv \Theta_{\rho\kappa}(a)\,\delta a_\kappa.
\end{aligned} \tag{29.23}$$

3 We use subscripts for coordinate functions here for typographical convenience.

4 Here we are assuming that the action of the group is transitive, i.e., that every point of
the manifold can be connected to any other point via a transformation.

Inverting the second equation and substituting the resulting $\delta a$'s in the first
equation yields

$$dx'_i = u_{i\kappa}(x')\,\Theta^{-1}_{\kappa\nu}(a)\,da_\nu, \quad\text{or}\quad dx_i = u_{i\kappa}(x)\,\Theta^{-1}_{\kappa\nu}(a)\,da_\nu,$$

where in the last equation, we changed the free coordinate variable on both
sides. It then follows that

$$\frac{\partial x_i}{\partial a_\nu} = \sum_{\kappa=1}^r u_{i\kappa}(x)\,\Theta^{-1}_{\kappa\nu}(a). \tag{29.24}$$

Equation (29.24) and establishing that $u_{i\kappa}$ is $C^\infty$ is the content of Lie's first
theorem.

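The change of an arbitrary function $f(x)$ due to an infinitesimal transformation
is

$$df = \frac{\partial f}{\partial x_i}\,dx_i = \frac{\partial f}{\partial x_i}\,u_{i\kappa}\,\delta a_\kappa = \delta a_\kappa\left(u_{i\kappa}\frac{\partial}{\partial x_i}\right)f.$$

This suggests calling

$$X_\kappa \equiv \sum_{i=1}^n u_{i\kappa}\frac{\partial}{\partial x_i} \tag{29.25}$$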


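the infinitesimal generators of the Lie group. The commutator of two of
these generators is

$$[X_\rho, X_\sigma] = \sum_{i,j=1}^n\left(u_{i\rho}\frac{\partial u_{j\sigma}}{\partial x_i} - u_{i\sigma}\frac{\partial u_{j\rho}}{\partial x_i}\right)\frac{\partial}{\partial x_j}. \tag{29.26}$$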

This commutator does not appear to be similar to the one in Defini- 
tion 29.1.28, which is necessary if the generators are to form a Lie alge- 
bra. However, through a long and tortuous manipulation, outlined in Prob- 
lem 29.9, one can show that 


$$[X_\rho, X_\sigma] = c^\kappa_{\rho\sigma}X_\kappa, \tag{29.27}$$

where $c^\kappa_{\rho\sigma}$ are constants.
One can also obtain this same result by the much simpler method of
applying Proposition 29.1.34 to both sides of Eq. (29.12):

$$\bigl[(\xi_i)_M, (\xi_j)_M\bigr] = -[\xi_i,\xi_j]_M = -\left(\sum_{k=1}^n c^k_{ij}\,\xi_k\right)_M = -\sum_{k=1}^n c^k_{ij}\,(\xi_k)_M.$$

This equation is equivalent to (29.27) if we identify the $X_\kappa$'s with the
$(\xi_i)_M$'s and ignore the irrelevant minus sign.

The reader has hopefully been able to appreciate the power and elegance 
of the geometric approach to Lie groups and Lie algebras. The above il- 
lustration (Problem 29.9) brings out the tedium and the error-prone proce- 
dure of obtaining group-theoretic results through coordinate manipulations, 
a procedure used in the old literature including the work of Sophus Lie himself.
Although such calculations are inevitable in practice, where most Lie
groups are given in terms of parameters, they are not suitable for obtaining
formal results. 


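Example 29.1.35 The two-dimensional rotation group $SO(2)$ is a 1-parameter
Lie group defined by

$$\begin{aligned}
x'_1 &= \phi_1(x_1,x_2;\theta) = x_1\cos\theta - x_2\sin\theta,\\
x'_2 &= \phi_2(x_1,x_2;\theta) = x_1\sin\theta + x_2\cos\theta.
\end{aligned}$$

Using Eq. (29.25), we find the (only) generator of this group:

$$X = u_i\frac{\partial}{\partial x_i} \quad\text{where}\quad u_i = \frac{\partial\phi_i}{\partial\theta}\bigg|_{\theta=0}.$$

Explicitly, we have

$$\begin{aligned}
u_1 &= \frac{\partial\phi_1}{\partial\theta}\bigg|_{\theta=0} = (-x_1\sin\theta - x_2\cos\theta)\big|_{\theta=0} = -x_2,\\
u_2 &= \frac{\partial\phi_2}{\partial\theta}\bigg|_{\theta=0} = (x_1\cos\theta - x_2\sin\theta)\big|_{\theta=0} = x_1,
\end{aligned}$$

and

$$X = u_1\frac{\partial}{\partial x_1} + u_2\frac{\partial}{\partial x_2} = -x_2\frac{\partial}{\partial x_1} + x_1\frac{\partial}{\partial x_2}.$$

The reader recognizes this, within a factor of $i$, as the $z$-component of
the angular momentum operator in quantum mechanics. In fact, with $p_n =
-i\,\partial/\partial x_n$, we have

$$X = i(x_1p_2 - x_2p_1) = iL_3 \quad\text{or}\quad L_3 = -iX,$$

where $L_3$ is the third component of angular momentum operator $\mathbf{r}\times\mathbf{p}$.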
Therefore, 


Box 29.1.36 Angular momentum operators are the infinitesimal gen- 
erators of rotation. 
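The derivative in Example 29.1.35 can also be estimated numerically. The following is a minimal sketch, not part of the original text: the function names `phi` and `generator_components` and the step size `h` are illustrative assumptions. It approximates $u_i = \partial\phi_i/\partial\theta|_{\theta=0}$ by a central difference and can be compared with the analytic result $u = (-x_2, x_1)$.

```python
import math

def phi(theta, x):
    """SO(2) action of Example 29.1.35 on a point x = (x1, x2)."""
    x1, x2 = x
    return (x1 * math.cos(theta) - x2 * math.sin(theta),
            x1 * math.sin(theta) + x2 * math.cos(theta))

def generator_components(x, h=1e-6):
    """Central-difference estimate of u_i = d(phi_i)/d(theta) at theta = 0.

    The step h is an arbitrary illustrative choice, not taken from the text.
    """
    plus, minus = phi(h, x), phi(-h, x)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))

# At the point (x1, x2) = (2, 3) the analytic generator is (-x2, x1) = (-3, 2).
u = generator_components((2.0, 3.0))
```

The same finite-difference recipe applies to any one-parameter action written in coordinates; only `phi` has to be replaced.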


Inclusion of the other two rotations about the x-axis and the y-axis com- 
pletes the set of infinitesimal generators of the rotation group in three dimen- 
sions. Let us obtain the commutation relation between these components. 
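First we note that the $x$- and $y$-components can also be calculated as

$$X_1 = y\frac{\partial}{\partial z} - z\frac{\partial}{\partial y}, \qquad X_2 = z\frac{\partial}{\partial x} - x\frac{\partial}{\partial z},$$

and all the three results can be summarized as $X_j = \epsilon_{jmn}x_m\,\partial/\partial x_n$, with
summation over the repeated indices understood. In terms of momentum,
this becomes $X_j = i\epsilon_{jmn}x_mp_n$ and $L_j = -iX_j = \epsilon_{jmn}x_mp_n$. The commutation
relation among the components of angular momentum can now be
calculated:

$$\begin{aligned}
[L_j, L_k] &= \epsilon_{jmn}\epsilon_{krs}[x_mp_n, x_rp_s]\\
&= \epsilon_{jmn}\epsilon_{krs}\bigl(x_m[p_n,x_r]p_s + x_r[x_m,p_s]p_n\bigr)\\
&= \epsilon_{jmn}\epsilon_{krs}\bigl(-i\delta_{rn}x_mp_s + i\delta_{sm}x_rp_n\bigr)\\
&= -i\epsilon_{jmn}\epsilon_{kns}x_mp_s + i\epsilon_{jmn}\epsilon_{krm}x_rp_n.
\end{aligned}$$

Using

$$\epsilon_{jmn}\epsilon_{ksn} = \delta_{jk}\delta_{ms} - \delta_{js}\delta_{km} \quad\text{and}\quad x_jp_k - x_kp_j = \epsilon_{jkm}L_m,$$

we obtain

$$[L_j, L_k] = i\epsilon_{jkm}L_m,$$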


which is the desired result. 
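The contraction identity for the Levi-Civita symbol used in the last step is easy to confirm by brute force. The sketch below is illustrative only (the helper names `eps`, `delta`, and `identity_holds` are assumptions, and indices run over 0, 1, 2 rather than 1, 2, 3); it checks $\epsilon_{jmn}\epsilon_{ksn} = \delta_{jk}\delta_{ms} - \delta_{js}\delta_{km}$ for all 81 index combinations.

```python
from itertools import product

def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

def identity_holds():
    """Check eps_{jmn} eps_{ksn} = delta_jk delta_ms - delta_js delta_km."""
    for j, m, k, s in product(range(3), repeat=4):
        lhs = sum(eps(j, m, n) * eps(k, s, n) for n in range(3))
        rhs = delta(j, k) * delta(m, s) - delta(j, s) * delta(k, m)
        if lhs != rhs:
            return False
    return True
```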

We obtained the generators of rotation by obtaining each component separately
from equations connecting two $x'$'s to two $x$'s involving only one angle.
We could have used three equations connecting the three $x'$'s directly
to the three $x$'s. Such an equation writes each $x'$ in terms of the three $x$'s
and trigonometric functions of three angles, the Euler angles. The matrix 
connecting the two sets of coordinates is given in Example 5.2.7. Prob- 
lem 29.10, which the reader is asked to solve as a very illuminating exercise, 
calculates the three components of angular momentum directly. 


The action of a Lie group on $M$ can be reconstructed from its infinitesimal
action. The flow of $X_\kappa$ is the solution of the DE

$$\frac{dx'_i}{dt} = u_{i\kappa}(x'), \qquad x'_i(0) = x_i. \tag{29.28}$$

Once the solution is obtained, one can replace $t$ with $a_\kappa$ for each $\kappa$. In some
applications, $u_{i\kappa}(x)$ will be given implicitly in terms of certain parameters of
integration of some DEs [unrelated to (29.28)]. The solutions of these DEs
are typically generators of coordinate transformations that can be written
linearly in terms of the parameters. To be more precise, suppose that after 
solving some DEs, we obtain

$$X_i = \sum_{\kappa=1}^r c_\kappa X_i^{(\kappa)}(x_1,\dots,x_n), \tag{29.29}$$

where $\{c_\kappa\}$ are the parameters of integration, and $X_i$ are components of the
vector field that generates the coordinate transformation. This means that
for small parameters, one can write

$$x'_i = x_i + \sum_{\kappa=1}^r c_\kappa X_i^{(\kappa)}(x_1,\dots,x_n)$$

and read off $u_{i\kappa}(x) \equiv X_i^{(\kappa)}(x)$. In that case, we have

$$\frac{dx'_i}{dt} = X_i^{(\kappa)}\bigl(x'(t)\bigr), \qquad x'_i(0) = x_i. \tag{29.30}$$


We shall have occasion to use this formula later. 
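As an illustration of reconstructing a finite action from its generator, the sketch below integrates Eq. (29.28) for the $SO(2)$ generator $u = (-x_2, x_1)$ of Example 29.1.35 and compares the result with the known rotation. The integrator (classical fourth-order Runge-Kutta), the step count, and all function names are assumptions made for this example; the text itself does not prescribe a numerical method.

```python
import math

def u(x):
    """Generator components of SO(2) from Example 29.1.35: u = (-x2, x1)."""
    return (-x[1], x[0])

def flow(x0, t, steps=1000):
    """Integrate dx'/dt = u(x'), x'(0) = x0, with classical RK4."""
    h = t / steps
    x = tuple(x0)
    for _ in range(steps):
        k1 = u(x)
        k2 = u(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k1)))
        k3 = u(tuple(xi + 0.5 * h * ki for xi, ki in zip(x, k2)))
        k4 = u(tuple(xi + h * ki for xi, ki in zip(x, k3)))
        x = tuple(xi + h * (a + 2 * b + 2 * c + d) / 6
                  for xi, a, b, c, d in zip(x, k1, k2, k3, k4))
    return x

theta = 0.7                                   # plays the role of the group parameter
x_num = flow((1.0, 0.0), theta)               # numerical flow of the generator
x_exact = (math.cos(theta), math.sin(theta))  # finite rotation of (1, 0)
```

Replacing the flow parameter $t$ by the group parameter, as described above, is what identifies `x_num` with the rotated point.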


29.1.5 Integration on Lie Groups 


As on any other manifold, one can define integration on Lie groups; i.e., one
can construct nonvanishing $n$-forms and use Eq. (28.47) to define integrals
on a Lie group $G$. Because of the left-invariant property of objects on $G$, it
would be helpful if the integration process were also left-invariant. For this
to happen, the $n$-form would have to be left-invariant. It turns out that this
can be accomplished more or less uniquely:


Proposition 29.1.37 Let $G$ be a Lie group of dimension $n$. Then there exists
a left-invariant nonvanishing $n$-form $\mu$ that is unique up to a nonzero
multiplicative constant. If $G$ is compact, then $\mu$ is also right-invariant and
the multiplicative constant can be chosen to be 1.


Proof Let $\mu_e$ be any nonzero $n$-form on $\mathcal{T}_e(G)$. The desired $n$-form is the
left translation of this form, i.e., $\mu_g = L^*_{g^{-1}}\mu_e$. Indeed, let $\{X_i\}_{i=1}^n$ be left invariant.
Then

$$\begin{aligned}
\mu_g(X_1|_g,\dots,X_n|_g) &= L^*_{g^{-1}}\mu_e(X_1|_g,\dots,X_n|_g)\\
&= \mu_e(L_{g^{-1}*}X_1|_g,\dots,L_{g^{-1}*}X_n|_g)\\
&= \mu_e(X_1|_e,\dots,X_n|_e).
\end{aligned}$$

This shows that $\mu$ is left-invariant. Now note that any other $n$-form $\mu'_e$ on
$\mathcal{T}_e(G)$ is a constant multiple of $\mu_e$. Therefore, the corresponding $n$-form $\mu'_g$
will be a constant multiple of $\mu_g$.

Let $h \in G$ and consider $\mu' \equiv R_h^*\mu$. We have

$$L_g^*\mu' = L_g^*\circ R_h^*\mu = R_h^*\circ L_g^*\mu = R_h^*\mu = \mu',$$

where we used the fact that $L_g$ and $R_h$ commute and that $\mu$ is left invariant.
The equation above shows that $\mu'$ is also left-invariant. Therefore, $\mu' = c\mu$.
If $G$ is compact, we can integrate both sides and note that $\int_G\mu' = \int_G\mu$
because $\mu'$ is related to $\mu$ by a change of variable. Therefore, $c = 1$ and

$$R_h^*\mu = \mu.$$


The left-invariant volume element (nonvanishing n-form) guaranteed by 
the proposition above is called Haar measure. Since all calculations are 
done using some coordinate system, we give an explicit expression of the 
Haar measure in terms of coordinates (parameters) of a general Lie group. 
Let $y = (y^1,\dots,y^n)$ be the coordinates of the translation of $x = (x^1,\dots,x^n)$
by $g \in G$. Then we can write $y = m(g,x)$, so that $dy^j = (\partial y^j/\partial x^i)\,dx^i =
(\partial m^j/\partial x^i)\,dx^i$. Therefore,

$$dy^1\wedge\cdots\wedge dy^n = \det\left(\frac{\partial m^j(g,x)}{\partial x^i}\right)dx^1\wedge\cdots\wedge dx^n.$$

In particular, if $x = 0$, the coordinates of the identity, then $y$ will be the
coordinates of $g$. So, the volume element at $g$, denoted by $d^ny$, will be given
by

$$d^ny = \det\left(\frac{\partial m^j(g,x)}{\partial x^i}\right)\bigg|_{x=0}d^nx.$$

Note that this is consistent with the geometric definition of the invariant
measure given in Proposition 29.1.37 because $L_{g^{-1}*} = L_{g*}^{-1}$ and the matrix
of $L_{g^{-1}*}$ is the inverse of the matrix of $L_{g*}$. The volume element at $g$, which
is invariant on $G$ (and therefore has the same value as at the identity) and
which we denote by $d\mu(g)$, will be given by

$$d\mu(g) = d\mu(e) = d^nx = \left[\det\left(\frac{\partial m^j(g,x)}{\partial x^i}\right)\bigg|_{x=0}\right]^{-1}d^ng, \tag{29.31}$$

where we have replaced $y$ with the more suggestive $g$. The volume element
$d^ng$ is the ordinary Euclidean volume element of $\mathbb{R}^n$ evaluated at the parameters
corresponding to $g$. The quantity multiplying $d^ng$ is called the density
function. Note that since we are interested in the derivatives of $m^j$ at small
values of $x$, we can take the components of $x$ to be small, and retain them
only up to the first order. This will sometimes simplify the calculation of the
invariant Haar measure.


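Example 29.1.38 From the multiplication rule for the one-dimensional
projective group given in Example 29.1.9, we easily find

$$\det\left(\frac{\partial m^j(a,x)}{\partial x^i}\right)\bigg|_{x=0} = \det\begin{pmatrix} a_1 & 0 & a_2 & 0\\ 0 & a_1 & 0 & a_2\\ a_3 & 0 & a_4 & 0\\ 0 & a_3 & 0 & a_4 \end{pmatrix} = (a_1a_4 - a_2a_3)^2.$$

Thus the density function is $(a_1a_4 - a_2a_3)^{-2}$, and the invariant Haar measure
is

$$d\mu(a) = (a_1a_4 - a_2a_3)^{-2}\,d^4a.$$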
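The determinant in Example 29.1.38 can be checked directly. The sketch below is illustrative only and rests on an assumption, since Example 29.1.9 is not reproduced here: the group multiplication is taken to be the product of the $2\times 2$ coefficient matrices $\begin{pmatrix} a_1 & a_2\\ a_3 & a_4\end{pmatrix}$, with the parameters of $x$ flattened row by row. The helper names `det`, `jacobian`, and `haar_density` are hypothetical.

```python
def det(m):
    """Determinant by Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for col, entry in enumerate(m[0]):
        minor = [row[:col] + row[col + 1:] for row in m[1:]]
        total += (-1) ** col * entry * det(minor)
    return total

def jacobian(a):
    """Matrix of d m^j / d x^i for left multiplication by (a1 a2; a3 a4).

    Assumes m(a, x) is the matrix product a.x with x flattened as
    (x1, x2, x3, x4); the result does not depend on x.
    """
    a1, a2, a3, a4 = a
    return [[a1, 0, a2, 0],
            [0, a1, 0, a2],
            [a3, 0, a4, 0],
            [0, a3, 0, a4]]

def haar_density(a):
    """Density function multiplying d^4 a in Eq. (29.31)."""
    return 1 / det(jacobian(a))

a = (2, 1, 1, 3)                 # a1*a4 - a2*a3 = 5
```

For this choice of parameters the determinant should equal $5^2 = 25$, so the density is $1/25$.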


29.2 An Outline of Lie Algebra Theory 


The notion of a Lie algebra has appeared on a number of occasions both in 
our study of vector fields on manifolds and, more recently, in the study of 
Lie groups in the vicinity of their identity elements. Lie algebras play an
important role in the representation theory of Lie groups as well. It is therefore
worth our effort to spend some time getting acquainted with the formal
structure and properties of these algebras. We shall restrict our discussion to 
finite-dimensional Lie algebras. 


Definition 29.2.1 A finite-dimensional vector space $V$ over $\mathbb{R}$ (or $\mathbb{C}$) is
called a Lie algebra over $\mathbb{R}$ (or $\mathbb{C}$) if there is a binary operation, called
Lie multiplication, $[\cdot,\cdot]: V \times V \to V$ on $V$, satisfying


1. $[X,Y] = -[Y,X]$ for all $X, Y \in V$ (antisymmetry).
2. $[\alpha X + \beta Y, Z] = \alpha[X,Z] + \beta[Y,Z]$ for $\alpha, \beta \in \mathbb{R}$ (or $\mathbb{C}$) (linearity).
3. $[X,[Y,Z]] + [Z,[X,Y]] + [Y,[Z,X]] = 0$ (Jacobi identity).


The concepts of a homomorphism, its kernel, its range, etc. are the same as 
before. 


To distinguish Lie algebras from vector spaces, we shall denote the for- 
mer by lowercase German letters as we have done for the Lie algebras of 
Lie groups. 


Example 29.2.2 Recall from Chap. 3 that an algebra is a vector space with 
a product. If this product is associative, then one can construct a Lie algebra 
out of the associative algebra by defining [a, b] = ab — ba. In particular, 
the matrix algebra under commutation of matrices becomes a Lie algebra, 
which we denote by $\mathfrak{gl}(n,\mathbb{R})$ [or $\mathfrak{gl}(n,\mathbb{C})$].


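Definition 29.2.3 Let $\mathfrak{v}$ be a Lie algebra. A subspace $\mathfrak{u}$ of $\mathfrak{v}$ is called a
subalgebra if $[X,Y] \in \mathfrak{u}$ whenever $X, Y \in \mathfrak{u}$. The subspace $\mathfrak{u}$ is called an
ideal if $[X,Y] \in \mathfrak{u}$ whenever either $X \in \mathfrak{u}$ or $Y \in \mathfrak{u}$. The center $\mathfrak{z}$ of $\mathfrak{v}$ is
the collection of all $X \in \mathfrak{v}$ whose Lie multiplication with all vectors of $\mathfrak{v}$
vanishes. A Lie algebra is abelian, or commutative, if $\mathfrak{z} = \mathfrak{v}$.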


If we choose a basis in the Lie algebra $\mathfrak{v}$, and express the Lie multiplication
of basis vectors as a linear combination of basis vectors, we end up
with basis-dependent structure constants that satisfy Eq. (29.13). The struc- 
ture constants completely determine the Lie algebra: Given these constants, 
one can choose a vector space V of correct dimension, a basis in that space, 
and impose the Lie multiplication law among the basis vectors suggested by 
the structure constants. Once the Lie multiplication law for basis vectors is 
established, the law for arbitrary vectors follows from linearity of Lie mul- 
tiplication. This procedure induces a binary operation on V and turns it into 
a Lie algebra v. Any other algebra so constructed will be isomorphic to v. 


Example 29.2.4 We can classify all two-dimensional Lie algebras by analyzing
their structure constants. Let $X_1$ and $X_2$ be any two linearly independent
vectors of the two-dimensional Lie algebra $\mathfrak{v}$. Write the only nonzero
Lie bracket as

$$[X_1, X_2] = c_1X_1 + c_2X_2.$$

There are two cases to consider: Either $c_1 = 0 = c_2$ or at least one of the
constants is nonzero. The first case corresponds to a 2-dimensional abelian
Lie algebra:

$$[X_i, X_j] = 0 \quad\text{for } i, j = 1, 2.$$

For the second case, suppose $c_1 \ne 0$ and define the vectors

$$X = c_1X_1 + c_2X_2, \qquad Y = X_2/c_1.$$

Then the nonzero Lie bracket becomes $[X, Y] = X$.


The result of Example 29.2.4 is summarized as follows: 


Box 29.2.5 Up to isomorphism, there are only two 2-dimensional Lie algebras,
given by either one of the following Lie bracket relations:


$$[X_1, X_2] = 0 \quad\text{or}\quad [X_1, X_2] = X_1.$$
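The change of basis in Example 29.2.4 can be verified with exact arithmetic. In the sketch below (illustrative only; `make_bracket` and the sample values of $c_1$, $c_2$ are assumptions), a vector $u_1X_1 + u_2X_2$ is stored as the pair $(u_1, u_2)$ and the bracket is fixed by $[X_1,X_2] = c_1X_1 + c_2X_2$ together with antisymmetry and bilinearity.

```python
from fractions import Fraction

def make_bracket(c1, c2):
    """Bracket on coefficient pairs (u1, u2) representing u1*X1 + u2*X2,
    determined by [X1, X2] = c1*X1 + c2*X2."""
    def bracket(u, v):
        w = u[0] * v[1] - u[1] * v[0]      # coefficient multiplying [X1, X2]
        return (w * c1, w * c2)
    return bracket

c1, c2 = Fraction(3), Fraction(-2)         # sample constants with c1 != 0
bracket = make_bracket(c1, c2)

X = (c1, c2)                               # X = c1*X1 + c2*X2
Y = (Fraction(0), 1 / c1)                  # Y = X2 / c1
```

With these definitions `bracket(X, Y)` should reproduce `X`, which is the relation $[X, Y] = X$ of the nonabelian case.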


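Example 29.2.6 The Pauli spin matrices

$$\sigma_1 = \begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}, \qquad \sigma_2 = \begin{pmatrix} 0 & -i\\ i & 0 \end{pmatrix}, \qquad \sigma_3 = \begin{pmatrix} 1 & 0\\ 0 & -1 \end{pmatrix}$$

form a Lie algebra under the commutation relation given by

$$[\sigma_j, \sigma_k] = 2i\epsilon_{jkl}\sigma_l.$$

Thus, $c^l_{jk} = 2i\epsilon_{jkl}$. Pauli spin matrices are a basis for $\mathfrak{su}(2)$.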
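The commutation relations of Example 29.2.6 can be confirmed by direct matrix multiplication. This is a minimal sketch with hypothetical helper names (`matmul`, `commutator`, `eps`, `pauli_relation_holds`); indices run over 0, 1, 2 instead of 1, 2, 3.

```python
def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(n)) for c in range(n)]
            for r in range(n)]

def commutator(a, b):
    """[a, b] = ab - ba."""
    ab, ba = matmul(a, b), matmul(b, a)
    n = len(a)
    return [[ab[r][c] - ba[r][c] for c in range(n)] for r in range(n)]

def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

TWO_I = complex(0, 2)

sigma = [
    [[0, 1], [1, 0]],
    [[0, -1j], [1j, 0]],
    [[1, 0], [0, -1]],
]

def pauli_relation_holds():
    """Check [sigma_j, sigma_k] = 2i eps_{jkl} sigma_l for all j, k."""
    for j in range(3):
        for k in range(3):
            lhs = commutator(sigma[j], sigma[k])
            rhs = [[sum(TWO_I * eps(j, k, l) * sigma[l][r][c] for l in range(3))
                    for c in range(2)] for r in range(2)]
            if lhs != rhs:
                return False
    return True
```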


Example 29.2.7 The Lie group $GL(n,\mathbb{R})$ has $\mathfrak{gl}(n,\mathbb{R})$, the set of all real
$n \times n$ matrices, as its Lie algebra. The standard basis of this Lie algebra, also
called the Weyl basis, consists of matrices $e_{ij}$ that have zeros everywhere
except at the $ij$th position, where the entry is 1. We therefore have

$$(e_{ij})_{kl} = \delta_{ik}\delta_{jl}. \tag{29.32}$$

We can readily find the Lie multiplication (commutation relations) for these
matrices. We simply need to look at the elements of the matrix of the commutator:

$$\begin{aligned}
([e_{ij}, e_{kl}])_{mn} &= (e_{ij}e_{kl})_{mn} - (e_{kl}e_{ij})_{mn}\\
&= (e_{ij})_{mr}(e_{kl})_{rn} - (e_{kl})_{mr}(e_{ij})_{rn}\\
&= \delta_{im}\delta_{jr}\delta_{kr}\delta_{ln} - \delta_{km}\delta_{lr}\delta_{ir}\delta_{jn} = \delta_{im}\delta_{jk}\delta_{ln} - \delta_{km}\delta_{il}\delta_{jn}\\
&= (e_{il})_{mn}\delta_{jk} - (e_{kj})_{mn}\delta_{il},
\end{aligned}$$

or

$$[e_{ij}, e_{kl}] = \delta_{jk}e_{il} - \delta_{il}e_{kj}. \tag{29.33}$$

The structure constants, which are naturally double-indexed, can be read off
from Eq. (29.33):

$$c^{mn}_{ij,kl} = \delta^m_i\delta_{jk}\delta^n_l - \delta^m_k\delta_{il}\delta^n_j, \tag{29.34}$$

where we have used a superscript for some of the Kronecker deltas to conform
to the position of the corresponding index on the LHS.
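Equation (29.33) can be checked exhaustively for a small matrix size. The sketch below is illustrative (the names `e`, `delta`, and `relation_29_33_holds`, and the choice $n = 3$, are assumptions); it builds the Weyl basis as nested lists and compares both sides of the commutation relation for every index combination.

```python
from itertools import product

N = 3  # matrix size used for the check (an arbitrary small choice)

def e(i, j):
    """Weyl basis matrix: 1 at position (i, j), 0 elsewhere."""
    return [[1 if (r, c) == (i, j) else 0 for c in range(N)] for r in range(N)]

def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

def matmul(a, b):
    return [[sum(a[r][m] * b[m][c] for m in range(N)) for c in range(N)]
            for r in range(N)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[r][c] - ba[r][c] for c in range(N)] for r in range(N)]

def relation_29_33_holds():
    """Check [e_ij, e_kl] = delta_jk e_il - delta_il e_kj for all indices."""
    for i, j, k, l in product(range(N), repeat=4):
        lhs = commutator(e(i, j), e(k, l))
        eil, ekj = e(i, l), e(k, j)
        rhs = [[delta(j, k) * eil[r][c] - delta(i, l) * ekj[r][c]
                for c in range(N)] for r in range(N)]
        if lhs != rhs:
            return False
    return True
```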


Example 29.2.8 An important datum is the dimension of the Lie group (or 
its associated Lie algebra, since they are the same). This datum is not ap- 
parent in most cases of interest in which the group is defined in terms of 
some geometric property. For example, the symplectic group is defined as 
all linear transformations A that leave a certain antisymmetric bilinear form 
invariant (Example 23.2.2). In terms of matrices, we have 


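$$\mathbf{x}\cdot\mathbf{x} \equiv \mathbf{x}^t\mathsf{J}\mathbf{x} \;\Rightarrow\; \mathbf{x}^t\mathsf{A}^t\mathsf{J}\mathsf{A}\mathbf{x} = \mathbf{x}^t\mathsf{J}\mathbf{x} \quad \forall\,\mathbf{x} \in \mathbb{R}^{2n}, \qquad \mathsf{J} = \begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}.$$

It follows that the symplectic group consists of all matrices $\mathsf{A}$ such that

$$\mathsf{A}^t\mathsf{J}\mathsf{A} = \mathsf{J}. \tag{29.35}$$

If we write $\mathsf{A}$ in block form,

$$\mathsf{A} = \begin{pmatrix} \mathsf{A}_{11} & \mathsf{A}_{12}\\ \mathsf{A}_{21} & \mathsf{A}_{22} \end{pmatrix},$$

where $\mathsf{A}_{ij}$ are $n \times n$ matrices, then Eq. (29.35) becomes

$$\begin{pmatrix} \mathsf{A}_{11}^t & \mathsf{A}_{21}^t\\ \mathsf{A}_{12}^t & \mathsf{A}_{22}^t \end{pmatrix}\begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix}\begin{pmatrix} \mathsf{A}_{11} & \mathsf{A}_{12}\\ \mathsf{A}_{21} & \mathsf{A}_{22} \end{pmatrix} = \begin{pmatrix} 0 & 1\\ -1 & 0 \end{pmatrix},$$

or

$$\mathsf{A}_{11}^t\mathsf{A}_{21} = \mathsf{A}_{21}^t\mathsf{A}_{11}, \qquad \mathsf{A}_{22}^t\mathsf{A}_{12} = \mathsf{A}_{12}^t\mathsf{A}_{22}, \qquad \mathsf{A}_{11}^t\mathsf{A}_{22} - \mathsf{A}_{21}^t\mathsf{A}_{12} = 1. \tag{29.36}$$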
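For the symplectic algebra $\mathfrak{sp}(2n,\mathbb{R})$, we are interested in the matrix $\mathsf{A}$
when it is close to the identity. This means that

$$\mathsf{A} = \begin{pmatrix} 1 & 0\\ 0 & 1 \end{pmatrix} + \epsilon\begin{pmatrix} \mathsf{X}_{11} & \mathsf{X}_{12}\\ \mathsf{X}_{21} & \mathsf{X}_{22} \end{pmatrix},$$

or

$$\mathsf{A}_{11} = 1 + \epsilon\mathsf{X}_{11}, \qquad \mathsf{A}_{22} = 1 + \epsilon\mathsf{X}_{22}, \qquad \mathsf{A}_{12} = \epsilon\mathsf{X}_{12}, \qquad \mathsf{A}_{21} = \epsilon\mathsf{X}_{21}.$$

Substituting these in Eq. (29.36) and keeping terms linear in $\epsilon$, we obtain
the following relations among $\mathsf{X}_{ij}$:

$$\mathsf{X}_{22} = -\mathsf{X}_{11}^t, \qquad \mathsf{X}_{12}^t = \mathsf{X}_{12}, \qquad \mathsf{X}_{21}^t = \mathsf{X}_{21}. \tag{29.37}$$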


It follows that we need $n^2$ parameters to describe the $n \times n$ matrices $\mathsf{X}_{11}$
and $\mathsf{X}_{22}$ simultaneously. For the symmetric matrices $\mathsf{X}_{12}$ and $\mathsf{X}_{21}$, we need
$n(n+1)/2$ independent parameters each. Therefore, the total number of
independent parameters needed for (or the dimension of) the symplectic algebra
$\mathfrak{sp}(2n,\mathbb{R})$ is

$$n^2 + 2\times\frac{n(n+1)}{2} = n(2n+1).$$
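The parameter count can be made concrete by constructing a basis. The sketch below is illustrative only: `sp_basis` builds matrices obeying Eq. (29.37), and `is_in_sp_algebra` tests the condition $\mathsf{X}^t\mathsf{J} + \mathsf{J}\mathsf{X} = 0$, which is the linearization of Eq. (29.35) obtained by inserting $\mathsf{A} = 1 + \epsilon\mathsf{X}$ (this compact form is not written out in the text and is stated here as an assumption of the example).

```python
def sp_basis(n):
    """Basis of 2n x 2n matrices [[X11, X12], [X21, -X11^t]] with X12 and X21
    symmetric, following Eq. (29.37)."""
    size = 2 * n
    basis = []

    def zero():
        return [[0] * size for _ in range(size)]

    for i in range(n):                  # X11 arbitrary, X22 = -X11^t
        for j in range(n):
            m = zero()
            m[i][j] = 1
            m[n + j][n + i] = -1
            basis.append(m)
    for i in range(n):                  # symmetric X12
        for j in range(i, n):
            m = zero()
            m[i][n + j] = 1
            m[j][n + i] = 1
            basis.append(m)
    for i in range(n):                  # symmetric X21
        for j in range(i, n):
            m = zero()
            m[n + i][j] = 1
            m[n + j][i] = 1
            basis.append(m)
    return basis

def is_in_sp_algebra(x, n):
    """Check X^t J + J X = 0 with J = [[0, 1], [-1, 0]] in n x n blocks."""
    size = 2 * n
    J = [[0] * size for _ in range(size)]
    for i in range(n):
        J[i][n + i] = 1
        J[n + i][i] = -1
    for r in range(size):
        for c in range(size):
            xtj = sum(x[m][r] * J[m][c] for m in range(size))   # (X^t J)_rc
            jx = sum(J[r][m] * x[m][c] for m in range(size))    # (J X)_rc
            if xtj + jx != 0:
                return False
    return True
```

Each basis matrix has its own distinguishing nonzero entry, so the list is linearly independent and its length gives the dimension.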

Although our attempt is to give a formal discussion of the Lie algebras 
and their structure in this section, we shall do this with an eye to the even- 
tual utility of this discussion in a better understanding of the Lie algebras 
of Lie groups. To make the connection between the present formalism and 
the Lie algebras arising from Lie groups, we shall make heavy use of ma- 
trix groups, i.e., GL(n, R) [or GL(n, C)] and its subgroups. Equation (29.8) 
gives a method of finding the matrices of the algebra if those of the group 
are known: 


Box 29.2.9 Differentiate the matrix with respect to a parameter at 
the identity (where all parameters are set equal to zero) to find the 
matrix “in the direction” of that parameter. 


29.2.1 The Lie Algebras $\mathfrak{o}(p,n-p)$ and $\mathfrak{p}(p,n-p)$


Many of the Lie groups encountered in physical applications are special 
cases of the (pseudo) orthogonal group O(p,n — p) and its associated 
Poincaré group P(p,n — p). It is therefore worthwhile to study their Lie 
algebras in some detail. Introduce the diagonal matrix 


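$$\eta = \operatorname{diag}(\underbrace{-1,-1,\dots,-1}_{p\text{ times}},\underbrace{1,1,\dots,1}_{n-p\text{ times}})$$

and note that the (pseudo) orthogonal group $O(p,n-p)$ consists of $n \times n$
matrices that leave the bilinear form $\mathbf{x}\cdot\mathbf{x} \equiv \mathbf{x}^t\eta\mathbf{x}$ invariant for $\mathbf{x} \in \mathbb{R}^n$. This
means that the matrices $\mathsf{A}$ will have to satisfy

$$\mathsf{A}^t\eta\mathsf{A} = \eta \;\Rightarrow\; (\det\mathsf{A})^2 = 1. \tag{29.38}$$

Such matrices are called $\eta$-orthogonal. The fact that $O(p,n-p)$ is a group
and that $\eta^{-1} = \eta$ can be used to show that

$$\mathsf{A}\eta\mathsf{A}^t = \eta. \tag{29.39}$$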


Example 29.2.10 (The Lorentz group) The group of the special theory of
relativity is the full Lorentz group $O(3,1)$. This is the group of transformations
that leave the invariant length⁵

$$\mathbf{x}\cdot\mathbf{x} \equiv -x_1^2 - x_2^2 - x_3^2 + x_4^2 = x_0^2 - x_1^2 - x_2^2 - x_3^2$$

of a 4-vector $(x_1, x_2, x_3, x_0 = ct)$ invariant.

5 It is common to label the time coordinate with index 0 rather than 4. We shall use this
convention.

The $(0,0)$-components of Eqs. (29.38) and (29.39) yield

$$\begin{aligned}
a_{00}^2 - a_{10}^2 - a_{20}^2 - a_{30}^2 &= 1,\\
a_{00}^2 - a_{01}^2 - a_{02}^2 - a_{03}^2 &= 1.
\end{aligned} \tag{29.40}$$

Either one of these equations implies that $a_{00} \ge 1$ or $a_{00} \le -1$. Lorentz
transformations for which $a_{00} \ge 1$ are called orthochronous. Since $\det\mathsf{1} =
+1$ and $\mathsf{1}_{00} = +1$, the identity belongs to the subset consisting of transformations
with $\det\mathsf{A} = +1$ and $a_{00} \ge 1$. Such transformations form a subgroup
of $O(3,1)$ called the proper orthochronous Lorentz transformations,
and have the property that they can be reached continuously from
the identity.

Depending on whether $\mathbf{x}\cdot\mathbf{x} > 0$, $\mathbf{x}\cdot\mathbf{x} < 0$, or $\mathbf{x}\cdot\mathbf{x} = 0$, the vector $\mathbf{x}$ is called
timelike, spacelike, or null, respectively. In the special theory of relativity
$\mathbb{R}^4$ becomes the set of events. At every event $x$ the set $\mathbb{R}^4$ is divided into 5
regions:


1. All events $y = (y_1, y_2, y_3, y_0)$ to which one can go from $x$ by material
objects, with speed less than $c$, lie to the future of $x$, i.e., $y_0 - x_0 > 0$,
and are timelike:

$$(y_0 - x_0)^2 > (y_1 - x_1)^2 + (y_2 - x_2)^2 + (y_3 - x_3)^2.$$

They form a 4-dimensional subset of $\mathbb{R}^4$ and are said to lie inside the
future light cone.

2. All events $y = (y_1, y_2, y_3, y_0)$ to which one can go from $x$ only by a
light signal lie to the future of $x$, i.e., $y_0 - x_0 > 0$, and

$$(y_0 - x_0)^2 - (y_1 - x_1)^2 - (y_2 - x_2)^2 - (y_3 - x_3)^2 = 0.$$

They form a 3-dimensional subset of $\mathbb{R}^4$ and are said to lie on the future
light cone.

3. All events $y = (y_1, y_2, y_3, y_0)$ from which one can come to $x$ by material
objects, with speed less than $c$, lie in the past of $x$, i.e., $x_0 - y_0 > 0$,
and are timelike:

$$(x_0 - y_0)^2 > (x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2.$$

They form a 4-dimensional subset of $\mathbb{R}^4$ and are said to lie inside the
past light cone.

4. All events $y = (y_1, y_2, y_3, y_0)$ from which one can come to $x$ only
by a light signal lie to the past of $x$, i.e., $x_0 - y_0 > 0$, and

$$(x_0 - y_0)^2 - (x_1 - y_1)^2 - (x_2 - y_2)^2 - (x_3 - y_3)^2 = 0.$$

They form a 3-dimensional subset of $\mathbb{R}^4$ and are said to lie on the past
light cone.

5. All events in the remaining part of $\mathbb{R}^4$ form a 4-dimensional subset, are
spacelike, and cannot be connected to $x$ by any means. They are said to
belong to elsewhere.

From a physical standpoint, future and past are observer-independent.
Therefore, if y lies in or on the future light cone of x with respect to one 
observer, it should also do so with respect to all observers. Since observers 
are connected by Lorentz transformations, we expect the latter to preserve 
this relation between x and y. Not all elements of O(3, 1) have this property. 
However, the proper orthochronous transformations do. The details are left 
as a problem for the reader (see Problem 29.15). 
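The five regions listed above translate directly into a small classifier. The sketch below is illustrative only: the function names and the returned labels are assumptions, events are stored as tuples $(x_1, x_2, x_3, x_0)$, and the case $y = x$, which the list does not cover, is reported separately. Exact comparison with zero is meaningful only for exact inputs such as integers or fractions.

```python
def interval(x, y):
    """(y - x).(y - x) with signature (-,-,-,+); events are (x1, x2, x3, x0)."""
    d = [b - a for a, b in zip(x, y)]
    return d[3] ** 2 - d[0] ** 2 - d[1] ** 2 - d[2] ** 2

def region(x, y):
    """Locate the event y relative to the light cone of the event x."""
    s = interval(x, y)
    dt = y[3] - x[3]
    if s > 0:
        return "inside future light cone" if dt > 0 else "inside past light cone"
    if s == 0:
        if dt > 0:
            return "on future light cone"
        if dt < 0:
            return "on past light cone"
        return "same event"
    return "elsewhere"

origin = (0, 0, 0, 0)
```

For example, `region(origin, (3, 4, 0, 5))` describes an event reachable from the origin only by a light signal, since $5^2 - 3^2 - 4^2 = 0$.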


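As a prototype of $\eta$-orthogonal matrices, consider the matrix obtained
from the unit matrix by removing the $ii$th, $ij$th, $ji$th, and $jj$th elements,
and replacing them by an overall $2 \times 2$ matrix. The result, denoted by $\mathsf{A}^{(ij)}$,
will look like

$$\mathsf{A}^{(ij)} = \begin{pmatrix}
1 & 0 & \cdots & 0 & \cdots & 0 & \cdots & 0\\
0 & 1 & \cdots & 0 & \cdots & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots & & \vdots & & \vdots\\
0 & 0 & \cdots & a_{ii} & \cdots & a_{ij} & \cdots & 0\\
\vdots & \vdots & & \vdots & & \vdots & & \vdots\\
0 & 0 & \cdots & a_{ji} & \cdots & a_{jj} & \cdots & 0\\
\vdots & \vdots & & \vdots & & \vdots & & \vdots\\
0 & 0 & \cdots & 0 & \cdots & 0 & \cdots & 1
\end{pmatrix}.$$

This matrix will transform $(x_1,\dots,x_n) \in \mathbb{R}^n$ according to

$$\begin{aligned}
x'_i &= a_{ii}x_i + a_{ij}x_j, \qquad \text{(no summation!)}\\
x'_j &= a_{ji}x_i + a_{jj}x_j,\\
x'_k &= x_k \quad\text{for } k \ne i, j.
\end{aligned}$$

In order for $\mathsf{A}^{(ij)}$ to leave the bilinear form $\mathbf{x}^t\eta\mathbf{x}$ invariant, the $2 \times 2$ submatrix
$\begin{pmatrix} a_{ii} & a_{ij}\\ a_{ji} & a_{jj} \end{pmatrix}$ must be either a rotation (corresponding to the case where
$i, j \le p$ or $i, j > p$), or a Lorentz boost⁶ (corresponding to the case where
$i \le p$ and $j > p$). In the first case, we have

$$\begin{pmatrix} a_{ii} & a_{ij}\\ a_{ji} & a_{jj} \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{pmatrix},$$

and in the second case

$$\begin{pmatrix} a_{ii} & a_{ij}\\ a_{ji} & a_{jj} \end{pmatrix} = \begin{pmatrix} \cosh\xi & -\sinh\xi\\ -\sinh\xi & \cosh\xi \end{pmatrix},$$

where $\xi \equiv \tanh^{-1}(v/c)$ is the “rapidity”.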
The matrices of the algebra are obtained by differentiation at $\theta = 0$ (or
$\xi = 0$). Denoting these matrices by $\mathsf{M}_{ij}$, we readily find that for the case of
rotations, $\mathsf{M}_{ij}$ has $-1$ at the $ij$th position, $+1$ at the $ji$th position, and 0
everywhere else. For the case of boosts, $\mathsf{M}_{ij}$ has $-1$ at the $ij$th and the $ji$th
position, and 0 everywhere else. Both cases can be described by the single
relation

$$(\mathsf{M}_{ij})^k{}_l = \eta_{il}\delta^k_j - \eta_{jl}\delta^k_i, \qquad \mathsf{M}_{ij} = -\mathsf{M}_{ji}.$$

6 The elementary Lorentz transformations involving only one space dimension.

It is convenient to have all indices in the lower position. So, we multiply
both sides by $\eta_{mk}$ (and relabel the free index) to obtain

$$(\mathsf{M}_{ij})_{kl} = \eta_{il}\eta_{jk} - \eta_{jl}\eta_{ik}, \qquad \mathsf{M}_{ij} = -\mathsf{M}_{ji}. \tag{29.41}$$

We can use Eq. (29.41) to find the Lie multiplication (in this case, matrix
commutation relations) for the algebra $\mathfrak{o}(p,n-p)$:

$$[\mathsf{M}_{ij}, \mathsf{M}_{kl}] = \eta_{ik}\mathsf{M}_{jl} - \eta_{il}\mathsf{M}_{jk} + \eta_{jl}\mathsf{M}_{ik} - \eta_{jk}\mathsf{M}_{il}. \tag{29.42}$$
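Equation (29.42) can be verified by brute force for a given signature. The following sketch is illustrative (all function names are assumptions); it builds the mixed-index matrices $(\mathsf{M}_{ij})^k{}_l = \eta_{il}\delta^k_j - \eta_{jl}\delta^k_i$, with $k$ as the row and $l$ as the column index, and compares both sides of the commutation relation using exact integer arithmetic.

```python
from itertools import product

def matmul(a, b):
    size = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(size)) for c in range(size)]
            for r in range(size)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    size = len(a)
    return [[ab[r][c] - ba[r][c] for c in range(size)] for r in range(size)]

def eta_component(eta_diag, i, j):
    """eta_ij for a diagonal metric stored as a list of +1/-1 entries."""
    return eta_diag[i] if i == j else 0

def generator(eta_diag, i, j):
    """Matrix with entries (M_ij)^k_l = eta_il delta^k_j - eta_jl delta^k_i."""
    n = len(eta_diag)
    return [[eta_component(eta_diag, i, l) * (k == j)
             - eta_component(eta_diag, j, l) * (k == i)
             for l in range(n)] for k in range(n)]

def relation_29_42_holds(p, n):
    """Check Eq. (29.42) for eta = diag(-1 (p times), +1 (n - p times))."""
    eta_diag = [-1] * p + [1] * (n - p)
    e = lambda a, b: eta_component(eta_diag, a, b)
    M = lambda a, b: generator(eta_diag, a, b)
    for i, j, k, l in product(range(n), repeat=4):
        lhs = commutator(M(i, j), M(k, l))
        terms = [(e(i, k), M(j, l)), (-e(i, l), M(j, k)),
                 (e(j, l), M(i, k)), (-e(j, k), M(i, l))]
        rhs = [[sum(coef * mat[r][c] for coef, mat in terms)
                for c in range(n)] for r in range(n)]
        if lhs != rhs:
            return False
    return True
```

Calling `relation_29_42_holds(3, 4)` performs the check for the Lorentz algebra $\mathfrak{o}(3,1)$.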


The Lie group $O(p,n-p)$ includes rotations and Lorentz transformations.
Another group with considerable significance in physics is the
Poincaré group $P(p,n-p)$, which includes translations⁷ in space and
time as well. An element of $P(p,n-p)$ transforms $\mathbf{x} \in \mathbb{R}^n$ to $\mathbf{x}' = \mathsf{A}\mathbf{x} + \mathbf{u}$,
where $\mathbf{u}$ is a column vector representing the translation part of the group. It
is convenient to introduce matrices to represent these group operations. This
is possible if we represent an element of $\mathbb{R}^n$ as an $(n+1)$-column whose
last element is an insignificant 1. Then, the reader may check that a Poincaré
transformation can be written as

$$\begin{pmatrix} \mathbf{x}'\\ 1 \end{pmatrix} = \begin{pmatrix} \mathsf{A} & \mathbf{u}\\ 0 & 1 \end{pmatrix}\begin{pmatrix} \mathbf{x}\\ 1 \end{pmatrix}, \tag{29.43}$$

where $\mathsf{A}$ is the $n \times n$ matrix of $O(p,n-p)$, and $\mathbf{u}$ is an $n$-dimensional
column vector.

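The Lie algebra of the Poincaré group is obtained by differentiating the
$(n+1) \times (n+1)$ matrix of Eq. (29.43). The differentiation of the matrix $\mathsf{A}$
will give $\mathfrak{o}(p,n-p)$ of Eq. (29.42). The translation part will lead to matrices
$\mathsf{P}_i$ with matrix elements given by

$$(\mathsf{P}_i)^k{}_l = \delta^k_i\,\delta^{n+1}_l \;\Rightarrow\; (\mathsf{P}_i)_{kl} = \eta_{ik}\,\delta_{l,n+1}. \tag{29.44}$$

These matrices satisfy the following Lie multiplication rules:

$$[\mathsf{P}_i, \mathsf{P}_j] = 0, \qquad [\mathsf{M}_{ij}, \mathsf{P}_k] = \eta_{ik}\mathsf{P}_j - \eta_{jk}\mathsf{P}_i.$$

It then follows that the full Poincaré algebra $\mathfrak{p}(p,n-p)$ is described by
the following Lie brackets:

$$\begin{aligned}
[\mathsf{M}_{ij}, \mathsf{M}_{kl}] &= \eta_{ik}\mathsf{M}_{jl} - \eta_{il}\mathsf{M}_{jk} + \eta_{jl}\mathsf{M}_{ik} - \eta_{jk}\mathsf{M}_{il},\\
[\mathsf{M}_{ij}, \mathsf{P}_k] &= \eta_{ik}\mathsf{P}_j - \eta_{jk}\mathsf{P}_i,\\
[\mathsf{P}_i, \mathsf{P}_j] &= 0.
\end{aligned} \tag{29.45}$$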
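The brackets involving translations can be checked with the $(n+1)\times(n+1)$ matrices of Eq. (29.43). In the sketch below (illustrative only; the function names are assumptions), $\mathsf{M}_{ij}$ is embedded in the upper-left $n\times n$ block and $\mathsf{P}_i$ has a single 1 in row $i$ of the last column, which is what differentiating the translation column $\mathbf{u}$ gives.

```python
from itertools import product

def matmul(a, b):
    size = len(a)
    return [[sum(a[r][m] * b[m][c] for m in range(size)) for c in range(size)]
            for r in range(size)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    size = len(a)
    return [[ab[r][c] - ba[r][c] for c in range(size)] for r in range(size)]

def poincare_generators(p, n):
    """Return (eta, M, P) for the signature with p minus signs in n dimensions."""
    eta_diag = [-1] * p + [1] * (n - p)
    size = n + 1

    def eta(i, j):
        return eta_diag[i] if i == j else 0

    def M(i, j):
        m = [[0] * size for _ in range(size)]
        for k in range(n):
            for l in range(n):
                m[k][l] = eta(i, l) * (k == j) - eta(j, l) * (k == i)
        return m

    def P(i):
        m = [[0] * size for _ in range(size)]
        m[i][n] = 1                      # translation along direction i
        return m

    return eta, M, P

def poincare_relations_hold(p, n):
    """Check [P_i, P_j] = 0 and [M_ij, P_k] = eta_ik P_j - eta_jk P_i."""
    eta, M, P = poincare_generators(p, n)
    size = n + 1
    zero = [[0] * size for _ in range(size)]
    for i, j in product(range(n), repeat=2):
        if commutator(P(i), P(j)) != zero:
            return False
    for i, j, k in product(range(n), repeat=3):
        lhs = commutator(M(i, j), P(k))
        pj, pi = P(j), P(i)
        rhs = [[eta(i, k) * pj[r][c] - eta(j, k) * pi[r][c]
                for c in range(size)] for r in range(size)]
        if lhs != rhs:
            return False
    return True
```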


7 One can think of the Poincaré group as a subgroup of the group of affine motions in
which the matrices belong to $O(p,n-p)$ rather than $GL(n,\mathbb{R})$.


29.2.2 Operations on Lie Algebras


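Definition 29.2.11 Let $\mathfrak{v}$ be a Lie algebra. A linear operator $\mathbf{D}: \mathfrak{v} \to \mathfrak{v}$ satisfying

$$\mathbf{D}[X, Y] = [\mathbf{D}X, Y] + [X, \mathbf{D}Y]$$

is called a derivation of $\mathfrak{v}$.


Although the product of two derivations is in general not a derivation, their
commutator is. Therefore, the set of derivations of a Lie algebra $\mathfrak{v}$ themselves
form a Lie algebra $\mathfrak{D}_\mathfrak{v}$ under commutations, which is called the derivation
algebra.

Recall that the infinitesimal generators of the adjoint action of a Lie group
on its algebra were given by $\mathrm{ad}_\xi$ [Eq. (29.18)]. We can apply this to a general
Lie algebra $\mathfrak{v}$ by fixing a vector $X \in \mathfrak{v}$ and defining the map $\mathrm{ad}_X: \mathfrak{v} \to \mathfrak{v}$
given by $\mathrm{ad}_X(Y) = [X, Y]$. The reader may verify that $\mathrm{ad}_X$ is a derivation
of $\mathfrak{v}$ and that $\mathrm{ad}_{[X,Y]} = [\mathrm{ad}_X, \mathrm{ad}_Y]$. Therefore, the set $\mathfrak{ad}_\mathfrak{v} = \{\mathrm{ad}_X \mid X \in \mathfrak{v}\}$ is
a Lie algebra, a subalgebra of the derivation algebra $\mathfrak{D}_\mathfrak{v}$ of $\mathfrak{v}$, and is called
the adjoint algebra of $\mathfrak{v}$. There is a natural homomorphism $\psi: \mathfrak{v} \to \mathfrak{ad}_\mathfrak{v}$
given by $\psi(X) = \mathrm{ad}_X$ whose kernel is the center of $\mathfrak{v}$. Furthermore, $\mathfrak{ad}_\mathfrak{v}$ is
an ideal of $\mathfrak{D}_\mathfrak{v}$.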


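Example 29.2.12 We construct the matrix representation of the operators
in the adjoint algebra of $\mathfrak{su}(2)$ with Pauli spin matrices as a basis. From

$$\mathrm{ad}_{\sigma_1}(\sigma_1) = [\sigma_1, \sigma_1] = 0$$

we conclude that the first column of the matrix of $\mathrm{ad}_{\sigma_1}$ is zero. From

$$\mathrm{ad}_{\sigma_1}(\sigma_2) = [\sigma_1, \sigma_2] = 2i\sigma_3$$

we conclude that the second column of the matrix of $\mathrm{ad}_{\sigma_1}$ has zeros for the
first two elements and $2i$ for the last. Similarly, from

$$\mathrm{ad}_{\sigma_1}(\sigma_3) = [\sigma_1, \sigma_3] = -2i\sigma_2$$

we conclude that the third column of the matrix of $\mathrm{ad}_{\sigma_1}$ has zeros for the first
and third elements and $-2i$ for the second. Thus, the matrix representation
of $\mathrm{ad}_{\sigma_1}$ is

$$\mathrm{ad}_{\sigma_1} = \begin{pmatrix} 0 & 0 & 0\\ 0 & 0 & -2i\\ 0 & 2i & 0 \end{pmatrix}.$$

Similarly,

$$\mathrm{ad}_{\sigma_2} = \begin{pmatrix} 0 & 0 & 2i\\ 0 & 0 & 0\\ -2i & 0 & 0 \end{pmatrix}, \qquad \mathrm{ad}_{\sigma_3} = \begin{pmatrix} 0 & -2i & 0\\ 2i & 0 & 0\\ 0 & 0 & 0 \end{pmatrix}.$$

The reader may readily verify that $[\mathrm{ad}_{\sigma_j}, \mathrm{ad}_{\sigma_k}] = 2i\epsilon_{jkl}\,\mathrm{ad}_{\sigma_l}$.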
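The column-by-column construction of Example 29.2.12 amounts to reading the matrices off the structure constants: the entry in row $l$ and column $k$ of $\mathrm{ad}_{\sigma_j}$ is $c^l_{jk} = 2i\epsilon_{jkl}$. The sketch below is illustrative (names such as `ad` and `adjoint_relation_holds` are assumptions, and indices run over 0, 1, 2); it builds the three matrices this way and checks the commutation relation left to the reader.

```python
def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

TWO_I = complex(0, 2)

def ad(j):
    """Matrix of ad_{sigma_j}: entry (row l, column k) is 2i * eps_{jkl}."""
    return [[TWO_I * eps(j, k, l) for k in range(3)] for l in range(3)]

def matmul(a, b):
    return [[sum(a[r][m] * b[m][c] for m in range(3)) for c in range(3)]
            for r in range(3)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[r][c] - ba[r][c] for c in range(3)] for r in range(3)]

def adjoint_relation_holds():
    """Check [ad_j, ad_k] = 2i eps_{jkl} ad_l for all j, k."""
    for j in range(3):
        for k in range(3):
            lhs = commutator(ad(j), ad(k))
            rhs = [[sum(TWO_I * eps(j, k, l) * ad(l)[r][c] for l in range(3))
                    for c in range(3)] for r in range(3)]
            if lhs != rhs:
                return False
    return True
```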
If $\psi$ is an automorphism of $\mathfrak{v}$, i.e., an isomorphism of $\mathfrak{v}$ onto itself, then

$$\mathrm{ad}_{\psi(X)} = \psi\circ\mathrm{ad}_X\circ\psi^{-1} \quad \forall\,X \in \mathfrak{v}. \tag{29.46}$$

Since $\mathrm{ad}_X$ is an operator on the vector space $\mathfrak{v}$, one can define the trace of
$\mathrm{ad}_X$. However, the notion of trace attains a far greater significance when it is
combined with the notion of composition of operators. For $X, Y \in \mathfrak{v}$, define

$$(X \mid Y) \equiv \operatorname{tr}(\mathrm{ad}_X\circ\mathrm{ad}_Y). \tag{29.47}$$

Then one can show that $(\cdot\mid\cdot)$ is bilinear and symmetric. It becomes an inner
product if the “vectors” of the Lie algebra are hermitian operators on some
vector space $V$, or if the underlying vector space is over $\mathbb{R}$ (see Proposition
5.6.6 in Chap. 5). Furthermore, $(\cdot\mid\cdot)$ satisfies

$$([X, Y] \mid Z) + ([X, Z] \mid Y) = 0. \tag{29.48}$$


Definition 29.2.13 The symmetric bilinear form $(\cdot\mid\cdot): \mathfrak{v}\times\mathfrak{v} \to \mathbb{C}$ defined
by (29.47) is called the Killing form of $\mathfrak{v}$.


It is an immediate consequence of this definition and Eq. (29.46) that the 
Killing form of b is invariant under all automorphisms of bv. 

As noted above, the Killing form is an inner product if the Lie algebra 
consists of hermitian operators. This will certainly happen if the Lie alge- 
bra is that of a group whose elements are unitary operators on some vector 
space V. We shall see shortly that such unitary operators are not only pos- 
sible, but have extremely useful properties in the representation of compact 
Lie groups. A unitary representation of a Lie group induces a representation 
of its Lie algebra whose “vectors” are hermitian operators. Then the Killing 
form becomes an inner product. The natural existence of such Killing forms 
for the representation of a compact Lie group motivates the following: 


Definition 29.2.14 A Lie algebra $\mathfrak{v}$ is compact if it has an inner product
$(\cdot \mid \cdot)$ satisfying

$$([X, Y] \mid Z) + ([X, Z] \mid Y) = 0.$$


Choose a basis $\{X_i\}$ for the Lie algebra $\mathfrak{v}$ and note that $(\mathrm{ad}_{X_i})^k_{\ j} = c^k_{ij}$.
Therefore,

$$(X_i \mid X_j) = \mathrm{tr}(\mathrm{ad}_{X_i} \circ \mathrm{ad}_{X_j}) = (\mathrm{ad}_{X_i})^k_{\ l}\,(\mathrm{ad}_{X_j})^l_{\ k} = c^k_{il} c^l_{jk} \equiv g_{ij}, \tag{29.49}$$

where $g_{ij}$ are components of the so-called Cartan metric tensor in the basis
$\{X_i\}$. If $A, B \in \mathfrak{v}$ have components $\{a^i\}$ and $\{b^i\}$ in the basis $\{X_i\}$, then it
follows from Eq. (29.49) that

$$(A \mid B) = a^i b^j g_{ij}, \tag{29.50}$$

as expected of a symmetric bilinear form. We can use the Cartan metric to
lower the upper index of the structure constants: $c_{ijk} \equiv c^l_{ij} g_{lk}$. By virtue of
Eq. (29.49), the new constants may be written in the form

$$c_{ijk} = c^l_{ij} c^r_{ls} c^s_{kr} = \bigl(-c^l_{js} c^r_{li} - c^l_{si} c^r_{lj}\bigr) c^s_{kr} \qquad \text{by (29.13)}$$

$$\phantom{c_{ijk}} = c^l_{js} c^r_{il} c^s_{kr} + c^l_{si} c^r_{lj} c^s_{rk}.$$

The reader may now verify that the RHS is completely antisymmetric in $i$,
$j$, and $k$. If the Lie algebra is compact, then one can choose an orthonormal
basis in which $g_{ij} = \delta_{ij}$ (because the inner product is, by definition, positive
definite) and obtain $c^k_{ij} = c_{ijk}$. We therefore have the following result.


Proposition 29.2.15 Let $\mathfrak{v}$ be a compact Lie algebra. Then there exists a
basis of $\mathfrak{v}$ in which the structure constants are represented by a third-order
completely antisymmetric covariant tensor.
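As an illustration of Eq. (29.49) and of the lowering of indices, here is a small Python sketch (not from the original text). It assumes the structure constants of so(3) in a real basis, $c^k_{ij} = \epsilon_{ijk}$, and stores $c^k_{ij}$ as `c[i][j][k]`; the function names are hypothetical.

```python
def levi_civita(i, j, k):
    # Valid for indices taken from {0, 1, 2}.
    return (i - j) * (j - k) * (k - i) // 2

N = 3
# c[i][j][k] stands for c^k_{ij} in [X_i, X_j] = c^k_{ij} X_k (here: so(3)).
c = [[[levi_civita(i, j, k) for k in range(N)]
      for j in range(N)] for i in range(N)]

def cartan_metric(c):
    """g_ij = c^k_{il} c^l_{jk}, Eq. (29.49)."""
    n = len(c)
    return [[sum(c[i][l][k] * c[j][k][l]
                 for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

def lower_index(c, g):
    """c_ijk = c^l_{ij} g_lk."""
    n = len(c)
    return [[[sum(c[i][j][l] * g[l][k] for l in range(n))
              for k in range(n)] for j in range(n)] for i in range(n)]

def is_totally_antisymmetric(t):
    n = len(t)
    return all(t[i][j][k] == -t[j][i][k] and t[i][j][k] == -t[i][k][j]
               for i in range(n) for j in range(n) for k in range(n))

g = cartan_metric(c)
c_lowered = lower_index(c, g)
```

In this basis the sketch gives $g_{ij} = -2\delta_{ij}$, so it is the negative of the Killing form that is positive definite here; rescaling the basis then yields the orthonormal basis referred to in the proposition.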


Historical Notes 

Most mathematicians seem to have little or no interest in history, so that often the name 
attached to a key result is that of the follow-up person who exploited an idea or theorem 
rather than its originator (the Jordan form is due to Weierstrass, Wedderburn theory to 
Cartan and Molien). No one has suffered from this ahistoricism more than Killing. For 
example, the so-called “Cartan sub-algebra” and “Cartan matrix” were defined and ex- 
ploited by Killing. He exhibited the characteristic equation of an arbitrary element of the 
Weyl group when Weyl was 3 years old and listed the orders of the Coxeter transformation 
19 years before Coxeter was born! 

Wilhelm Karl Joseph Killing (1847-1923) began university study in Münster in 1865
but quickly moved to Berlin and came under the influence of Kummer and Weierstrass. 
From 1868 to 1882 much of Killing’s energy was devoted to teaching at the gymnasium 
level in Berlin and Brilon (south of Münster). At one stage, when Weierstrass was urging
him to write up his research on space structures, he was spending as much as 36 hours per 
week in the classroom or tutoring. (Now many mathematicians consider 6 hours a week 
an intolerable burden!) On the recommendation of Weierstrass, Killing was appointed 
professor of mathematics at the Lyzeum Hosianum in Braunsberg, in East Prussia (now 
Braniewo in the region of Olsztyn, in Poland). This was a college founded in 1565 by 
Bishop Stanislaus Hosius, whose treatise on the Christian faith ran to 39 editions! The 
main object of the college was the training of Roman Catholic clergy, so Killing had to 
teach a wide range of topics, including the reconciliation of faith and science. Although he 
was isolated mathematically during his ten years in Braunsberg, this was the most creative 
period in his mathematical life. Killing produced his brilliant work despite worries about 
the health of his wife and seven children, demanding administrative duties as rector of 
the college and as a member and chairman of the City Council, and his active role in the 
Church of St. Catherine. 

What we now call Lie algebras were invented by the Norwegian mathematician Sophus 
Lie about 1870 and independently by Killing about 1880. Lie was seeking to develop 
an approach to the solution of differential equations analogous to the Galois theory of 
algebraic equations. Killing’s consuming passion was non-Euclidean geometries and their 
generalizations, so he was led to the problem of classifying infinitesimal motions of a rigid 
body in any type of space (or Raumformen, as he called them). 

In 1892 he was called back to his native Westphalia as professor of mathematics at the 
University of Münster, where he was quickly submerged in teaching, administration, and
charitable activities. He was Rector Magnificus for some period and president of the St. 
Vincent de Paul charitable society for ten years. Killing’s work was neglected partly be- 
cause he was a modest man with high standards; he vastly underrated his own achieve- 
ment. His interest was geometry, and for this he needed all real Lie algebras. To obtain 
merely the simple Lie algebras over the complex numbers did not appear to him to be 
very significant. Another reason was due to Lie, who was quite negative about Killing’s 
work. At the top of page 770 of a three-volume joint work of Lie and Engel we find the 
following less than generous comment about Killing: “With the exception of the preceding
unproved theorem ... all the theorems that are correct are due to Lie and all the false
ones are due to Killing!” 

Killing was conservative in his political views and vigorously opposed the attempt to reform
the examination requirements for graduate students at the University of Münster by
deleting the compulsory study of philosophy. Engel comments “Killing could not see that 
for most candidates the test in philosophy was completely worthless.” He had a profound 
patriotic love of his country, so that his last years (1918-1923) were deeply pained by the 
collapse of social cohesion in Germany after the War of 1914-18. 

(Taken from A.J. Coleman, “The Greatest Mathematical Paper of All Times,” Mathematical
Intelligencer 11(3) (1989) 29-38.)


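Example 29.2.16 We can calculate explicitly the Killing form of the Lie
algebras $\mathfrak{gl}(n, \mathbb{R})$ and $\mathfrak{sl}(n, \mathbb{R})$. Choose the Weyl basis introduced in Example 29.2.7
and expand $A, B \in \mathfrak{gl}(n, \mathbb{R})$ in terms of the Weyl basis vectors:
$A = a^{ij} e_{ij}$, $B = b^{ij} e_{ij}$. The Cartan metric tensor becomes

$$g_{ij,kl} = c^{rs}_{ij,mn}\, c^{mn}_{kl,rs} = \bigl(\delta_{jm}\delta^r_i\delta^s_n - \delta_{in}\delta^r_m\delta^s_j\bigr)\bigl(\delta_{lr}\delta^m_k\delta^n_s - \delta_{ks}\delta^m_r\delta^n_l\bigr),$$

where we have used Eq. (29.34). It follows from these relations, Eq. (29.50),
and a simple index manipulation that

$$(A \mid B) = a^{ij} b^{kl} g_{ij,kl} = 2n\,\mathrm{tr}(AB) - 2\,\mathrm{tr}A\,\mathrm{tr}B \tag{29.51}$$

for $A, B \in \mathfrak{gl}(n, \mathbb{R})$, and

$$(A \mid B) = 2n\,\mathrm{tr}(AB) \tag{29.52}$$

for $A, B \in \mathfrak{sl}(n, \mathbb{R})$, because all matrices in $\mathfrak{sl}(n, \mathbb{R})$ are traceless.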
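Equations (29.51) and (29.52) can be spot-checked numerically. The sketch below (plain Python, not part of the original text) computes $\mathrm{tr}(\mathrm{ad}_A \circ \mathrm{ad}_B)$ directly in the Weyl basis, using the fact that the component of $[A, e_{kl}]$ along $e_{ij}$ is $A_{ik}\delta_{jl} - \delta_{ik}A_{lj}$, and compares it with the closed formula. The sample matrices and the function names are arbitrary choices made for illustration.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def killing_form(a, b):
    """tr(ad_a o ad_b) on gl(n), computed in the Weyl basis e_ij."""
    n = len(a)
    def ad(m, i, j, k, l):
        # Component of [m, e_kl] along e_ij.
        return m[i][k] * (j == l) - (i == k) * m[l][j]
    pairs = [(i, j) for i in range(n) for j in range(n)]
    return sum(ad(a, i, j, k, l) * ad(b, k, l, i, j)
               for (i, j) in pairs for (k, l) in pairs)

def closed_formula(a, b):
    """Right-hand side of Eq. (29.51)."""
    n = len(a)
    return 2 * n * trace(matmul(a, b)) - 2 * trace(a) * trace(b)

# Arbitrary integer test matrices (so the comparison is exact).
A = [[1, 2, 0], [0, -1, 3], [4, 0, 2]]
B = [[0, 1, 1], [2, 0, -1], [1, 1, 3]]
# Traceless matrices, i.e. elements of sl(3, R).
A0 = [[1, 2, 0], [0, -1, 3], [4, 0, 0]]
B0 = [[0, 1, 1], [2, 0, -1], [1, 1, 0]]
```

For the traceless pair `A0`, `B0` the second term of Eq. (29.51) drops out, which is the content of Eq. (29.52).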


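A Lie algebra $\mathfrak{v}$, as a vector space, may be written as a direct sum of its
subspaces. We express this as

$$\mathfrak{v} = \mathfrak{u}_1 \oplus_V \mathfrak{u}_2 \oplus_V \cdots \oplus_V \mathfrak{u}_r \equiv \sum_{k=1}^{r} \oplus_V\, \mathfrak{u}_k.$$

If in addition $\{\mathfrak{u}_k\}$ are Lie subalgebras every one of which commutes with
the rest, we write

$$\mathfrak{v} = \mathfrak{u}_1 \oplus \mathfrak{u}_2 \oplus \cdots \oplus \mathfrak{u}_r \equiv \bigoplus_{k=1}^{r} \mathfrak{u}_k \tag{29.53}$$

and say that $\mathfrak{v}$ has been decomposed into a direct sum of Lie algebras. In
this case, each $\mathfrak{u}_k$ is not only a subalgebra, but also an ideal of $\mathfrak{v}$ (see Proposition 3.2.11).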

The study of the structure of Lie algebras boils down to the study of the 
“simplest” kind of Lie algebras in terms of which other Lie algebras can 
be decomposed. Intuitively, one would want to call a Lie algebra “simple” 
if it has no proper subalgebras. However, in terms of decomposition, such 
subalgebras are required to be ideals. So the natural definition of a simple 
Lie algebra would be the following (see Definition 3.2.12): 
Definition 29.2.17 A Lie algebra that has no proper ideal is called a simple
Lie algebra. A Lie algebra is semisimple if it has no (nonzero) commutative
ideal.


For example, the pseudo-orthogonal algebra $\mathfrak{o}(p, n - p)$ is semisimple,
but the Poincaré algebra $\mathfrak{p}(p, n - p)$ is not because the translation generators
$P_i$ form a commutative ideal.

A useful criterion for semisimplicity is given by the following theorem 
due to Cartan, which we state without proof (for a proof, see [Baru 86, 
pp. 15-16]): 


Theorem 29.2.18 (Cartan) A Lie algebra $\mathfrak{v}$ is semisimple iff $\det(g_{ij}) \neq 0$.
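To see the criterion at work, the following Python sketch (not from the original text) computes $\det(g_{ij})$ for two small algebras chosen purely for illustration: so(3) with $c^k_{ij} = \epsilon_{ijk}$, and the two-dimensional algebra with $[X_1, X_2] = X_2$, in which $X_2$ spans a commutative ideal. Structure constants $c^k_{ij}$ are stored as `c[i][j][k]` with zero-based indices; all names are hypothetical.

```python
def cartan_metric(c):
    """g_ij = c^k_{il} c^l_{jk}, Eq. (29.49)."""
    n = len(c)
    return [[sum(c[i][l][k] * c[j][k][l]
                 for k in range(n) for l in range(n))
             for j in range(n)] for i in range(n)]

def det(m):
    """Determinant by Laplace expansion (adequate for tiny matrices)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j]
               * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def levi_civita(i, j, k):
    return (i - j) * (j - k) * (k - i) // 2

# so(3): c^k_{ij} = epsilon_{ijk}.
so3 = [[[levi_civita(i, j, k) for k in range(3)]
        for j in range(3)] for i in range(3)]

# Two-dimensional algebra with [X_1, X_2] = X_2 (zero-based indices below).
affine = [[[0, 0], [0, 0]], [[0, 0], [0, 0]]]
affine[0][1][1] = 1
affine[1][0][1] = -1

det_so3 = det(cartan_metric(so3))
det_affine = det(cartan_metric(affine))
```

The first determinant is nonzero and the second vanishes, in agreement with the theorem.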


The importance of semisimple Lie algebras is embodied in the following 
theorem: [Baru 86, pp. 19-20]. 


Theorem 29.2.19 (Cartan) A semisimple complex or real Lie algebra can 
be decomposed into a direct sum of pairwise orthogonal simple subalgebras. 
This decomposition is unique up to ordering. 


This is the analogue of Theorem 3.5.25. 

The orthogonality is with respect to the Killing form. Theorem 29.2.19 
reduces the study of semisimple Lie algebras to that of simple Lie algebras. 
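What about a general Lie algebra $\mathfrak{v}$? If $\mathfrak{v}$ is compact, then it turns out that it
can be written as $\mathfrak{v} = \mathfrak{z} \oplus \mathfrak{s}$, where $\mathfrak{z}$ is the center of $\mathfrak{v}$ and $\mathfrak{s}$ is semisimple.
If $\mathfrak{v}$ is not compact, then the decomposition will not be in terms of a direct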
sum, but in terms of what is called a semidirect sum one of whose factors is 
semisimple. For details, the reader is referred to the fairly accessible treat- 
ment of Barut and Raczka, Chap. 2. From now on we shall restrict our dis- 
cussion to semisimple Lie algebras. These algebras are completely known, 
because simple algebras have been completely classified. We shall not pur- 
sue the classification of Lie algebras. However, we simply state a definition 
that is used in such a classification, because we shall have an occasion to 
use it in the representation theory of Lie algebras. 


Definition 29.2.20 Let $\mathfrak{v}$ be a Lie algebra. A subalgebra $\mathfrak{h}$ of $\mathfrak{v}$ is called
a Cartan subalgebra if $\mathfrak{h}$ is the largest commutative subalgebra of $\mathfrak{v}$, and
for all $X \in \mathfrak{h}$, if $\mathrm{ad}_X$ leaves a subspace of $\mathfrak{v}$ invariant, then it leaves the
complement of that subspace invariant as well. The dimension of $\mathfrak{h}$ is called the rank
of $\mathfrak{v}$.


29.3 Problems 
29.1 Show that the set $G = GL(n, \mathbb{R}) \times \mathbb{R}^n$ equipped with the “product”
$$(A, u)(B, v) \equiv (AB, Av + u)$$


forms a group. This is called the affine group. 




29.2 Show that $m : U \times U \to \mathbb{R}$ defined in Example 29.1.5 is a local Lie
group.


29.3 Find the multiplication law for the groups in (b) and (c) of Exam- 
ple 29.1.9. 


29.4 Show that the one-dimensional projective group of Example 29.1.9 
satisfies all the group properties. In particular, find the identity and the in- 
verse of an element in the group. 


29.5 Let G be a Lie group. Let S be a subgroup of G that is also a subman- 
ifold of G. Show that S is a Lie group. 


29.6 Show that the differential map of $\psi : GL(V) \to H(V)$, defined by
$\psi(A) = AA^\dagger$, where $H(V)$ is the set of hermitian operators on V, is surjective.
Derive Eq. (29.11).

29.7 Verify that $I_g \equiv R_g^{-1} \circ L_g$ is an isomorphism.


29.8 Prove Proposition 29.1.24. 


29.9 Start with Eq. (29.24) and use the fact that second derivative is inde- 
pendent of the order of differentiation to obtain 


=0. 


Uix = 


—[86qt 26, 4 gat atin _ g—1 tix 
el 0a) ih day 


0a) day 


Now use the chain rule duj, /da, = (OUjx /OXj)(Ox;/0a,) and Eq. (29.24) 
to get 


a0-! aaa! Ou; Ou; 
ni ate ue }+[™ is —Uuj; v2 loa te! =0, 
j 


da, ay oa a | 
or 
Ou; Ou; 
ie aa —Ujr ae =X, (ain (x), (29.54) 
where 
a0-! ao! 
Cor (a) =| Fay, _ a |énothe 
LL 


Substituting Eq. (29.54) in Eq. (29.26) leads to (29.27). Now differentiate 
both sides of Eq. (29.54) with respect to dy to get 


With the assumption that the u;, are linearly independent, conclude that the 
structure “constants” are indeed constants. 
29.10 Using

$$x_1' = \phi_1(x_1, x_2, x_3; \theta, \varphi, \psi),$$
$$x_2' = \phi_2(x_1, x_2, x_3; \theta, \varphi, \psi),$$
$$x_3' = \phi_3(x_1, x_2, x_3; \theta, \varphi, \psi),$$

obtained from the multiplication of the column vector consisting of $x_1$, $x_2$,
and $x_3$ by the Euler matrix of Example 5.2.7 and employing Eq. (29.25),
find the three components of the angular momentum.


29.11 Find the invariant Haar measure of the general linear group in two 
dimensions. 


29.12 Show that the invariant Haar measure for a compact group satisfies
$d\mu_g = d\mu_{g^{-1}}$. Hint: Define a measure $\nu$ by $d\nu_g \equiv d\mu_{g^{-1}}$ and show that $\nu$
is left-invariant. Now use the uniqueness of the left-invariant Haar measure
for compact groups.


29.13 Show that $O(p, n - p)$ is a group. Use this and the fact that $\eta^{-1} = \eta$
to show that $A\eta A^t = \eta$.


29.14 Show that the orthogonal group O(p,n — p) has dimension n(n — 
1)/2. Hint: Look at its algebra o(p,n — p). 


29.15 Let $x = (x_1, x_2, x_3, x_0)$ be a timelike (null, isotropic) 4-vector with
$x_0 > 0$. Let $\Lambda$ be a proper orthochronous transformation. Show that $x' = \Lambda x$
is also timelike (null). Hint: Consider the zeroth component of $x'$ as an inner
product of $(x_1, x_2, x_3, x_0)$ and another vector and use the Schwarz inequality.

29.16 Starting with the definition of each matrix, derive Eq. (29.45). 


29.17 Let $D_1$ and $D_2$ be derivations of a Lie algebra $\mathfrak{v}$. Show that $D_1 D_2 \equiv
D_1 \circ D_2$ is not a derivation, but $[D_1, D_2]$ is.


29.18 Let $\mathfrak{v}$ be a Lie algebra. Verify that $\mathrm{ad}_X$ is a derivation of $\mathfrak{v}$ for any
$X \in \mathfrak{v}$, and that $\mathrm{ad}_{[X,Y]} = [\mathrm{ad}_X, \mathrm{ad}_Y]$.


29.19 Show that (a) $\psi : \mathfrak{v} \to \mathrm{ad}_{\mathfrak{v}}$ given by $\psi(X) = \mathrm{ad}_X$ is a homomorphism,
(b) $\ker\psi$ is the center of $\mathfrak{v}$, and (c) $\mathrm{ad}_{\mathfrak{v}}$ is an ideal of $D_{\mathfrak{v}}$.


29.20 Show that if $\psi$ is an automorphism of $\mathfrak{v}$, then
$$\mathrm{ad}_{\psi(X)} = \psi \circ \mathrm{ad}_X \circ \psi^{-1} \quad \forall X \in \mathfrak{v}.$$

Hint: Apply both sides to an arbitrary element of $\mathfrak{v}$.

29.21 Show that for any Lie algebra,

$$c_{ijk} = c^l_{js} c^r_{il} c^s_{kr} + c^l_{si} c^r_{lj} c^s_{rk}$$

is completely antisymmetric in all its indices.


29.22 Show that the Killing form of $\mathfrak{v}$ is invariant under all automorphisms
of $\mathfrak{v}$.


29.23 Show that the translation generators $P_i$ of the Poincaré algebra
$\mathfrak{p}(p, n - p)$ form a commutative ideal.


29.24 Find the Cartan metrics for $\mathfrak{o}(3, 1)$ and $\mathfrak{p}(3, 1)$, and show directly that
the first is semisimple but the second is not.


30 Representation of Lie Groups and Lie Algebras


The representation of Lie groups is closely related to the representation of 
their Lie algebras, and we shall discuss them later in this chapter. In the 
case of compact groups, however, there is a well developed representation 
theory, which we shall consider in the first section. Before discussing com- 
pact groups, let us state a definition and a proposition that hold for all Lie 
groups. 


Definition 30.0.1 A representation of a Lie group G on a Hilbert space $\mathcal{H}$
is a Lie group homomorphism $T : G \to GL(\mathcal{H})$. Similarly, a representation
of the Lie algebra $\mathfrak{g}$ is a Lie algebra homomorphism $\mathfrak{X} : \mathfrak{g} \to \mathfrak{gl}(\mathcal{H})$.


The proposition we have in mind is the important Schur’s lemma which 
we state without proof (for a proof see [Baru 86, pp. 143-144]). 


Proposition 30.0.2 (Schur’s lemma) A unitary representation $T : G \to
GL(\mathcal{H})$ of a Lie group G is irreducible if and only if the only operators
commuting with all the $T_g$ are scalar multiples of the unit operator.


30.1 Representation of Compact Lie Groups 


In this section, we shall consider the representation of compact Lie groups, 
because for such groups, many of the ideas developed for finite groups hold. 


Example 30.1.1 (Compactness of U(n), O(n), SU(n), and SO(n)) Identify
$GL(n, \mathbb{C})$ with $\mathbb{R}^{2n^2}$ via components. The map

$$f : GL(n, \mathbb{C}) \to GL(n, \mathbb{C}) \quad \text{given by} \quad f(A) = AA^\dagger$$

is continuous because its components are simply products of elements of matrices. It
follows that $f^{-1}(1)$ is closed, because the matrix 1 is a single point in $\mathbb{R}^{2n^2}$,
which is therefore closed. $f^{-1}(1)$ is also bounded, because

$$AA^\dagger = 1 \;\Rightarrow\; \sum_{j=1}^{n} a_{ij} a^*_{kj} = \delta_{ik} \;\Rightarrow\; \sum_{i,j=1}^{n} |a_{ij}|^2 = n.$$

Thus, $f^{-1}(1)$ lies on the $(2n^2 - 1)$-dimensional sphere of radius $\sqrt{n}$ in $\mathbb{R}^{2n^2}$, which
is clearly bounded. The BWHB theorem (of Chap. 17) now implies that
$f^{-1}(1)$ is compact. Now note that $f^{-1}(1)$ consists of all matrices that have
their hermitian adjoints for an inverse; but these are precisely the set U(n)
of unitary matrices.

Now consider the map $\det : U(n) \to \mathbb{C}$. This map is also continuous,
implying that $\det^{-1}(1)$ is a closed subset of U(n). The boundedness of U(n)
implies that $\det^{-1}(1)$ is also bounded. Invoking the BWHB theorem again,
we conclude that $\det^{-1}(1) = SU(n)$, being closed and bounded, is compact.

If instead of complex numbers, we restrict ourselves to the reals, O(n)
and SO(n) will replace U(n) and SU(n), respectively.

The result of the example above can be summarized: 


Box 30.1.2 The unitary U(n), orthogonal O(n), special unitary 
SU(n), and special orthogonal SO(n) groups are all compact. 


We now start our study of the representations of compact Lie groups. We 
first show that we can always assume that the representation is unitary. 


Theorem 30.1.3 Let $T : G \to GL(\mathcal{H})$ be any representation of the compact
group G. There exists a new inner product in $\mathcal{H}$ relative to which T is
unitary.

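Proof Let $\langle\,|\,\rangle$ be the initial inner product. Define a new inner product $(\,|\,)$ by

$$(u|v) \equiv \int_G \langle T_g u | T_g v\rangle\, d\mu_g,$$

where $d\mu_g$ is the Haar measure, which is both left- and right-invariant. The
reader may check that this is indeed an inner product. For every $h \in G$, we
have

$$(T_h u | T_h v) = \int_G \langle T_g T_h u | T_g T_h v\rangle\, d\mu_g$$
$$= \int_G \langle T_{gh} u | T_{gh} v\rangle\, d\mu_g \qquad \text{(because T is a representation)}$$
$$= \int_G \langle T_{gh} u | T_{gh} v\rangle\, d\mu_{gh} \qquad \text{(because } \mu_g \text{ is right invariant)}$$
$$= (u|v).$$

This shows that $T_h$ is unitary for all $h \in G$.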
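The averaging argument can be tried out on a small case. The sketch below (plain Python, not part of the original text) assumes, purely for illustration, the circle group acting on $\mathbb{C}^2$ through $T(\theta) = S\,\mathrm{diag}(e^{i\theta}, e^{-i\theta})\,S^{-1}$ with a non-unitary $S$, and takes the normalized Haar measure $d\theta/2\pi$. The new inner product is $(u|v) = u^\dagger M v$ with $M = \int T(\theta)^\dagger T(\theta)\, d\mu$; an equally spaced sum stands in for the integral, which is exact up to rounding here because the integrand is a trigonometric polynomial of low degree.

```python
import cmath
import math

# A non-unitary change of basis, assumed for illustration only.
S = [[1, 1], [0, 1]]
S_INV = [[1, -1], [0, 1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    return [[complex(a[j][i]).conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def T(theta):
    """Representation of the circle group, not unitary in the standard product."""
    d = [[cmath.exp(1j * theta), 0], [0, cmath.exp(-1j * theta)]]
    return matmul(matmul(S, d), S_INV)

def averaged_metric(samples=64):
    """M = average of T(theta)^dagger T(theta) over the group."""
    m = [[0j, 0j], [0j, 0j]]
    for k in range(samples):
        t = T(2 * math.pi * k / samples)
        gram = matmul(dagger(t), t)
        m = [[m[i][j] + gram[i][j] / samples for j in range(2)]
             for i in range(2)]
    return m

def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) < tol
               for i in range(2) for j in range(2))

M = averaged_metric()
T_h = T(0.7)  # an arbitrary group element
# T_h is unitary for (u|v) = u^dagger M v exactly when T_h^dagger M T_h = M.
unitary_in_new_product = close(matmul(matmul(dagger(T_h), M), T_h), M)
unitary_in_old_product = close(matmul(dagger(T_h), T_h), [[1, 0], [0, 1]])
```

For this choice of $S$ the averaged matrix works out to $M = \begin{pmatrix} 1 & -1 \\ -1 & 3 \end{pmatrix}$, which is positive definite, so $(u|v)$ is indeed an inner product.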


From now on, we shall restrict our discussion to unitary representations 
of compact groups. 

The study of representations of compact groups is facilitated by the fol- 
lowing construction: 




Definition 30.1.4 Let $T : G \to GL(\mathcal{H})$ be a unitary representation of the
compact group G and $|u\rangle \in \mathcal{H}$ a fixed vector. The Weyl operator $K_u$ associated
with $|u\rangle$ is defined as

$$K_u = \int_G |T_g u\rangle\langle T_g u|\, d\mu_g. \tag{30.1}$$


The essential properties of the Weyl operator are summarized in the fol- 
lowing: 


Proposition 30.1.5 Let $T : G \to GL(\mathcal{H})$ be a unitary representation of the
compact group G. Then the Weyl operator has the following properties:

1. $K_u$ is hermitian.
2. $K_u T_g = T_g K_u$ for all $g \in G$. Therefore, any eigenspace of $K_u$ is an invariant
subspace of all $T_g$’s.
3. $K_u$ is a Hilbert-Schmidt operator.


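Proof Statement (1), in the form $\langle w|K_u|v\rangle^* = \langle v|K_u|w\rangle$, follows directly
from the definition.

(2) From $T_g \int_G |T_x u\rangle\langle T_x u|\, d\mu_x = \int_G |T_g T_x u\rangle\langle T_x u|\, d\mu_x$, the fact that T
is a representation (therefore, $T_g T_x = T_{gx}$), and redefining the integration
variable to $y = gx$, we get

$$T_g K_u = \int_G |T_y u\rangle\langle T_{g^{-1}y} u|\, d\mu_{g^{-1}y} = \int_G |T_y u\rangle\langle T_{g^{-1}} T_y u|\, d\mu_y,$$

where we used the left invariance of $\mu$ (so that $d\mu_{g^{-1}y} = d\mu_y$) and the fact that T is a representation.
Unitarity of T now gives

$$T_g K_u = \int_G |T_y u\rangle\langle T_g^\dagger T_y u|\, d\mu_y = \int_G |T_y u\rangle\langle T_y u|\, T_g\, d\mu_y = K_u T_g.$$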


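(3) Recall that an operator $A \in \mathcal{L}(\mathcal{H})$ is Hilbert-Schmidt if $\sum_{i=1}^{\infty} \|A|e_i\rangle\|^2$
is finite for any orthonormal basis $\{|e_i\rangle\}$ of $\mathcal{H}$. In the present case, we have

$$K_u|e_i\rangle = \int_G |T_x u\rangle\langle T_x u|e_i\rangle\, d\mu_x.$$

Therefore,

$$\sum_{i=1}^{\infty} \|K_u|e_i\rangle\|^2 = \sum_{i=1}^{\infty} \left(\int_G \langle e_i|T_y u\rangle\langle T_y u|\, d\mu_y\right)\left(\int_G |T_x u\rangle\langle T_x u|e_i\rangle\, d\mu_x\right)$$
$$= \sum_{i=1}^{\infty} \int_G\!\int_G \langle e_i|T_y u\rangle\langle T_y u|T_x u\rangle\langle T_x u|e_i\rangle\, d\mu_x\, d\mu_y.$$

If we switch the order of summation and integration and use

$$\sum_{i=1}^{\infty} \langle T_x u|e_i\rangle\langle e_i|T_y u\rangle = \langle T_x u|T_y u\rangle,$$

we obtain

$$\sum_{i=1}^{\infty} \|K_u|e_i\rangle\|^2 = \int_G\!\int_G |\langle T_y u|T_x u\rangle|^2\, d\mu_x\, d\mu_y,$$

and using the Schwarz inequality in the integral yields

$$\sum_{i=1}^{\infty} \|K_u|e_i\rangle\|^2 \le \int_G\!\int_G \langle T_y u|T_y u\rangle\langle T_x u|T_x u\rangle\, d\mu_x\, d\mu_y$$
$$= \int_G\!\int_G \langle u|u\rangle\langle u|u\rangle\, d\mu_x\, d\mu_y \qquad \text{(because rep. is unitary)}$$
$$= \|u\|^4 \int_G d\mu_x \int_G d\mu_y = \|u\|^4 V_G^2 < \infty,$$

where $V_G$ is the finite volume of G.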


Historical Notes 

Hermann Klaus Hugo Weyl (1885-1955) attended the gymnasium at Altona and, on 
the recommendation of the headmaster of his gymnasium, who was a cousin of Hilbert, 
decided at the age of eighteen to enter the University of Göttingen. Except for one year
at Munich he remained at Göttingen, as a student and later as Privatdozent, until 1913,
when he became professor at the University of Zurich. After Klein’s retirement in 1913,
Weyl declined an offer to be his successor at Göttingen but accepted a second offer in
1930, after Hilbert had retired. In 1933 he decided he could no longer remain in Nazi 
Germany and accepted a position at the Institute for Advanced Study at Princeton, where 
he worked until his retirement in 1951. In the last years of his life he divided his time 
between Zurich and Princeton. 

Weyl undoubtedly was the most gifted of Hilbert’s students. Hilbert’s thought dominated 
the first part of his mathematical career; and although later he sharply diverged from 
his master, particularly on questions related to foundations of mathematics, Weyl always 
shared his convictions that the value of abstract theories lies in their success in solving 
classical problems and that the proper way to approach a question is through a deep 
analysis of the concepts it involves rather than by blind computations. 

Weyl arrived at Göttingen during the period when Hilbert was creating the spectral theory
of self-adjoint operators, and spectral theory and harmonic analysis were central in
Weyl’s mathematical research throughout his life. Very soon, however, he considerably 
broadened the range of his interests, including areas of mathematics into which Hilbert 
had never penetrated, such as the theory of Lie groups and the analytic theory of num- 
bers, thereby becoming one of the most universal mathematicians of his generation. He 
also had an important role in the development of mathematical physics, the field to which 
his most famous books, Raum, Zeit und Materie (1918), on the theory of relativity, and 
Gruppentheorie und Quantenmechanik (1928), are devoted. 

Weyl’s versatility is illustrated in a particularly striking way by the fact that immediately 
after some original advances in number theory (which he obtained in 1914), he spent more 
than ten years as a geometer—a geometer in the most modern sense of the word, uniting 
in his methods topology, algebra, analysis, and geometry in a display of dazzling virtu- 
osity and uncommon depth reminiscent of Riemann. Drawn by war mobilization into the 
German army, Weyl did not resume his interrupted work when he was allowed to return to
civilian life in 1916. At Zurich he had worked with Einstein for one year, and he became 
keenly interested in the general theory of relativity, which had just been published; with 
his characteristic enthusiasm he devoted most of the next five years to exploring the math- 
ematical framework of the theory. In these investigations Weyl introduced the concept of 
what is now called a linear connection, linked not to the Lorentz group of orthogonal 
transformations, but to the enlarged group of conformal transformations; he even thought 
for a time that this would give him a unified theory of gravitation and electromagnetism, 
the forerunner of what is now called gauge theories. 
Weyl’s use of tensor calculus in his work on relativity led him to reexamine the basic
methods of that calculus and, more generally, of classical invariant theory that had been 
its forerunner but had fallen into near oblivion after Hilbert’s work of 1890. On the other 
hand, his semiphilosophical, semimathematical ideas on the general concept of “space” 
in connection with Einstein’s theory had directed his investigations to generalizations of 
Helmholtz’s problem of characterizing Euclidean geometry by properties of “free mobility.”
From these two directions Weyl was brought into contact with the theory of linear
representations of Lie groups; his papers on the subject (1925-1927) certainly repre- 
sent his masterpiece and must be counted among the most influential works in twentieth- 
century mathematics. 

Based on the early 1900s works of Frobenius, I. Schur, and A. Young, Weyl inaugurated
a new approach for the representation of continuous groups by focusing his attention on 
Lie groups, rather than Lie algebras. 

Very few of Weyl’s 150 published books and papers—even those chiefly of an expository 
character—lack an original idea or a fresh viewpoint. The influence of his works and of 
his teaching was considerable: He proved by his example that an “abstract” approach to 
mathematics is perfectly compatible with “hard” analysis and, in fact, can be one of the 
most powerful tools when properly applied. 

Weyl was one of that rare breed of modern mathematician whose contribution to physics
was also substantial. In an interview with a reporter in 1929, Dirac is asked the following 
question: “... I want to ask you something more: They tell me that you and Einstein are 
the only two real sure-enough high-brows and the only ones who can really understand 
each other. I won’t ask you if this is straight stuff, for I know you are too modest to 
admit it. But I want to know this—Do you ever run across a fellow that even you can’t 
understand?” To this Dirac replies one word: “Weyl.” 

Weyl had a lifelong interest in philosophy and metaphysics, and his mathematical activity 
was seldom free from philosophical undertones or afterthoughts. At the height of the 
controversy over the foundations of mathematics, between the formalist school of Hilbert 
and the intuitionist school of Brouwer, he actively fought on Brouwer’s side. His own 
comment, stated somewhat jokingly, sums up his personality: “My work always tried to 
unite the truth with the beautiful, but when I had to choose one or the other, I usually 
chose the beautiful.” 


We now come to the most fundamental theorem of representation theory 
of compact Lie groups. Before stating and proving this theorem, we need 
the following lemma: 


Lemma 30.1.6 Let $T : G \to GL(\mathcal{H})$ be an irreducible unitary representation
of a compact Lie group G. For any nonzero $|u\rangle, |v\rangle \in \mathcal{H}$, we have

$$\frac{1}{\|u\|^2 \|v\|^2} \int_G |\langle v|T_g|u\rangle|^2\, d\mu_g = c, \tag{30.2}$$

where $c > 0$ is a constant independent of $|u\rangle$ and $|v\rangle$.


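Proof By Schur’s lemma and (2) of Proposition 30.1.5, $K_u = \lambda(u)1$. Therefore,
on the one hand,

$$\langle v|K_u|v\rangle = \lambda(u)\|v\|^2. \tag{30.3}$$

On the other hand,

$$\langle v|K_u|v\rangle = \int_G \langle v|T_g u\rangle\langle T_g u|v\rangle\, d\mu_g = \int_G |\langle v|T_g|u\rangle|^2\, d\mu_g. \tag{30.4}$$

Moreover, if we use $d\mu_g = d\mu_{g^{-1}}$ (see Problem 29.12), then

$$\langle v|K_u|v\rangle = \int_G \langle v|T_g|u\rangle\langle u|T_g^\dagger|v\rangle\, d\mu_g = \int_G \langle u|T_g^\dagger|v\rangle\langle v|T_g|u\rangle\, d\mu_g$$
$$= \int_G \langle u|T_{g^{-1}}|v\rangle\langle v|T_{g^{-1}}^\dagger|u\rangle\, d\mu_g = \int_G \langle u|T_y|v\rangle\langle v|T_y^\dagger|u\rangle\, d\mu_{y^{-1}}$$
$$= \int_G \langle u|T_y|v\rangle \underbrace{\langle v|T_y^\dagger|u\rangle}_{=\langle T_y v|u\rangle}\, d\mu_y = \langle u|K_v|u\rangle,$$

where $y = g^{-1}$. This equality plus Eq. (30.3) gives

$$\lambda(u)\|v\|^2 = \lambda(v)\|u\|^2 \;\Rightarrow\; \frac{\lambda(v)}{\|v\|^2} = \frac{\lambda(u)}{\|u\|^2}.$$

Since $|u\rangle$ and $|v\rangle$ are arbitrary, we conclude that $\lambda(u) = c\|u\|^2$ for all $|u\rangle \in
\mathcal{H}$, where c is a constant. Equations (30.3) and (30.4) now yield Eq. (30.2).
If we let $|u\rangle = |v\rangle$ in Eq. (30.4) and use (30.3), we obtain

$$\int_G |\langle u|T_g|u\rangle|^2\, d\mu_g = \lambda(u)\|u\|^2 = c\|u\|^4.$$

That $c > 0$ follows from the fact that the integrand on the LHS is a nonnegative continuous
function that has at least one strictly positive value in its integration range,
namely at $g = e$, the identity.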


Theorem 30.1.7 Every irreducible unitary representation of a com- 
pact Lie group is finite-dimensional. 


Proof Let $\{|e_i\rangle\}_{i=1}^{n}$ be any set of orthonormal vectors in $\mathcal{H}$. Then, unitarity
of $T_g$ implies that $\{T_g|e_i\rangle\}_{i=1}^{n}$ is also an orthonormal set. Applying
Lemma 30.1.6 to $|e_j\rangle$ and $|e_1\rangle$, we obtain

$$\int_G |\langle e_1|T_g|e_j\rangle|^2\, d\mu_g = c.$$

Now sum over j to get

$$nc = \sum_{j=1}^{n} \int_G |\langle e_1|T_g|e_j\rangle|^2\, d\mu_g = \int_G \sum_{j=1}^{n} |\langle e_1|T_g e_j\rangle|^2\, d\mu_g \le \int_G \langle e_1|e_1\rangle\, d\mu_g = V_G,$$

where we used the Parseval inequality [Eq. (7.3)] as applied to the vector
$|e_1\rangle$ and the orthonormal set $\{T_g|e_j\rangle\}_{j=1}^{n}$. Since $V_G$ is finite and $c > 0$, n
must be finite as well. Thus, $\mathcal{H}$ cannot have an infinite set of orthonormal
vectors.
So far, we have discussed irreducible representations. What can we say
about arbitrary representations? We recall that in the case of finite groups,
every representation can be written as a direct sum of irreducible representations.
Is this also true for compact Lie groups?

Firstly, we note that the Weyl operator, being Hilbert-Schmidt, is necessarily
compact. It is also hermitian. Therefore, by the spectral theorem,
its eigenspaces span the carrier space $\mathcal{H}$. Specifically, we can write
$\mathcal{H} = \mathcal{M}_0 \oplus \sum_{j=1}^{N} \oplus\, \mathcal{M}_j$, where $\mathcal{M}_0$ is the eigenspace corresponding to the
zero eigenvalue of $K_u$, and N could be infinity.

Secondly, from the relation $\langle v|K_u|v\rangle = c\|u\|^2\|v\|^2$ and the fact that $c \neq 0$
and $|u\rangle \neq 0$, we conclude that $K_u$ cannot have any nonzero eigenvector for
its zero eigenvalue. It follows that $\mathcal{M}_0$ contains only the zero vector. Therefore,
if $\mathcal{H}$ is infinite-dimensional, then $N = \infty$.

Thirdly, consider any representation T of G. Because $K_u$ commutes with
all $T_g$, each eigenspace of $K_u$ is an invariant subspace under T. If a subspace
$\mathcal{U}$ is invariant under T, then $\mathcal{U} \cap \mathcal{M}_j$, a subspace of $\mathcal{M}_j$, is also invariant
(reader, please verify!). Thus, all invariant subspaces of T are reducible
to invariant subspaces of eigenspaces of $K_u$. In particular, all irreducible
invariant subspaces of T are subspaces of eigenspaces of $K_u$.

Lastly, since all $\mathcal{M}_j$ are finite-dimensional (the eigenspaces of a compact
operator belonging to nonzero eigenvalues are finite-dimensional), we can use the procedure
used in the case of finite groups and decompose $\mathcal{M}_j$ into irreducible invariant
subspaces of T. We have just shown the following result:


Theorem 30.1.8 Every unitary representation T of a compact Lie 
group G is a direct sum of irreducible finite-dimensional unitary rep- 
resentations. 


By choosing a basis for the finite-dimensional invariant subspaces of T, 
we can represent each T, by a matrix. Therefore, 


Box 30.1.9 Compact Lie groups can be represented by matrices. 


As in the case of finite groups, one can work with matrix elements and
characters of representations. The only difference is that summations are
replaced with integration and order of the group |G| is replaced with $V_G$,
which we take to be unity.¹ For example, Eq. (24.6) becomes

$$\int_G T^{(\alpha)}(g)\, X\, T^{(\beta)}(g^{-1})\, d\mu_g = \lambda_X \delta_{\alpha\beta} 1, \tag{30.5}$$

and the analogue of Eq. (24.8) is

$$\int_G T^{(\alpha)}_{il}(g)\, T^{(\beta)*}_{jm}(g)\, d\mu_g = \frac{1}{n_\alpha}\, \delta_{\alpha\beta}\delta_{ij}\delta_{lm}. \tag{30.6}$$

¹This can always be done by rescaling the volume element.

Characters satisfy similar relations: Eq. (24.11) becomes

$$\int_G \chi^{(\alpha)}(g)\, \chi^{(\beta)*}(g)\, d\mu_g = \delta_{\alpha\beta}, \tag{30.7}$$

and the useful Eq. (24.16) turns into

$$\int_G |\chi(g)|^2\, d\mu_g = \sum_\alpha m_\alpha^2. \tag{30.8}$$

This formula can be used to test for irreducibility of a representation: If the
integral is unity, the representation is irreducible; otherwise, it is reducible.
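The irreducibility test of Eq. (30.8) is easy to try on the circle group, whose irreducible representations are $T^{(n)}(\theta) = e^{in\theta}$ (see Example 30.1.11). The Python sketch below is not part of the original text; it assumes the normalized measure $d\mu = d\theta/2\pi$ and describes a representation by a dictionary mapping each label $n$ to its multiplicity $m_n$. The function names are hypothetical.

```python
import cmath
import math

def character(multiplicities, theta):
    """chi(theta) = sum_n m_n e^{i n theta} for the representation
    that contains T^(n) exactly m_n times."""
    return sum(m * cmath.exp(1j * n * theta)
               for n, m in multiplicities.items())

def character_norm_squared(multiplicities, samples=256):
    """Approximates the integral of |chi|^2 over the circle with d(theta)/2pi.
    An equally spaced sum is exact (up to rounding) for trigonometric
    polynomials of degree below `samples`."""
    total = 0.0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        total += abs(character(multiplicities, theta)) ** 2
    return total / samples

irreducible = {3: 1}                 # the single representation T^(3)
reducible = {0: 2, 1: 1, -2: 1}      # 2 T^(0) + T^(1) + T^(-2)
```

For `irreducible` the integral is 1, while for `reducible` it equals $2^2 + 1^2 + 1^2 = 6$, as Eq. (30.8) predicts.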

Finally, we state the celebrated Peter-Weyl theorem (for a proof, see 
[Baru 86, pp. 172—173]) 


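Theorem 30.1.10 (Peter-Weyl theorem) The functions

$$\sqrt{n_\alpha}\, T^{(\alpha)}_{ij}(g), \quad \forall \alpha \text{ and } 1 \le i, j \le n_\alpha,$$

form a complete set of functions in $\mathcal{L}^2(G)$, the Hilbert space of
square-integrable functions on G.


If $u \in \mathcal{L}^2(G)$, we can write

$$u(g) = \sum_\alpha \sum_{i,j=1}^{n_\alpha} b^\alpha_{ij}\, T^{(\alpha)}_{ij}(g) \quad \text{where} \quad b^\alpha_{ij} = n_\alpha \int_G u(g)\, T^{(\alpha)*}_{ij}(g)\, d\mu_g. \tag{30.9}$$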


Example 30.1.11 Equation (30.9) is the generalization of the Fourier series
expansion of functions. The connection with Fourier series becomes more
transparent if we consider a particular compact group. The unit circle $S^1$ is
a one-dimensional abelian compact 1-parameter Lie group. In fact, fixing an
"origin" on the circle, any other point can be described by the parameter $\theta$,
the angular distance from the point to the origin. $S^1$ is obviously abelian; it is
also compact, because it is a bounded closed region of $\mathbb{R}^2$ (BWHB theorem).
By Theorem 24.3.3, which holds for all Lie groups, all irreducible
representations of $S^1$ are 1-dimensional. So $T^{(\alpha)}_{ij} \to T^{(\alpha)}(\theta)$. Furthermore,
$T^{(\alpha)}(\theta)\, T^{(\alpha)}(\theta') = T^{(\alpha)}(\theta + \theta')$. Differentiating both sides with respect to
$\theta'$ at $\theta' = 0$ yields the differential equation

\[ T^{(\alpha)}(\theta)\, \underbrace{\frac{dT^{(\alpha)}}{d\theta'}\bigg|_{\theta'=0}}_{\equiv a} = \frac{dT^{(\alpha)}}{dy}\bigg|_{y=\theta}, \qquad y = \theta + \theta'. \]

The solution to this DE is $A e^{a\theta}$. Since the $T^{(\alpha)}$ are unitary, and since a
1-dimensional unitary matrix must look like $e^{i\varphi}$, we must have $A = 1$.
Furthermore, $\theta$ and $\theta + 2\pi$ are identified on the unit circle; therefore, we must
conclude that a is i times an integer n, which determines the irreducible
representation. We label the irreducible representation by n and write

\[ T^{(n)}(\theta) = e^{in\theta}, \qquad n = 0, \pm 1, \pm 2, \dots. \]

The Peter-Weyl theorem now becomes the rule of Fourier series expansion
of periodic functions. This last property follows from the fact that any
function $u : S^1 \to \mathbb{R}$ is necessarily periodic.
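As a numerical illustration of Example 30.1.11, the following sketch evaluates the coefficients of Eq. (30.9) on $S^1$ with the invariant measure normalized to unity, $d\mu = d\theta/2\pi$. The helper names (`haar_average`, `irrep`, `coefficient`, `partial_sum`) and the test function are assumptions made for this illustration only; the integral is approximated by a uniform Riemann sum.

```python
import cmath
import math

def haar_average(f, samples=256):
    """Approximate (1/2pi) * integral of f over [0, 2pi): integration on S^1 with total volume 1."""
    return sum(f(2 * math.pi * k / samples) for k in range(samples)) / samples

def irrep(n, theta):
    """The n-th irreducible representation T^(n)(theta) = exp(i n theta) of the circle group."""
    return cmath.exp(1j * n * theta)

def coefficient(u, n):
    """b^n = n_alpha * integral of u(g) T^(n)*(g) dmu_g, with n_alpha = 1 on S^1 (Eq. 30.9)."""
    return haar_average(lambda t: u(t) * irrep(n, t).conjugate())

def partial_sum(u, theta, n_max):
    """Truncated Peter-Weyl (Fourier) expansion of u, keeping |n| <= n_max."""
    return sum(coefficient(u, n) * irrep(n, theta) for n in range(-n_max, n_max + 1))

def sample_function(theta):
    # Hypothetical periodic function, chosen only for illustration.
    return 1.0 + 2.0 * math.cos(3 * theta)
```

For a trigonometric polynomial such as `sample_function`, the uniform sum reproduces the coefficients up to rounding, and `partial_sum` with `n_max >= 3` recovers the function itself.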


There are many occasions in physics where the state functions describing 
physical quantities transform irreducibly under the action of a Lie group 
(which we assume to be compact). Often this Lie group also acts on the 
underlying space-time manifold. So we have a situation in which a Lie group 
G acts on a Euclidean space $\mathbb{R}^n$ as well as on the space of (square-integrable)
functions $\mathcal{L}^2(\mathbb{R}^n)$. Therefore, the functions $\{\phi_i^{(\alpha)}\}_{i=1}^{n_\alpha}$ belonging to the $\alpha$th
irreducible representation transform among themselves not only because of
the index i, but also because of the argument $\mathbf{x} \in \mathbb{R}^n$.

To see the connection between physics and representation theory, consider
the transformation of the simplest case, a scalar function. As a concrete
example, choose temperature. To observer O at the corner of a room 8 meters
long, 6 meters wide, and 3 meters high, the temperature of the center of
the room is given by $\theta(4, 3, 1.5)$, where $\theta(x, y, z)$ is a function that gives O
the temperature of various points of the room. Observer O' is sitting in the
middle of the floor, so that the center of the room has coordinates (0, 0, 1.5).
O' also has a function that gives her the temperature at various points. But
this function must necessarily be different from $\theta$ because of the different
coordinates the same points have for O and O'. Calling this function $\theta'$, we
have $\theta'(0, 0, 1.5) = \theta(4, 3, 1.5)$, and in general,

\[ \theta'(x', y', z') = \theta(x, y, z), \]

where (x', y', z') describes the same point for O' that (x, y, z) describes
for O.

In the context of representation theory, we can think of (x', y', z') as the
transformed coordinates obtained as a result of the action of some group:
$(x', y', z') = g \cdot (x, y, z)$, or $\mathbf{x}' = g \cdot \mathbf{x}$. So, the equation above can be written
as

\[ \theta'(\mathbf{x}') = \theta(\mathbf{x}) = \theta(g^{-1} \cdot \mathbf{x}') \quad\text{or}\quad \theta'(\mathbf{x}) = \theta(g^{-1} \cdot \mathbf{x}). \]

It is natural to call $\theta'$ the transform of $\theta$ under the action of g and write
$\theta' = T_g \theta$. This is one way of constructing a representation [see the comments
after Eq. (24.1)]. Instead of $g^{-1}$ on the left, one could act with g on the
right.
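The rule $(T_g\theta)(\mathbf{x}) = \theta(g^{-1}\cdot\mathbf{x})$ can be sketched for the room example, taking the group to be translations of $\mathbb{R}^3$. The temperature profile and all function names below are hypothetical and serve only to show that the rule reproduces $\theta'(0, 0, 1.5) = \theta(4, 3, 1.5)$ and respects the group product.

```python
import math

def act(g, x):
    """Action of a translation g on a point x: x' = g.x = x + g."""
    return tuple(xi + gi for xi, gi in zip(x, g))

def inverse(g):
    """Inverse translation."""
    return tuple(-gi for gi in g)

def compose(g1, g2):
    """Group product g1 g2 (apply g2 first, then g1); translations simply add."""
    return tuple(a + b for a, b in zip(g1, g2))

def transform(g, field):
    """Return T_g field, defined by (T_g field)(x) = field(g^{-1}.x)."""
    return lambda x: field(act(inverse(g), x))

def temperature(x):
    # Hypothetical temperature profile as seen by observer O at the corner.
    return 18.0 + 0.5 * x[0] - 0.25 * x[1] + 1.5 * x[2]

# O' sits at the middle of the floor, so her coordinates are x' = x - (4, 3, 0).
corner_to_center = (-4.0, -3.0, 0.0)
temperature_prime = transform(corner_to_center, temperature)
```

With $g^{-1}$ inside the argument, applying `transform` twice agrees with transforming once by the composed element, which is the homomorphism property a representation must have.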

When the physical quantity is not a scalar, it is natural to group together
the smallest set of functions that transform into one another. This leads to
the set of functions that transform according to a row of an irreducible
representation of the group. In some sense, this situation is a combination of
(24.1) and (24.35). The reader may verify that

\[ T_g \phi_i^{(\alpha)}(\mathbf{x}) = \sum_{j=1}^{n_\alpha} T^{(\alpha)}_{ji}(g)\, \phi_j^{(\alpha)}(\mathbf{x} \cdot g^{-1}) \tag{30.10} \]

defines a representation of G.

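We now use Box 29.1.31 to construct an irreducible representation of
the Lie algebra of G from Eq. (30.10). By the definition of the infinitesimal
action, we let $g = \exp(\xi t)$ and differentiate both sides with respect to t at
t = 0. This yields

\begin{align*}
\sum_{j=1}^{n_\alpha} D_{ij}(\xi)\, \phi_j^{(\alpha)}(\mathbf{x})
&\equiv \frac{d}{dt}\Big\{ \sum_{j=1}^{n_\alpha} T^{(\alpha)}_{ji}(\exp(\xi t))\, \phi_j^{(\alpha)}(\mathbf{x}\cdot\exp(-\xi t)) \Big\}\Big|_{t=0} \\
&= \sum_{j=1}^{n_\alpha} \frac{d}{dt} T^{(\alpha)}_{ji}(\exp(\xi t))\Big|_{t=0}\, \phi_j^{(\alpha)}\big(\underbrace{\mathbf{x}\cdot\exp(-\xi t)\big|_{t=0}}_{=\mathbf{x}}\big) \\
&\quad + \sum_{j=1}^{n_\alpha} \underbrace{T^{(\alpha)}_{ji}(\exp(\xi t))\big|_{t=0}}_{=\delta_{ji}}\, \frac{d}{dt}\phi_j^{(\alpha)}(\mathbf{x}\cdot\exp(-\xi t))\Big|_{t=0},
\end{align*}

where we have defined the matrices $D_{ij}(\xi)$ for the LHS. The derivative in
the first sum is simply $T^{(\alpha)}_{ji}(\xi)$, the representation of the generator $\xi$ of the
1-parameter group of transformations in the space of functions $\{\phi_i^{(\alpha)}\}$. The
derivative in the second sum can be found by writing $\mathbf{x}'(t) = \mathbf{x}\cdot\exp(-\xi t)$
and differentiating as follows: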


\[ \frac{d}{dt}\phi_j^{(\alpha)}(\mathbf{x}'(t))\Big|_{t=0} = \frac{\partial \phi_j^{(\alpha)}}{\partial x'^{\nu}}\, \frac{dx'^{\nu}}{dt}\Big|_{t=0} \equiv \partial_\nu \phi_j^{(\alpha)}(\mathbf{x})\, X^{\nu}(\mathbf{x}; \xi), \]

where we used Eq. (29.23) and defined $X^{\nu}(\mathbf{x}; \xi)$ by the last equality. We
also changed the coordinate index to Greek to avoid confusing it with the
index of the functions. Collecting everything together, we obtain
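\[ \sum_{j=1}^{n_\alpha} D_{ij}(\xi)\, \phi_j^{(\alpha)}(\mathbf{x}) = \sum_{j=1}^{n_\alpha} T^{(\alpha)}_{ji}(\xi)\, \phi_j^{(\alpha)}(\mathbf{x}) + \sum_{j=1}^{n_\alpha} \delta_{ji}\, X^{\nu}\, \partial_\nu \phi_j^{(\alpha)}(\mathbf{x}) \]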


or, since gs ye (8/86), 


a a v a) 
Di) = 5/9 Gd leo — 9) + 5;;X GBs (30.11) 


where $X^{\nu}(\mathbf{x}; \xi)$ is the $\nu$th component of the infinitesimal generator of the
action induced by $\xi \in \mathfrak{g}$. We shall put Eq. (30.11) to good use when we discuss
symmetries and conservation laws in Chap. 33. The derivative with respect
to the functions, although meaningless at this point, will be necessary when
we discuss conservation laws.


30.2 Representation of the General Linear Group 


GL(V) is not a compact group, but we can use the experience we gained in 
the analysis of the symmetric group to find the irreducible representations 
of GL(V). The key is to construct tensor product spaces of V—which, as the 
reader may verify, is a carrier space of GL(V)—and look for its irreducible 
subspaces. In fact, if r is an arbitrary positive integer, $T : G \to GL(V)$ is a
representation, and

\[ V^{\otimes r} = \underbrace{V \otimes \cdots \otimes V}_{r\ \text{times}}, \]

then $T^{\otimes r} : G \to GL(V^{\otimes r})$, given by

\[ [T^{\otimes r}(g)](v_1, \dots, v_r) \equiv T_g^{\otimes r}(v_1, \dots, v_r) = T_g(v_1) \otimes \cdots \otimes T_g(v_r), \]

is also a representation. In particular, considering V as the (natural) carrier
space for GL(V), we conclude that $T^{\otimes r} : GL(V) \to GL(V^{\otimes r})$ is a
representation.
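In a basis, the operator $T_g^{\otimes r}$ is the r-fold Kronecker product of the matrix of $T_g$ with itself. The sketch below, with hypothetical helper names and two arbitrarily chosen invertible integer matrices, checks the homomorphism property $T^{\otimes r}(g_1 g_2) = T^{\otimes r}(g_1)\, T^{\otimes r}(g_2)$ for the natural representation of GL(2).

```python
def matmul(a, b):
    """Ordinary matrix product of two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def kron(a, b):
    """Kronecker product: the matrix of a (x) b in the basis e_i (x) e_k (i major, k minor)."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def tensor_power(g, r):
    """Matrix of the r-fold tensor product representation evaluated at g."""
    result = g
    for _ in range(r - 1):
        result = kron(result, g)
    return result

# Two invertible 2x2 integer matrices (illustrative choices only).
g1 = [[1, 2], [3, 5]]   # determinant -1
g2 = [[2, 1], [1, 1]]   # determinant +1
```

Integer entries keep the comparison exact; for r = 3 both sides are 8 x 8 matrices acting on the $2^3$-dimensional tensor product space.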

This tensor product representation is reducible, because, as is evident
from its definition, it preserves any symmetry of the tensor it acts on. For
example, the subspace of the full $n^r$-dimensional tensor product space (with
n being the dimension of V) consisting of the completely symmetric
tensors of the type

\[ t = \sum_{\pi \in S_r} v_{\pi(1)} \otimes v_{\pi(2)} \otimes \cdots \otimes v_{\pi(r)} \]

is invariant. Similarly, the subspace consisting of the completely
antisymmetric tensor products (the r-fold wedge products) is invariant.

To reduce $V^{\otimes r}$, we choose a basis $\{e_k\}_{k=1}^{n}$ for V. Then the collection of
$n^r$ tensor products $\{e_{k_1} \otimes \cdots \otimes e_{k_r}\}$, where each $k_i$ runs from 1 to n, is a basis
for $V^{\otimes r}$. An invariant subspace of $V^{\otimes r}$ is a span of linear combinations of
certain of these basis vectors. Since the only thing that distinguishes among
$\{e_{k_1} \otimes \cdots \otimes e_{k_r}\}$ is a permutation of the r labels, we start to see the
connection between the reduction of $V^{\otimes r}$ and $S_r$. This connection becomes more
evident if we recall that the left multiplication of the group algebra of $S_r$
by its elements provides the regular representation, which is reducible. The
irreducible representations are the minimal ideals of the algebra generated
by the Young operators.

The same idea works here as well: Certain linear combinations of the basis
vectors of $V^{\otimes r}$ obtained by permutations can serve as the basis vectors for
irreducible representations of GL(V). Let us elaborate on this. Recall that
a Young operator of $S_r$ is written in the form Y = QP, where Q and P
are linear combinations of permutations in $S_r$. Y has the property that if
one operates on it (via left multiplication) with all permutations of $S_r$, one
generates a minimal ideal, i.e., an irreducible representation of $S_r$. Now let
Y be a Young operator that acts on the indices $(k_1, \dots, k_r)$, giving linear
combinations of the basis vectors of $V^{\otimes r}$. From the minimality of the ideal
generated by Y and the fact that operators in GL(V) permute the factors in
$e_{k_1} \otimes \cdots \otimes e_{k_r}$ in all possible ways, it should now be clear that if we choose
any single basis vector $e_{k_1} \otimes \cdots \otimes e_{k_r}$, then $Y(e_{k_1} \otimes \cdots \otimes e_{k_r})$ generates an
irreducible representation of GL(V). We therefore have the following:


Theorem 30.2.1 Let $\{e_k\}_{k=1}^{n}$ be any basis for V. Let Y = QP be the
Young operator of $S_r$ that permutes (and takes linear combinations
of) the basis vectors $\{e_{k_1} \otimes \cdots \otimes e_{k_r}\}$. Then for any given such basis
vector, the vectors

\[ \{ T_g^{\otimes r}\, Y(e_{k_1} \otimes \cdots \otimes e_{k_r}) \mid g \in GL(V) \} \]

span an irreducible subspace of $V^{\otimes r}$.


A basis of such an irreducible representation can be obtained by taking
into account all the Young tableaux associated with the irreducible
representation. But which of the symmetry types will be realized for given values of
n and r? Clearly, the Young tableau should not contain more than n rows,
because then one of the symbols will be repeated in a column, and the Young
operator will vanish due to the antisymmetry in its column indices. We can
therefore restrict the partition $(\lambda)$ to

\[ (\lambda) = (\lambda_1, \lambda_2, \dots, \lambda_n), \qquad \lambda_1 + \cdots + \lambda_n = r, \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0. \]

Let us consider an example for clarification.


Example 30.2.2 First, let n = r = 2. The tensor product space has $2^2 = 4$
dimensions. To reduce it, we consider the Young operators, which
correspond to $e + (k_1, k_2)$ and $e - (k_1, k_2)$. Let us denote these operators by
$Y_1$ and $Y_2$, respectively. By applying each one to a generic basis vector
$e_{k_1} \otimes e_{k_2}$, we can generate all the irreducible representations. The first
operator gives

\[ Y_1(e_{k_1} \otimes e_{k_2}) = e_{k_1} \otimes e_{k_2} + e_{k_2} \otimes e_{k_1}, \]

where $k_1$ and $k_2$ can be 1 or 2. For $k_1 = k_2 = 1$, we get $2 e_1 \otimes e_1$. For $k_1 = 1$,
$k_2 = 2$, or $k_1 = 2$, $k_2 = 1$, we get $e_1 \otimes e_2 + e_2 \otimes e_1$. Finally, for $k_1 = k_2 = 2$,
we get $2 e_2 \otimes e_2$. Altogether, we obtain 3 linearly independent vectors that
are completely symmetric.

When the second operator acts on a generic basis vector, it gives

\[ Y_2(e_{k_1} \otimes e_{k_2}) = e_{k_1} \otimes e_{k_2} - e_{k_2} \otimes e_{k_1}. \]

The only time that this is not zero is when $k_1$ and $k_2$ are different. In
either case, we get $\pm(e_1 \otimes e_2 - e_2 \otimes e_1)$. This subspace is therefore
one-dimensional.




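The reduction of the tensor product space can therefore be written as

\[ V^{\otimes 2} = \underbrace{\mathrm{Span}\{e_1 \otimes e_1,\ e_1 \otimes e_2 + e_2 \otimes e_1,\ e_2 \otimes e_2\}}_{\text{3D symmetric subspace}} \oplus \underbrace{\mathrm{Span}\{e_1 \otimes e_2 - e_2 \otimes e_1\}}_{\text{1D antisymmetric subspace}}. \]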


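Next, let us consider the case of n = 2, r = 3. The tensor product space
has $2^3 = 8$ dimensions. To reduce it, we need to consider all Young
operators of $S_3$. There are four of these, corresponding to the following tableaux
(rows are listed from top to bottom and separated by slashes):

    [k_1 k_2 k_3]     [k_1 k_2 / k_3]     [k_1 k_3 / k_2]     [k_1 / k_2 / k_3]

Let us denote these operators by $Y_1$, $Y_2$, $Y_3$, and $Y_4$, respectively. By
applying each one to a generic basis vector $e_{k_1} \otimes e_{k_2} \otimes e_{k_3}$, we can generate all
the irreducible representations. The first operator gives

\begin{align*}
Y_1(e_{k_1} \otimes e_{k_2} \otimes e_{k_3}) &= e_{k_1} \otimes e_{k_2} \otimes e_{k_3} + e_{k_1} \otimes e_{k_3} \otimes e_{k_2} + e_{k_2} \otimes e_{k_1} \otimes e_{k_3} \\
&\quad + e_{k_2} \otimes e_{k_3} \otimes e_{k_1} + e_{k_3} \otimes e_{k_1} \otimes e_{k_2} + e_{k_3} \otimes e_{k_2} \otimes e_{k_1},
\end{align*}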

where $k_1$, $k_2$, and $k_3$ can be 1 or 2. For $k_1 = k_2 = k_3 = 1$, we get
$6 e_1 \otimes e_1 \otimes e_1$. For the case where two of the $k_i$'s are 1 and the third is 2, we get

\[ 2(e_1 \otimes e_1 \otimes e_2 + e_1 \otimes e_2 \otimes e_1 + e_2 \otimes e_1 \otimes e_1). \]

For the case where two of the $k_i$'s are 2 and the third is 1, we get

\[ 2(e_1 \otimes e_2 \otimes e_2 + e_2 \otimes e_1 \otimes e_2 + e_2 \otimes e_2 \otimes e_1). \]

Finally, for $k_1 = k_2 = k_3 = 2$, we get $6 e_2 \otimes e_2 \otimes e_2$. Altogether, we obtain 4
linearly independent vectors that are completely symmetric.


When the second operator acts on a generic basis vector, it gives$^2$


Y2(€k, ® Ck, ® ex;) = [e — (ki, kz) |e + (ki, kz) (Ck, ® Ck @ €ks) 
= Ck, @ Ck, W Ck, + €k, @ Ck, W Ck, 
— Cx, B Cx, B Ck, — Cx, WB Cx, @ EK, 


If all three indices are the same, we get zero. Suppose kj = 1. Then kz can 
be | or 2. For kz = 1, we must set k3 = 2 to get eg Me; Me; —e€] Ber Vey. 
For kz = 2, we must set k3 = | to obtain e; © e2 ® ep — ep Me] @ eo. If we 


$^2$ When a symmetric group is considered as an abstract group (as opposed to a group
of transformations), we may multiply permutations (keep track of how each number is
repeatedly transformed) from left to right. However, since the permutations here act on
vectors on their right, it is more natural to calculate their products from right to left.
start with $k_1 = 2$, we will not produce any new vectors, as the reader is urged
to verify. Therefore, the dimension of the irreducible subspace spanned by
the second Young tableau is 2.

The action of the third operator on a generic basis vector yields 


Y¥3(€k, ® Ck, ® €x;) = [e — (ki, kz) |e + (k1, k3) ] (Ck, ® Ck, @ €ks) 
= Ck, @ Cx, @ Ck, + Ck, @ Ck, W Ck; 


— €x, B€z, Bex, — Cx, Bex, O EZ,. 


The reader may check that we obtain a two-dimensional irreducible
representation spanned by $e_1 \otimes e_1 \otimes e_2 - e_2 \otimes e_1 \otimes e_1$ and
$e_1 \otimes e_2 \otimes e_2 - e_2 \otimes e_2 \otimes e_1$.

The fourth Young operator gives zero because it is completely
antisymmetric in three slots and we have only two indices. The reduction of the
tensor product space can therefore be written as

\[ V^{\otimes 3} = \underbrace{\mathrm{Span}\{Y_1(e_{k_1} \otimes e_{k_2} \otimes e_{k_3})\}}_{\dim = 4} \oplus \underbrace{\mathrm{Span}\{Y_2(e_{k_1} \otimes e_{k_2} \otimes e_{k_3})\}}_{\dim = 2} \oplus \underbrace{\mathrm{Span}\{Y_3(e_{k_1} \otimes e_{k_2} \otimes e_{k_3})\}}_{\dim = 2}. \]

We note that the total dimensions on both sides match.
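The dimension counts for the completely symmetric and completely antisymmetric subspaces in Example 30.2.2 can be checked by brute force. The sketch below is an illustration under stated assumptions, not the construction used in the text: it applies the full symmetrizer (or antisymmetrizer) over $S_r$ to every basis vector $e_{k_1} \otimes \cdots \otimes e_{k_r}$ and computes the rank of the resulting set by exact Gaussian elimination. It does not treat the mixed-symmetry operators $Y_2$ and $Y_3$, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations, product

def parity(perm):
    """Sign of a permutation given as a tuple of slot positions (cycle counting)."""
    sign, seen = 1, [False] * len(perm)
    for start in range(len(perm)):
        length, j = 0, start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        if length and length % 2 == 0:   # an even-length cycle is an odd permutation
            sign = -sign
    return sign

def project(indices, antisymmetric=False):
    """Sum over S_r (with signs if antisymmetric) applied to e_{k_1} (x) ... (x) e_{k_r}."""
    vec = {}
    for perm in permutations(range(len(indices))):
        key = tuple(indices[p] for p in perm)
        vec[key] = vec.get(key, 0) + (parity(perm) if antisymmetric else 1)
    return vec

def rank(vectors, keys):
    """Rank of a list of sparse vectors, by exact Gauss-Jordan elimination."""
    rows = [[Fraction(v.get(k, 0)) for k in keys] for v in vectors]
    r = 0
    for col in range(len(keys)):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def subspace_dimension(n, r, antisymmetric=False):
    """Dimension of the (anti)symmetric subspace of the r-fold tensor power of an n-dim space."""
    keys = list(product(range(1, n + 1), repeat=r))
    return rank([project(k, antisymmetric) for k in keys], keys)
```

For n = 2 this gives 3 and 1 when r = 2, and 4 and 0 when r = 3, in agreement with the symmetric and antisymmetric entries of the two reductions in the example.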


There is a remarkable formula that gives the dimension of all irreducible 
representations of GL(V) (see [Hame 89, pp. 384-387] for a derivation): 


Theorem 30.2.3 Let V be an n-dimensional vector space, and $V^{(\lambda)}$ the
irreducible subspace of tensors with symmetry associated with the partition
$(\lambda) = (\lambda_1, \dots, \lambda_n)$. Then

\[ \dim V^{(\lambda)} = \frac{D(l_1, \dots, l_n)}{D(n-1, n-2, \dots, 0)}, \]

where $l_j = \lambda_j + n - j$ and $D(x_1, \dots, x_n)$ is as given in Eq. (25.3).
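The dimension formula of Theorem 30.2.3 is easy to evaluate by machine. The following sketch assumes that $D(x_1, \dots, x_n)$ of Eq. (25.3) is the difference product $\prod_{i<j}(x_i - x_j)$; that equation lies outside this section, so this is an assumption of the example, and the function names are hypothetical.

```python
def difference_product(xs):
    """D(x_1, ..., x_n), ASSUMED here to be the product of (x_i - x_j) over all i < j."""
    result = 1
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            result *= xs[i] - xs[j]
    return result

def irrep_dimension(partition, n):
    """Dimension of the GL(V) irreducible subspace with symmetry (lambda), dim V = n."""
    if len(partition) > n:
        raise ValueError("a tableau with more than n rows gives no subspace")
    lam = list(partition) + [0] * (n - len(partition))   # pad with zeros up to n parts
    l = [lam[j] + n - (j + 1) for j in range(n)]         # l_j = lambda_j + n - j, j = 1..n
    numerator = difference_product(l)
    denominator = difference_product(list(range(n - 1, -1, -1)))
    return numerator // denominator
```

Under this assumption, `irrep_dimension((3,), 2)` and `irrep_dimension((2, 1), 2)` evaluate to 4 and 2, the dimensions found by hand in Example 30.2.2 for the first tableau and for each of the two mixed tableaux.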


30.3 Representation of Lie Algebras 


The diffeomorphism established by the exponential map (Theorem 29.1.21)
reduces the local study of a Lie group to that of its Lie algebra. In this book,
we are exclusively interested in the local$^3$ properties of Lie groups, and we
shall therefore confine ourselves to Lie algebras to study the structure of Lie
groups. Recall that any Lie group homomorphism leads to a corresponding
Lie algebra homomorphism [Eq. (29.7)]. Conversely, a homomorphism of
Lie algebras can, through the identification of the neighborhoods of their
identities with their Lie algebras, be "exponentiated" to a (local)
homomorphism of their Lie groups. This leads to the following theorem (see [Fult 91,
pp. 108 and 119] for a proof).

$^3$ We use the word "local" to mean the collection of all points that can be connected to the
identity by a curve in the Lie group G. If this collection exhausts G, then we say that G
is connected. If, furthermore, all closed curves (loops) in G can be shrunk to a point, we
say that G is simply connected. The word "local" can be replaced by "simply connected"
in what follows.


Theorem 30.3.1 Let G be a Lie group with algebra $\mathfrak{g}$. A representation
$T : G \to GL(\mathcal{H})$ determines a Lie algebra representation
$T_* : \mathfrak{g} \to \mathfrak{gl}(\mathcal{H})$. Conversely, a Lie algebra representation $\mathfrak{T} : \mathfrak{g} \to \mathfrak{gl}(\mathcal{H})$
determines a Lie group representation.


It follows from this theorem that all (local) Lie group representations 
result from corresponding Lie algebra representations. Therefore, we shall 
restrict ourselves to the representations of Lie algebras. 


30.3.1 Representation of Subgroups of GL(V) 


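Let $\mathfrak{g}$ be any Lie algebra with basis vectors $\{X_i\}$. Let a representation T map
these vectors to $\{T_i\} \in \mathfrak{gl}(\mathcal{H})$ for some carrier space $\mathcal{H}$. Then, a general
element $X = \sum_i a_i X_i$ of $\mathfrak{g}$ will be mapped to $T = \sum_i a_i T_i$. Now suppose that
$\mathfrak{h}$ is a subalgebra of $\mathfrak{g}$. Then the restriction of T to $\mathfrak{h}$ provides a
representation of $\mathfrak{h}$. This restriction may be reducible. If it is, then there is an invariant
subspace $\mathcal{H}_1$ of $\mathcal{H}$. It follows that

\[ \langle b | T_X | a \rangle = 0 \quad \forall X \in \mathfrak{h} \quad\text{whenever } |a\rangle \in \mathcal{H}_1 \text{ and } |b\rangle \in \mathcal{H}_1^{\perp}, \]

where $T_X \equiv T(X)$. If we write $T_X = \sum_i a_i^{(X)} T_i$, then in terms of $T_i$, the
equation above can be written as

\[ \sum_{i=1}^{\dim \mathfrak{g}} a_i^{(X)} \langle b | T_i | a \rangle \equiv \sum_{i=1}^{\dim \mathfrak{g}} a_i^{(X)} \tau_i^{(ba)} = 0 \quad \forall X \in \mathfrak{h}, \tag{30.12} \]

where $\tau_i^{(ba)} \equiv \langle b | T_i | a \rangle$ are complex numbers. Equation (30.12) states that


Box 30.3.2 If T, as a representation of $\mathfrak{h}$ (a Lie subalgebra of $\mathfrak{g}$),
is reducible, then there exist a number of equations that $a_i^{(X)}$ must
satisfy whenever $X \in \mathfrak{h}$. If T, as a representation of $\mathfrak{g}$, is irreducible,
then no relation such as given in (30.12) will exist when X runs over
all of $\mathfrak{g}$.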

This last statement will be used to analyze certain subgroups of GL(V).
Let us first identify GL(V) with GL(n, C). Next, consider GL(n, R),
which is a subgroup of GL(n, C), and transfer the discussion to their
respective algebras. If $\{X_i\}$ is a basis of gl(n, C), then an arbitrary element
can be written as $\sum_i a_i X_i$. The difference between gl(n, C) and gl(n, R) is
that the $a_i$'s are real in the latter case; i.e., for all real values of $\{a_i\}$, the sum
belongs to gl(n, R). Now suppose that T is an irreducible representation of
gl(n, C) that is reducible when restricted to gl(n, R). Equation (30.12) states
that the function

\[ f(z_1, \dots, z_{n^2}) \equiv \sum_{i=1}^{n^2} z_i\, \tau_i^{(ba)} \]

vanishes for all real values of the $z_i$'s. Since this function is obviously
entire, it must vanish for all complex values of the $z_i$'s by analytic continuation
(see Theorem 12.3.1). But this is impossible because T is irreducible for
gl(n, C). We have to conclude that T is irreducible as a representation of
gl(n, R).

The next subalgebra of gl(n, C) we consider is the Lie algebra sl(n, C)
of the special linear group. The only restriction on the elements of sl(n, C)
is for them to have a vanishing trace. Denoting $\mathrm{tr}\, X_i$ by $t_i$, we conclude
that $X = \sum_i a_i X_i$ belongs to sl(n, C) if and only if $\sum_i a_i t_i = 0$. Let
$(t_1, t_2, \dots, t_{n^2}) \equiv |t\rangle \in \mathbb{C}^{n^2}$. Then sl(n, C) can be characterized as the subspace
consisting of vectors $|a\rangle \in \mathbb{C}^{n^2}$ such that $\langle a | t \rangle = 0$. Such a subspace has
$n^2 - 1$ dimensions. If any irreducible representation of gl(n, C) is reducible
for sl(n, C), then the set of complex numbers $\{a_i\}$ must, in addition, satisfy
Eq. (30.12). This amounts to the condition that $|a\rangle$ be orthogonal to $|\tau^{(ba)}\rangle$
as well. But this is impossible, because then the set $\{|a\rangle, |t\rangle, |\tau^{(ba)}\rangle\}$ would
constitute a subspace of $\mathbb{C}^{n^2}$ whose dimension is at least $n^2 + 1$: There are
$n^2 - 1$ of the $|a\rangle$'s, one $|t\rangle$, and at least one $|\tau^{(ba)}\rangle$. Therefore, all irreducible
representations of gl(n, C) are also irreducible representations of sl(n, C).

The last subalgebra of gl(n, C) we consider is the Lie algebra u(n) of
the unitary group. To study this algebra, we start with the Weyl basis of
Eq. (29.32) for gl(n, C), and construct a new hermitian basis $\{X_{kj}\}$ defined
as

\[ X_{jj} = e_{jj} \quad\text{for all } j = 1, 2, \dots, n, \qquad X_{kj} = i(e_{kj} - e_{jk}) \quad\text{if } k \ne j. \]

A typical element of gl(n, C) is of the form $\sum_{kj} a_{kj} X_{kj}$, where the $a_{kj}$ are
complex numbers. If we restrict ourselves to real values of $a_{kj}$, then we
obtain the subalgebra of hermitian matrices whose Lie group is the unitary
group U(n). The fact that the irreducible representations of gl(n, C) will
not reduce under u(n) follows immediately from our discussion concerning
gl(n, R). We summarize our findings in the following:


Theorem 30.3.3 The irreducible representations of GL(n,C) are 
also irreducible representations of GL(n, R), SL(n,C), U(n), and 
SU(n). 




The case of SU(n) follows from the same argument given earlier that 
connected GL(n, C) to SL(n, C). 


30.3.2 Casimir Operators 


In the general representation theory of Lie algebras, it is desirable to label 
each irreducible representation with a quantity made out of the basis vectors 
of the Lie algebra. An example is the labeling of the energy states of a quan- 
tum mechanical system with angular momentum. Each value of the total 
angular momentum labels an irreducible subspace whose vectors are further 
labeled by the third component of angular momentum (see Chap. 13). This 
subsection is devoted to the generalization of this concept to an arbitrary Lie 
algebra. 


Definition 30.3.4 Let $T : \mathfrak{g} \to \mathfrak{gl}(\mathcal{H})$ be a representation of the Lie
algebra $\mathfrak{g}$. A Casimir operator C for this representation is an operator that
commutes with all $T_X$ of the representation.


If the representation is irreducible, then by Schur’s lemma, C is a mul- 
tiple of the unit operator. Therefore, all vectors of an irreducible invariant 
subspace of the carrier space F{ are eigenvectors of C corresponding to the 
same eigenvalue. That Casimir operators actually determine the irreducible 
representations of a semisimple Lie algebra is the content of the following 
theorem (for a proof, see [Vara 84, pp. 333-337]). 


Theorem 30.3.5 (Chevalley) For every semisimple Lie algebra $\mathfrak{g}$ of
rank$^4$ r with a basis $\{X_i\}$, there exists a set of r Casimir operators
in the form of polynomials in $T_{X_i}$, whose eigenvalues characterize the
irreducible representations of $\mathfrak{g}$.


From now on, we shall use the notation $X_i$ for $T_{X_i}$. It follows from
Theorem 30.3.5 that all irreducible invariant vector subspaces of the carrier space
can be labeled by the eigenvalues of the r Casimir operators. This means that
each invariant irreducible subspace has a basis all of whose vectors carry a
set of r labels corresponding to the eigenvalues of the r Casimir operators.

One Casimir operator, in the form of a polynomial of degree two,
which works only for semisimple Lie algebras, is obtained easily:

\[ C = \sum_{i,j} g^{ij} X_i X_j, \tag{30.13} \]

where $g^{ij}$ is the inverse of the Cartan metric tensor. In fact, with the
summation convention in place, we have

\begin{align*}
[C, X_k] &= g^{ij} [X_i X_j, X_k] = g^{ij} \{ X_i [X_j, X_k] + [X_i, X_k] X_j \} \\
&= g^{ij} \{ c^{r}_{jk} X_i X_r + c^{r}_{ik} X_r X_j \} \\
&= g^{ij} c^{r}_{ik} (X_j X_r + X_r X_j) \qquad (\text{because } g^{ij} \text{ is symmetric}) \\
&= g^{ij} g^{sr} c_{iks} (X_j X_r + X_r X_j) \\
&= 0 \qquad (\text{because } g^{ij} g^{sr} c_{iks} \text{ is antisymmetric in } j, r).
\end{align*}

The last equality follows from the fact that $g^{ij}$ and $g^{sr}$ are symmetric, $c_{iks}$
is completely antisymmetric [see the discussion following Eq. (29.50)], and
there is a sum over the dummy index s.

$^4$ Recall that the rank of $\mathfrak{g}$ is the dimension of the Cartan subalgebra of $\mathfrak{g}$.


Example 30.3.6 The rotation group SO(3) in $\mathbb{R}^3$ is a compact 3-parameter
Lie group. The infinitesimal generators are the three components of the
angular momentum operator (see Example 29.1.35). From the commutation
relations of the angular momentum operators developed in Chap. 13, we
conclude that $c^{k}_{ij} = i\epsilon_{ijk}$. It follows that the Cartan metric tensor is

\[ g_{ij} = c^{r}_{is} c^{s}_{jr} = (i\epsilon_{isr})(i\epsilon_{jrs}) = +\epsilon_{isr}\epsilon_{jsr} = 2\delta_{ij}. \]

Ignoring the factor of 2 and denoting the angular momentum operators by $L_i$,
we conclude that

\[ \mathbf{L}^2 \equiv L_1^2 + L_2^2 + L_3^2 \]

is a Casimir operator. But this is precisely the operator discussed in detail
in Chap. 13. We found there that the eigenvalues of $\mathbf{L}^2$ were labeled by j,
where j was either an integer or a half odd integer. In the context of our
present discussion, we note that the Lie algebra so(3) has rank one, because
there is no higher dimensional subalgebra of so(3) all of whose vectors
commute with one another. It follows from Theorem 30.3.5 that $\mathbf{L}^2$ is the only
Casimir operator, and that all irreducible representations $T^{(j)}$ of so(3) are
distinguished by their label j. Furthermore, the construction of Chap. 13
showed explicitly that the dimension of $T^{(j)}$ is 2j + 1.

The connection between the representation of Lie algebras and Lie
groups permits us to conclude that the irreducible representations of the
rotation group are labeled by the (half) integers j, and the jth irreducible
representation has dimension 2j + 1. When j is an integer l and the carrier
space is $\mathcal{L}^2(S^2)$, the square-integrable functions on the unit sphere, then $\mathbf{L}^2$
becomes a differential operator, and the spherical harmonics $Y_{lm}(\theta, \varphi)$, with
a fixed value of l, provide a basis for the lth irreducible invariant subspace.
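The computation of the Cartan metric in Example 30.3.6 can be reproduced numerically. The sketch below builds $c^{k}_{ij} = i\epsilon_{ijk}$, evaluates $g_{ij} = c^{r}_{is} c^{s}_{jr}$, and then checks, for the spin-1/2 matrices $J_i = \sigma_i/2$ (an illustrative choice of representation, not taken from this section), that the $J_i$ obey these structure constants and that $\sum_i J_i^2$ is a multiple of the identity. All function names are hypothetical.

```python
def epsilon(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def structure_constant(i, j, k):
    """c^k_{ij} = i * epsilon_{ijk}, as in Example 30.3.6."""
    return 1j * epsilon(i, j, k)

def cartan_metric():
    """g_{ij} = c^r_{is} c^s_{jr}, summed over r and s."""
    return [[sum(structure_constant(i, s, r) * structure_constant(j, r, s)
                 for r in range(3) for s in range(3))
             for j in range(3)] for i in range(3)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def combine(a, b, coeff_b):
    """Return a + coeff_b * b for 2x2 matrices."""
    return [[a[i][j] + coeff_b * b[i][j] for j in range(2)] for i in range(2)]

ZERO = [[0, 0], [0, 0]]

# Spin-1/2 generators J_i = sigma_i / 2 (illustrative representation).
J = [
    [[0, 0.5], [0.5, 0]],
    [[0, -0.5j], [0.5j, 0]],
    [[0.5, 0], [0, -0.5]],
]

def commutator(a, b):
    return combine(matmul(a, b), matmul(b, a), -1)

def casimir():
    """Sum of J_i J_i over i; the overall factor from the metric is ignored, as in the text."""
    total = ZERO
    for Ji in J:
        total = combine(total, matmul(Ji, Ji), 1)
    return total
```

Here `cartan_metric()` returns twice the identity, and `casimir()` returns 3/4 times the 2 x 2 identity, which equals j(j + 1) for j = 1/2.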


The last sentence of Example 30.3.6 is at the heart of the connection be- 
tween symmetry, Lie groups, and the equations of mathematical physics. 
A symmetry operation of mathematical physics is expressed in terms of 
the action of a Lie group on an underlying manifold M, i.e., as a group 
of transformations of M. The Lie algebra of such a Lie group consists of the 
infinitesimal generators of the corresponding transformation. These genera- 
tors can be expressed as first-order differential operators as in Eq. (29.25). 




It is therefore natural to choose as the carrier space of a representation
the Hilbert space $\mathcal{L}^2(M)$ of the square-integrable functions on M, which,
through the local identification of M with $\mathbb{R}^m$ (m = dim M), can be
identified with functions on $\mathbb{R}^m$. Then the infinitesimal generators act directly on
the functions of $\mathcal{L}^2(M)$ as first-order differential operators.

The Casimir operators $\{C_\alpha\}_{\alpha=1}^{r}$, where r is the rank of the Lie algebra,
are polynomials in the infinitesimal generators, i.e., differential operators of
higher order. On the irreducible invariant subspaces of $\mathcal{L}^2(M)$, each $C_\alpha$ acts
as a multiple of the identity, so if $f(\mathbf{r})$ belongs to such an invariant subspace,
we have

\[ C_\alpha f(\mathbf{r}) = \lambda(\alpha) f(\mathbf{r}), \qquad \alpha = 1, 2, \dots, r. \tag{30.14} \]

This is a set of differential equations that are invariant under the symmetry
of the physical system, i.e., its solutions transform among themselves under
the action of the group of symmetries.

It is a stunning reality and a fact of profound significance that many of 
the differential equations of mathematical physics are, as in Eq. (30.14), ex- 
pressions of the invariance of the Casimir operators of some Lie algebra in a 
particular representation. Moreover, all the standard functions of mathemat- 
ical physics, such as Bessel, hypergeometric, and confluent hypergeomet- 
ric functions, are related to matrix elements in the representations of a few 
of the simplest Lie groups (see [Mill 68] for a thorough discussion of this 
topic). 


Historical Notes 

Claude Chevalley (1909-1984) was the only son of Abel and Marguerite Chevalley who 
were the authors of the Oxford Concise French Dictionary. He studied under Émile Picard
at the École Normale Supérieure in Paris, graduating in 1929 and becoming the youngest
of the mathematicians of the Bourbaki school. 

After graduation, Chevalley went to Germany to continue his studies under Artin at Ham- 
burg during the session 1931-1932. He then went to the University of Marburg to work 
with Hasse, who had been appointed to fill Hensel’s chair there in 1930. He was awarded 
his doctorate in 1937. A year later Chevalley went to the Institute for Advanced Study at 
Princeton, where he also served on the faculty of Princeton University. From July 1949 
until June 1957 he served as professor of mathematics at Columbia University, afterwards 
returning to the University of Paris. 

Chevalley had a major influence on the development of several areas of mathematics. His 
papers of 1936 and 1941 led to major advances in class field theory and also in algebraic 
geometry. He did pioneering work in the theory of local rings in 1943, developing the 
ideas of Krull into a theorem bearing his name. Chevalley’s theorem was important in 
applications made in 1954 to quasi-algebraically closed fields and the following year to 
algebraic groups. Chevalley groups play a central role in the classification of finite simple 
groups. His name is also attached to Chevalley decompositions and to a Chevalley type 
of semi-simple algebraic group. 

Many of his texts have become classics. He wrote Theory of Lie Groups in three vol- 
umes which appeared in 1946, 1951, and 1955. He also published Theory of Distributions 
(1951), Introduction to the Theory of Algebraic Functions of one Variable (1951), The Al- 
gebraic Theory of Spinors (1954), Class Field Theory (1954), Fundamental Concepts of 
Algebra (1956), and Foundations of Algebraic Geometry (1958). 

Chevalley was awarded many honors for his work. Among these was the Cole Prize of the 
American Mathematical Society. He was elected a member of the London Mathematical 
Society in 1967. 
30.3.3 Representation of so(3) and so(3, 1)


Because of their importance in physical applications, we study the
representations of so(3), the rotation, and so(3, 1), the Lorentz, algebras. For
rotations, we define $J_1 = -iM_{23}$, $J_2 = iM_{13}$, and $J_3 = -iM_{12}$, and note
that the $J_i$'s satisfy exactly the same commutation relations as the angular
momentum operators of Chap. 13. Therefore, the irreducible representations
of so(3) are labeled by j, which can be an integer or a half-odd integer (see
also Example 30.3.6). These representations are finite-dimensional because
SO(3) is a compact group (Example 30.1.1 and Theorem 30.1.7). The
dimension of the irreducible representation of so(3) labeled by j is 2j + 1.

Because of local isomorphism of Lie groups and their Lie algebras, the
same irreducible spaces found for Lie algebras can be used to represent the
Lie groups. In particular, the states $\{|jm\rangle\}_{m=-j}^{j}$, where m is the eigenvalue
of $J_z$,$^5$ can also be used as a basis of the j-th irreducible representation.

The flow of each infinitesimal generator of so(3) is a one-parameter
subgroup of SO(3). For example, $\exp(M_{12}\varphi)$ is a rotation of angle $\varphi$ about the
z-axis. Using Euler angles, we can write a general rotation as

\[ R(\psi, \theta, \varphi) = \exp(M_{12}\psi) \exp(M_{31}\theta) \exp(M_{12}\varphi). \]

The corresponding rotation operator acting on the vectors of the carrier
space is

\[ \mathbf{R}(\psi, \theta, \varphi) = \exp(M_{12}\psi) \exp(M_{31}\theta) \exp(M_{12}\varphi) = e^{iJ_z\psi}\, e^{iJ_y\theta}\, e^{iJ_z\varphi}. \]


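The rotation matrix corresponding to the above operator is obtained by
sandwiching $\mathbf{R}(\psi, \theta, \varphi)$ between basis vectors of a given irreducible
representation:

\begin{align*}
D^{(j)}_{m'm}(\psi, \theta, \varphi) &\equiv \langle jm' | \mathbf{R}(\psi, \theta, \varphi) | jm \rangle = \langle jm' | e^{iJ_z\psi} e^{iJ_y\theta} e^{iJ_z\varphi} | jm \rangle \\
&= e^{im'\psi} e^{im\varphi} \langle jm' | e^{iJ_y\theta} | jm \rangle \equiv e^{i(m'\psi + m\varphi)}\, d^{(j)}_{m'm}(\theta). \tag{30.15}
\end{align*}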


Thus, the calculation of rotation matrices is reduced to finding $d^{(j)}_{m'm}(\theta)$.
These are given by the Wigner formula (see [Hame 89, pp. 348-357]):

\[ d^{(j)}_{m'm}(\theta) = \sum_{\mu} \Phi(j, m, m'; \mu) \left( \cos\frac{\theta}{2} \right)^{2(j-\mu)+m-m'} \left( \sin\frac{\theta}{2} \right)^{2\mu+m'-m}, \tag{30.16} \]

where

\[ \Phi(j, m, m'; \mu) \equiv (-1)^{\mu}\, \frac{[(j+m)!\,(j-m)!\,(j+m')!\,(j-m')!]^{1/2}}{(j+m-\mu)!\; \mu!\; (j-m'-\mu)!\; (m'-m+\mu)!} \]

and the summation extends over all integral values of $\mu$ for which the
factorials have a meaning. The number of terms in the summation is equal to
$1 + \tau$, where $\tau$ is the smallest of the four integers $j \pm m$, $j \pm m'$.
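A direct transcription of the Wigner formula (30.16) into code makes the summation range explicit: $\mu$ runs from max(0, m - m') to min(j + m, j - m'), which are exactly the values for which all four factorials in the denominator are defined. The sketch below is an illustration with hypothetical function names; j, m, and m' are passed as floats that are integers or half-odd integers.

```python
from math import cos, factorial, sin, sqrt

def wigner_d(j, m_prime, m, theta):
    """d^(j)_{m'm}(theta) from Eq. (30.16), with the sign convention exp(+i J_y theta)."""
    a = round(j + m)            # j + m
    b = round(j - m)            # j - m
    c = round(j + m_prime)      # j + m'
    d = round(j - m_prime)      # j - m'
    k = round(m_prime - m)      # m' - m
    norm = sqrt(factorial(a) * factorial(b) * factorial(c) * factorial(d))
    total = 0.0
    for mu in range(max(0, -k), min(a, d) + 1):
        denominator = (factorial(a - mu) * factorial(mu)
                       * factorial(d - mu) * factorial(k + mu))
        total += ((-1) ** mu * norm / denominator
                  * cos(theta / 2) ** (a + d - 2 * mu)    # exponent 2(j - mu) + m - m'
                  * sin(theta / 2) ** (2 * mu + k))       # exponent 2 mu + m' - m
    return total

def d_matrix(j, theta):
    """Full (2j+1) x (2j+1) matrix, rows m' and columns m ordered j, j-1, ..., -j."""
    labels = [j - n for n in range(round(2 * j) + 1)]
    return [[wigner_d(j, mp, m, theta) for m in labels] for mp in labels]
```

For j = 1/2 the matrix reduces to a rotation by $\theta/2$ with entries $\cos(\theta/2)$ and $\pm\sin(\theta/2)$, and for any j the rows of `d_matrix` are orthonormal, as expected of a real unitary matrix.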


$^5$ Sometimes we use $J_x$, $J_y$, and $J_z$ instead of $J_1$, $J_2$, and $J_3$.
From the rotation matrices, we can obtain the characters of the rotation
group. However, an easier way is to use Euler's theorem (Theorem 6.6.15),
Example 23.2.19, and Box 24.3.6 to conclude that the character of a rotation
depends only on the angle of rotation, and not on the direction of the rotation
axis. Choosing the z-axis as our only axis of rotation, we obtain

\begin{align*}
\chi^{(j)}(\varphi) &= \sum_{m=-j}^{j} \langle jm | e^{iJ_z\varphi} | jm \rangle = \sum_{m=-j}^{j} e^{im\varphi} = e^{-ij\varphi} \sum_{m=-j}^{j} e^{i(j+m)\varphi} \\
&= e^{-ij\varphi} \sum_{k=0}^{2j} e^{ik\varphi} = e^{-ij\varphi}\, \frac{e^{i(2j+1)\varphi} - 1}{e^{i\varphi} - 1} \\
&= \frac{e^{i(j+1)\varphi} - e^{-ij\varphi}}{e^{i\varphi} - 1} = \frac{\sin[(j + \tfrac{1}{2})\varphi]}{\sin(\varphi/2)}. \tag{30.17}
\end{align*}


Equation (30.17) can be used to obtain the celebrated addition theorem
for angular momenta. Suppose that initially we have two physical systems
corresponding to angular momenta $j_1$ and $j_2$. When these systems
are made to interact with one another, the total system will be described
by the tensor product states. These states are vectors in the tensor product
of the irreducible representations $T^{(j_1)}$ and $T^{(j_2)}$ of the rotation group, as
discussed in Sect. 24.8. This product is reducible. To find the factors into
which it reduces, we consider its character corresponding to angle $\varphi$. Using
Eq. (24.42), we have

\begin{align*}
\chi^{(j_1 \times j_2)}(\varphi) &= \chi^{(j_1)}(\varphi) \cdot \chi^{(j_2)}(\varphi) = \sum_{m_1=-j_1}^{j_1} e^{im_1\varphi} \sum_{m_2=-j_2}^{j_2} e^{im_2\varphi} \\
&= \sum_{m_1=-j_1}^{j_1} \sum_{m_2=-j_2}^{j_2} e^{i(m_1+m_2)\varphi} \\
&= \sum_{J=|j_1-j_2|}^{j_1+j_2} \sum_{M=-J}^{J} e^{iM\varphi} = \sum_{J=|j_1-j_2|}^{j_1+j_2} \chi^{(J)}(\varphi),
\end{align*}

where the double sum on the third line is an equivalent way of writing the
double summation of the second line, as the reader may verify. From this
equation we read off the Clebsch-Gordan decomposition of the tensor
product:

\[ T^{(j_1)} \otimes T^{(j_2)} = \bigoplus_{J=|j_1-j_2|}^{j_1+j_2} T^{(J)}, \tag{30.18} \]

which is also known as the addition theorem for angular momenta.
Equation (30.18) shows that (see page 753)


Box 30.3.7 The rotation group is simply reducible. 
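Because the character of $T^{(j)}$ is a sum of $e^{im\varphi}$ over its weights $m$, the equality of characters behind Eq. (30.18) amounts to an equality of weight multisets. The following sketch (helper names are illustrative) checks this by counting how often each value of $m_1 + m_2$ and of $M$ occurs.

```python
from collections import Counter
from fractions import Fraction

def weights(j):
    """Weights m = -j, -j+1, ..., j of the spin-j representation."""
    j = Fraction(j)
    return [-j + k for k in range(int(2 * j) + 1)]

def product_weights(j1, j2):
    """Multiset of m1 + m2 occurring in the tensor product."""
    return Counter(m1 + m2 for m1 in weights(j1) for m2 in weights(j2))

def sum_weights(j1, j2):
    """Multiset of M occurring in the direct sum over J = |j1 - j2|, ..., j1 + j2."""
    j1, j2 = Fraction(j1), Fraction(j2)
    total = Counter()
    J = abs(j1 - j2)
    while J <= j1 + j2:
        total.update(weights(J))
        J += 1
    return total
```

Each $J$ between $|j_1 - j_2|$ and $j_1 + j_2$ appears exactly once, which is the content of Box 30.3.7.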
The RHS of Eq. (30.18) tells us which irreducible representations result
from multiplying $T^{(j_1)}$ and $T^{(j_2)}$. In particular, if $j_1 = j_2 = l$, the RHS
includes the $J = 0$ representation, i.e., a scalar. In terms of the states, this says
that we can combine two states with angular momentum $l$ to obtain a scalar
state. Let us find this combination. We use Eq. (24.46) in the form


\[
|JM\rangle = \sum_{m_1,m_2} C(j_1 j_2; J|m_1 m_2; M)\,|j_1,m_1; j_2,m_2\rangle,
\qquad m_1+m_2=M.
\tag{30.19}
\]
In the case under investigation, $J = 0 = M$, so (30.19) becomes
\[
|00\rangle = \sum_{m=-l}^{l} C(ll; 0|m,-m; 0)\,|lm; l,-m\rangle.
\]
Problem 30.9 shows that $C(ll; 0|m,-m; 0) = (-1)^{l-m}/\sqrt{2l+1}$, so that
\[
|00\rangle = \sum_{m=-l}^{l}\frac{(-1)^{l-m}}{\sqrt{2l+1}}\,|lm; l,-m\rangle.
\]


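Take the “inner product” of this with $\langle\theta,\varphi;\theta',\varphi'|$ to obtain
\[
\begin{aligned}
\langle\theta,\varphi;\theta',\varphi'|00\rangle
&= \sum_{m=-l}^{l}\frac{(-1)^{l-m}}{\sqrt{2l+1}}\,
\langle\theta,\varphi;\theta',\varphi'|lm; l,-m\rangle \\
&= \sum_{m=-l}^{l}\frac{(-1)^{l-m}}{\sqrt{2l+1}}\,
\underbrace{\langle\theta,\varphi|lm\rangle}_{Y_{lm}(\theta,\varphi)}\,
\underbrace{\langle\theta',\varphi'|l,-m\rangle}_{Y_{l,-m}(\theta',\varphi')},
\end{aligned}
\tag{30.20}
\]
where we have used $\langle\theta,\varphi;\theta',\varphi'| = \langle\theta,\varphi|\langle\theta',\varphi'|$ and contracted each bra
with a ket. We can evaluate the LHS of (30.20) by noting that since it is a
scalar, the choice of orientation of coordinates is immaterial. So, let $\theta = 0$
to get $\theta' = \gamma$, the angle between the two directions. Then using the facts
\[
Y_{lm}(0,\varphi) = \delta_{m0}\sqrt{\frac{2l+1}{4\pi}}
\quad\text{and}\quad
Y_{l0}(\theta,\varphi) = \sqrt{\frac{2l+1}{4\pi}}\,P_l(\cos\theta)
\]
on the RHS of (30.20), we obtain
\[
\langle\theta,\varphi;\theta',\varphi'|00\rangle = \frac{(-1)^l}{4\pi}\sqrt{2l+1}\,P_l(\cos\gamma).
\]
Substituting this in the LHS of Eq. (30.20), we get
\[
P_l(\cos\gamma) = \frac{4\pi}{2l+1}\sum_{m=-l}^{l}(-1)^m\,
Y_{lm}(\theta,\varphi)\,Y_{l,-m}(\theta',\varphi'),
\]
which is the addition theorem for spherical harmonics discussed in Chap. 13.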
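The addition theorem can be checked by hand for $l = 1$, and the sketch below does the same numerically. It assumes the standard explicit forms $Y_{10} = \sqrt{3/4\pi}\cos\theta$ and $Y_{1,\pm1} = \mp\sqrt{3/8\pi}\sin\theta\,e^{\pm i\varphi}$ (Condon-Shortley phase), and uses $\cos\gamma = \cos\theta\cos\theta' + \sin\theta\sin\theta'\cos(\varphi-\varphi')$; the function names are illustrative.

```python
import cmath
import math

def Y1(m, theta, phi):
    """Spherical harmonic Y_{1m} with the Condon-Shortley phase (assumed convention)."""
    if m == 0:
        return math.sqrt(3 / (4 * math.pi)) * math.cos(theta)
    # m = +1 or -1
    return -m * math.sqrt(3 / (8 * math.pi)) * math.sin(theta) * cmath.exp(1j * m * phi)

def addition_rhs(theta, phi, theta2, phi2):
    """(4 pi / 3) * sum_m (-1)^m Y_{1m}(theta, phi) Y_{1,-m}(theta2, phi2)."""
    s = sum((-1) ** m * Y1(m, theta, phi) * Y1(-m, theta2, phi2) for m in (-1, 0, 1))
    return 4 * math.pi / 3 * s

def cos_gamma(theta, phi, theta2, phi2):
    """Cosine of the angle between the two directions, i.e. P_1(cos gamma)."""
    return (math.cos(theta) * math.cos(theta2)
            + math.sin(theta) * math.sin(theta2) * math.cos(phi - phi2))
```

The imaginary parts of the $m = \pm 1$ terms cancel, so the sum is real, as it must be for a scalar.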
Let us now turn to $\mathfrak{so}(3,1)$. We collect the generators in two categories
\[
\mathbf{M} = (M_1, M_2, M_3) = (M_{23}, M_{31}, M_{12}), \qquad
\mathbf{N} = (N_1, N_2, N_3) = (M_{01}, M_{02}, M_{03}),
\]
and verify that
\[
[M_i, M_j] = -\epsilon_{ijk}M_k, \qquad
[N_i, N_j] = \epsilon_{ijk}M_k, \qquad
[M_i, N_j] = -\epsilon_{ijk}N_k,
\]
and that there are two Casimir operators: $\mathbf{M}^2 - \mathbf{N}^2$ and $\mathbf{M}\cdot\mathbf{N}$. It follows that
the irreducible representations of $\mathfrak{so}(3,1)$ are labeled by two numbers. To
find these numbers, define the generators
\[
\mathbf{J} = \frac{1}{2i}(\mathbf{M} + i\mathbf{N}), \qquad
\mathbf{K} = \frac{1}{2i}(\mathbf{M} - i\mathbf{N})
\]
and show that
\[
[J_l, J_m] = i\epsilon_{lmk}J_k, \qquad
[K_i, K_j] = i\epsilon_{ijm}K_m, \qquad
[J_i, K_j] = 0.
\]


It follows that the $J$'s and the $K$'s generate two completely independent Lie
algebras isomorphic to the angular momentum algebras and that $\mathfrak{so}(3,1)$ is a
direct sum of these algebras. Since each one requires a (half-odd) integer to
designate its irreducible representations, we can choose these two numbers
as the eigenvalues of the Casimir operators needed to label the irreducible
representations of $\mathfrak{so}(3,1)$. Thus, the irreducible representations of $\mathfrak{so}(3,1)$
are of the form $T^{(jj')}$, where $j$ and $j'$ can each be an integer or a half-odd
integer.
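The commutation relations of $\mathfrak{so}(3,1)$ and the decoupling of the $J$'s from the $K$'s can be verified with explicit $4\times4$ matrices. The sketch below is built on assumptions that are not fixed by the text: index 0 is taken as time, $\eta = \mathrm{diag}(+1,-1,-1,-1)$, and the defining matrices are $(M_{ij})^a{}_b = \delta^a_j\eta_{ib} - \delta^a_i\eta_{jb}$. All helper names are illustrative.

```python
# Assumed conventions: index 0 = time, eta = diag(+1, -1, -1, -1),
# (M_ij)^a_b = delta^a_j eta_ib - delta^a_i eta_jb.
ETA = [1, -1, -1, -1]

def gen(i, j):
    """4x4 matrix of the generator M_ij (i != j) in the assumed convention."""
    m = [[0j] * 4 for _ in range(4)]
    m[j][i] += ETA[i]
    m[i][j] -= ETA[j]
    return m

def mul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def comm(a, b):
    p, q = mul(a, b), mul(b, a)
    return [[p[r][c] - q[r][c] for c in range(4)] for r in range(4)]

def lin(ca, a, cb, b):
    """Linear combination ca*a + cb*b of two matrices."""
    return [[ca * a[r][c] + cb * b[r][c] for c in range(4)] for r in range(4)]

def close(a, b):
    return all(abs(a[r][c] - b[r][c]) < 1e-12 for r in range(4) for c in range(4))

def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) / 2

M = [gen(2, 3), gen(3, 1), gen(1, 2)]
N = [gen(0, 1), gen(0, 2), gen(0, 3)]
# J = (M + iN)/(2i) = M/(2i) + N/2,  K = (M - iN)/(2i) = M/(2i) - N/2
J = [lin(1 / 2j, M[a], 0.5, N[a]) for a in range(3)]
K = [lin(1 / 2j, M[a], -0.5, N[a]) for a in range(3)]

def eps_sum(coeff, a, b, X):
    """Matrix coeff * sum_c eps(a, b, c) * X[c]."""
    out = [[0j] * 4 for _ in range(4)]
    for c in range(3):
        out = lin(1, out, coeff * eps(a, b, c), X[c])
    return out

def check_algebra():
    """True if all commutation relations hold for the assumed matrices."""
    zero = [[0j] * 4 for _ in range(4)]
    ok = True
    for a in range(3):
        for b in range(3):
            ok = ok and close(comm(M[a], M[b]), eps_sum(-1, a, b, M))
            ok = ok and close(comm(N[a], N[b]), eps_sum(1, a, b, M))
            ok = ok and close(comm(M[a], N[b]), eps_sum(-1, a, b, N))
            ok = ok and close(comm(J[a], J[b]), eps_sum(1j, a, b, J))
            ok = ok and close(comm(K[a], K[b]), eps_sum(1j, a, b, K))
            ok = ok and close(comm(J[a], K[b]), zero)
    return ok
```

With the factor $1/2i$ in the definitions, the $J$'s and $K$'s each satisfy the angular momentum relations in the normalization $[J_l, J_m] = i\epsilon_{lmk}J_k$, and every $J$ commutes with every $K$.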


30.3.4 Representation of the Poincaré Algebra 


The Poincaré algebra $\mathfrak{p}(p, n-p)$, introduced in Sect. 29.2.1, is the
generalization of the Lie algebra of the invariance group of the special theory of
relativity. It contains the Lorentz, the rotation, and the translation groups as
its proper subgroups. Its irreducible representations are of direct physical
significance, and we shall study them here.

As the first step in the construction of representations of $\mathfrak{p}(p, n-p)$, we
shall try to find its Casimir operators. Eq. (30.13) suggests one, but it works
only for semisimple Lie algebras, and the Poincaré algebra is not semisimple.
Nevertheless, let us try to find an operator based on that construction.
From the commutation relations for $\mathfrak{p}(p, n-p)$, as given in Eq. (29.45),
and the double-indexed structure constants defined by,$^{6}$
\[
[M_{ij}, M_{kl}] = C^{mn}_{ij,kl}M_{mn}, \qquad [M_{ij}, P_k] = C^{m}_{ij,k}P_m,
\]
we obtain
\[
\begin{aligned}
C^{mn}_{ij,kl} &= \delta^m_j\delta^n_l\eta_{ik} - \delta^m_j\delta^n_k\eta_{il}
 + \delta^m_i\delta^n_k\eta_{jl} - \delta^m_i\delta^n_l\eta_{jk}, \\
C^{m}_{ij,k} &= \delta^m_j\eta_{ik} - \delta^m_i\eta_{jk}.
\end{aligned}
\tag{30.21}
\]


6Please make sure to differentiate between the pair $(M_{ij}, P_k)$ of matrices (which acts on $p$) and the
pair $(M_{ij}, P_k)$ of operators, which acts on the state vectors in the Hilbert space of representation.
From these structure constants, we can construct a double indexed “metric”
\[
g_{ij,kl} = C^{mn}_{ij,rs}C^{rs}_{kl,mn} + C^{m}_{ij,r}C^{r}_{kl,m},
\]
which the reader may verify to be equal to
\[
g_{ij,kl} = 2(n-1)(\eta_{jk}\eta_{il} - \eta_{ik}\eta_{jl}).
\]


There is no natural way of constructing a single-indexed metric. Therefore,
we can only contract the $M$'s. In doing so, it is understood that the indices are
raised and lowered by $\eta_{ij}$. So, the first candidate for a Casimir operator is
\[
M^2 = g_{ij,kl}M^{ij}M^{kl}
 = 2(n-1)(\eta_{jk}\eta_{il} - \eta_{ik}\eta_{jl})M^{ij}M^{kl}
 = -4(n-1)M^{ij}M_{ij}.
\]
The reader may verify that $M^2$ commutes with all the $M^{ij}$'s but not with
the $P^i$'s. This is to be expected because $M^2$, the total “angular momentum”
operator,$^{7}$ is a scalar and should commute with all its components. But
commutation with the $P^i$'s is not guaranteed.

The construction above, although a failure, gives us a clue for a successful
construction. We can make another scalar out of the $P$'s. The reader may
check that $P^2 = \eta^{ij}P_iP_j$ indeed commutes with all elements of the Poincaré
algebra. We have thus found one Casimir operator. Can we find more? We
have exhausted the polynomials of degree two. The only third-degree
polynomials that we can construct are $M^{ij}P_iP_j$ and $\eta_{il}M^{ij}M_{jk}M^{kl}$. The first one
is identically zero (why?), and the second one will not commute with the $P$'s.
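The vanishing of the first cubic candidate follows in one line from the antisymmetry of $M^{ij}$ and the fact that the translation generators commute among themselves. A short derivation, offered as a sketch of the intended argument:

```latex
\begin{aligned}
M^{ij}P_iP_j &= M^{ji}P_jP_i && \text{(relabel } i \leftrightarrow j\text{)} \\
             &= -M^{ij}P_jP_i && \text{(} M^{ji} = -M^{ij}\text{)} \\
             &= -M^{ij}P_iP_j && \text{(} [P_i, P_j] = 0\text{)} \\
\Longrightarrow\quad M^{ij}P_iP_j &= 0 .
\end{aligned}
```

The same argument gives $C^iP_i = M^{ij}P_jP_i = 0$ for the vector $C_i = M_{ij}P^j$ of Eq. (30.22).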

To find higher-order polynomials in the infinitesimal generators, we build 
new tensors out of them and contract these tensors with one another. For 
example, consider the vector 


\[
C_i = M_{ij}P^j = \eta^{jk}M_{ij}P_k.
\tag{30.22}
\]

Then $C^iC_i$, a fourth-degree polynomial in the generators, is a scalar, and
therefore, it commutes with the $M_{ij}$'s, but unfortunately, not with the $P_i$'s.

Another common way to construct tensors is to contract various numbers
of the generators with the Levi-Civita tensor. For example,
\[
W^{i_1\ldots i_{n-3}} = \epsilon^{i_1\ldots i_{n-3}jkl}M_{jk}P_l
\tag{30.23}
\]
is a contravariant tensor of rank $n - 3$. Let us contract $W$ with itself to find
a scalar (which we expect to commute with all the $M_{ij}$'s):


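\[
\begin{aligned}
W^2 &= W^{i_1\ldots i_{n-3}}W_{i_1\ldots i_{n-3}} \\
&= \epsilon^{i_1\ldots i_{n-3}jkl}\,\epsilon_{i_1\ldots i_{n-3}rst}\,M_{jk}P_lM^{rs}P^t \\
&= (-1)^{n_-}\sum_{\pi}\epsilon_\pi\,
\delta^{i_1}_{\pi(i_1)}\cdots\delta^{i_{n-3}}_{\pi(i_{n-3})}\,
\delta^{j}_{\pi(r)}\delta^{k}_{\pi(s)}\delta^{l}_{\pi(t)}\,M_{jk}P_lM^{rs}P^t \\
&= (-1)^{n_-}(n-3)!\sum_{\pi}\epsilon_\pi\,
\delta^{j}_{\pi(r)}\delta^{k}_{\pi(s)}\delta^{l}_{\pi(t)}\,M_{jk}P_lM^{rs}P^t,
\end{aligned}
\]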


7This “angular momentum” includes ordinary rotations as well as the Lorentz boosts. 
where we used Eq. (26.45), with $n_-$ the number of negative diagonal entries
of $\eta$. The sum above can be carried out, with the final result
\[
\begin{aligned}
W^2 &= 2(-1)^{n_-}(n-3)!\,(M_{ij}M^{ij}P^2 - 2C_iC^i) \\
&= 2(-1)^{n_-}(n-3)!\,(M^2P^2 - 2C^2),
\end{aligned}
\tag{30.24}
\]
where $C_i$ was defined in Eq. (30.22). We have already seen that $M^2$, $P^2$, and
$C^2$ all commute with the $M_{jk}$'s. The reader may check that $W^2$ commutes
with the $P_j$'s as well. In fact, $W^{i_1\ldots i_{n-3}}$ itself commutes with all the $P_j$'s.
Other tensors and Casimir operators can be constructed in a similar fashion.

We now want to construct the irreducible vector spaces that are labeled
by the eigenvalues of the Casimir operators. We take advantage of the fact
that the Poincaré algebra has a commutative subalgebra, the translation
generators. Since the $P_k$'s commute among themselves and with $P^2$ and $W^2$,
we can choose simultaneous eigenvectors of $\{P_k\}_{k=1}^{n}$, $P^2$, and $W^2$. In
particular, we can label the vectors of an irreducible invariant subspace by the
eigenvalues of these operators. The $P^2$ and $W^2$ labels will be the same for
all vectors in each irreducible invariant subspace, while the $P_k$'s will label
different vectors of the same invariant subspace.

Let us concentrate on the momentum labels and let $|\psi^\mu_p\rangle$ be a vector in
an irreducible representation of $\mathfrak{p}(p, n-p)$, where $p$ labels momenta and
$\mu$ distinguishes among all different vectors that have the same momentum
label. We thus have
\[
P_k|\psi^\mu_p\rangle = p_k|\psi^\mu_p\rangle \quad\text{for } k = 1, 2, \ldots, n,
\tag{30.25}
\]


where $p_k$ is the eigenvalue of $P_k$. We also need to know how the “rotation”
operators act on $|\psi^\mu_p\rangle$. Instead of the full operator $e^{M_{ij}\theta^{ij}}$ we apply its
small-angle approximation $1 + M_{ij}\theta^{ij}$. Since all states are labeled by momentum,
we expect the rotated state to have a new momentum label, i.e., to be an
eigenstate of $P_k$. We want to show that $(1 + M_{ij}\theta^{ij})|\psi^\mu_p\rangle$ is an eigenvector
of $P_k$. Let the eigenvalue be $p'$, which should be slightly different from $p$.
Then, the problem reduces to determining $\delta p = p' - p$. Ignoring the index
$\mu$ for a moment, we have
\[
P_k|\psi_{p'}\rangle = p'_k|\psi_{p'}\rangle = (p_k + \delta p_k)(1 + M_{ij}\theta^{ij})|\psi_p\rangle.
\]
Using the commutation relations between $P_k$ and $M_{ij}$, we can write the LHS
as
\[
\text{LHS} = P_k(1 + M_{ij}\theta^{ij})|\psi_p\rangle
 = [p_k + \theta^{ij}(M_{ij}P_k + \eta_{jk}P_i - \eta_{ik}P_j)]|\psi_p\rangle.
\]
The RHS, to first order in infinitesimal quantities, can be expressed as
\[
\text{RHS} = (p_k + \delta p_k + p_k\theta^{ij}M_{ij})|\psi_p\rangle.
\]
Comparison of the last two equations shows that
\[
\delta p_k = \theta^{ij}(\eta_{jk}p_i - \eta_{ik}p_j)
 = \theta^{ij}(\eta_{jk}\eta_{il} - \eta_{ik}\eta_{jl})p^l
 = \theta^{ij}(M_{ij})_{kl}\,p^l,
\]
where we used Eq. (29.42). It follows that
\[
p' = p + \delta p = (1 + \theta^{ij}M_{ij})p,
\]
stating that the rotation operator of the carrier Hilbert space rotates the
momentum label of the state. Note that since “rotations” do not change the
length (induced by $\eta$), $p'$ and $p$ have the same length.
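That the length is preserved to first order can be seen directly from $\delta p_k = \theta^{ij}(\eta_{jk}p_i - \eta_{ik}p_j)$: the contraction $p^k\delta p_k$ vanishes identically. The sketch below checks this numerically under the assumption of a diagonal metric $\eta = \mathrm{diag}(+1,-1,-1,-1)$ with index 0 as time (the text's general $\eta$ is not restricted to this case); the function names are illustrative.

```python
ETA = [1, -1, -1, -1]  # assumed diagonal metric, index 0 = time

def delta_p(theta, p):
    """First-order change delta p_k = theta^{ij} (eta_jk p_i - eta_ik p_j)."""
    n = len(p)
    return [sum(theta[i][j] * ((ETA[k] if j == k else 0) * p[i]
                               - (ETA[k] if i == k else 0) * p[j])
                for i in range(n) for j in range(n))
            for k in range(n)]

def eta_dot(u, v):
    """Inner product eta^{kl} u_k v_l of two covariant vectors (diagonal metric)."""
    return sum(ETA[k] * u[k] * v[k] for k in range(len(u)))
```

Since $p\cdot\delta p = 0$, the squared length of $p' = p + \delta p$ differs from that of $p$ only by the second-order term $\delta p\cdot\delta p$.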

To obtain all the vectors of an irreducible representation of $\mathfrak{p}(p, n-p)$,
we must apply the rotation operators to vectors such as $|\psi^\mu_p\rangle$. But not all
rotations will change the label $p$; for example, in three dimensions, the vector
$p$ will not be affected$^{8}$ by a rotation about $p$. This motivates the following
definition.


Definition 30.3.8 Let $p_0$ be any given eigenvalue of the translation
generators. The set $R_{p_0}$ of all rotations $A^{p_0}$ that do not change $p_0$ is a subgroup of
the rotation group $O(p, n-p)$, called the little group corresponding to $p_0$.
The little algebra consists of the generators $M^{p_0}_{ij}$ satisfying
\[
M^{p_0}_{ij}\,p_0 = 0.
\]


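The significance of the little group resides in the fact that a representation
of $R_{p_0}$ induces a representation of the whole Poincaré group. We shall only
sketch the proof in the following and refer the reader to Mackey [Mack 68]
for a full and rigorous discussion of induced representations.

Suppose we have found an irreducible representation of $R_{p_0}$ with
operators $A^{p_0}$. Let $A^{pp_0}$ be the rotation that carries$^{9}$ $p_0$ to $p$, i.e., $p = A^{pp_0}p_0$.
Consider any rotation $A$ and let $p'$ be the momentum obtained when $A$ acts
on $p$, i.e., $Ap = p'$. Then
\[
A\underbrace{A^{pp_0}p_0}_{=\,p} = \underbrace{A^{p'p_0}p_0}_{=\,p'}
\quad\Rightarrow\quad
(A^{p'p_0})^{-1}AA^{pp_0}\,p_0 = p_0.
\]
This shows that $(A^{p'p_0})^{-1}AA^{pp_0}$ belongs to the little group. So,
\[
(A^{p'p_0})^{-1}AA^{pp_0} = A^{p_0}
\]
for some $A^{p_0} \in R_{p_0}$. Thus, $A = A^{p'p_0}A^{p_0}(A^{pp_0})^{-1}$, and
\[
\begin{aligned}
T(A)|\psi^\mu_p\rangle &= A|\psi^\mu_p\rangle
 = A^{p'p_0}A^{p_0}(A^{pp_0})^{-1}|\psi^\mu_p\rangle
 = A^{p'p_0}A^{p_0}|\psi^\mu_{p_0}\rangle \\
&= A^{p'p_0}\sum_\nu T_{\nu\mu}(A^{p_0})|\psi^\nu_{p_0}\rangle
 = \sum_\nu T_{\nu\mu}(A^{p_0})\,A^{p'p_0}|\psi^\nu_{p_0}\rangle \\
&= \sum_\nu T_{\nu\mu}(A^{p_0})|\psi^\nu_{p'}\rangle
 = \sum_\nu T_{\nu\mu}(A^{p_0})|\psi^\nu_{Ap}\rangle.
\end{aligned}
\]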


8The reader should be warned that although such a rotation does not change $p$, the rotation
operator may change the state $|\psi^\mu_p\rangle$. However, the resulting state will be an eigenstate of
the $P_k$'s with eigenvalue $p$.


9We are using the fact that $O(p, n-p)$ is transitive (see Problem 30.15).
Note how the matrix elements of the representation of the little group alone
have entered in the last line. We therefore consider
\[
T(A)|\psi^\mu_p\rangle = \sum_\nu R_{\nu\mu}(A^{p_0})|\psi^\nu_{Ap}\rangle,
\quad\text{where}\quad A = A^{p'p_0}A^{p_0}(A^{pp_0})^{-1}.
\tag{30.26}
\]
To avoid confusion, we have used $R$ for the representation of the little group.
We claim that Eq. (30.26) defines a (matrix) representation of the whole
group. In fact,
\[
\begin{aligned}
T(A_1)T(A_2)|\psi^\mu_p\rangle
&= T(A_1)\sum_\nu R_{\nu\mu}(A_2^{p_0})|\psi^\nu_{A_2p}\rangle \\
&= \sum_\nu R_{\nu\mu}(A_2^{p_0})\sum_\rho R_{\rho\nu}(A_1^{p_0})|\psi^\rho_{A_1A_2p}\rangle \\
&= \sum_\rho\underbrace{\Bigl(\sum_\nu R_{\rho\nu}(A_1^{p_0})R_{\nu\mu}(A_2^{p_0})\Bigr)}_{=\,R_{\rho\mu}(A_1^{p_0}A_2^{p_0})\ \text{since $R$ is a rep.}}|\psi^\rho_{A_1A_2p}\rangle.
\end{aligned}
\]
The reader may check that $A_1^{p_0}A_2^{p_0} = (A_1A_2)^{p_0}$. Therefore,
\[
T(A_1)T(A_2)|\psi^\mu_p\rangle
 = \sum_\rho R_{\rho\mu}\bigl((A_1A_2)^{p_0}\bigr)|\psi^\rho_{A_1A_2p}\rangle
 = T(A_1A_2)|\psi^\mu_p\rangle
\]
and $T$ is indeed a representation. It turns out that if $R$ is irreducible, then
so is $T$. The discussion above shows that the irreducible representations of
the Poincaré group are entirely determined by those of the little group and
Eq. (30.25). The recipe for the construction of the irreducible
representations of $\mathfrak{p}(p, n-p)$ is now clear:


Theorem 30.3.9 Choose any simultaneous eigenvalue $p_0$ of the
$P_k$'s. Find the little algebra $\mathfrak{r}_{p_0}$ at $p_0$ by finding all $M_{ij}$'s
satisfying $M_{ij}p_0 = 0$. Find all irreducible representations of $\mathfrak{r}_{p_0}$. The same
eigenvalues that label the irreducible representations of $\mathfrak{r}_{p_0}$ can be
used, in addition to those of $P^2$ and $W^2$, to label the irreducible
representations of $\mathfrak{p}(p, n-p)$.
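The composition property used to show that $T$ is a representation can be checked directly from the definition of the little-group element attached to a rotation. The following short derivation is a sketch in which $p'' = A_2p$ and $p' = A_1p''$ are auxiliary labels introduced here for illustration:

```latex
\begin{aligned}
A_2^{p_0} &= (A^{p''p_0})^{-1} A_2 A^{pp_0}, \qquad
A_1^{p_0} = (A^{p'p_0})^{-1} A_1 A^{p''p_0}, \\
A_1^{p_0} A_2^{p_0}
 &= (A^{p'p_0})^{-1} A_1 \underbrace{A^{p''p_0}(A^{p''p_0})^{-1}}_{=\,1} A_2 A^{pp_0}
  = (A^{p'p_0})^{-1} (A_1 A_2) A^{pp_0} = (A_1A_2)^{p_0},
\end{aligned}
```

where the last equality holds because $A_1A_2$ carries $p$ to $p'$.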


We are particularly interested in p(3, 1), the symmetry group of the spe- 
cial theory of relativity. In applying the formalism developed above, we need 
to make contact with the physical world. This always involves interpreta- 
tions. Borrowing from the angular momentum theory, in which a physi- 
cal system was given the attribute of angular momentum, the label of the 
irreducible representation of the rotation group, we attribute the labels of 
an irreducible representation of the Poincaré group, i.e., the eigenvalues of 
the four translation generators and the two Casimir operators, to a physi- 
cal system. Since the four translation generators are identified as the three 
components of momentum and energy, and their specification implies their
constancy over time, we have to come to the conclusion that 


Box 30.3.10 An irreducible representation of the Poincaré group 
specifies a free relativistic particle. 


There may be some internal interactions between constituents of a (composite)
particle, e.g. between quarks inside a proton, but as a whole, the composite
will be interpreted as a single particle. To construct the little group,
we have to specify a 4-momentum $p_0$. We shall consider two cases: In the
first case, $p_0\cdot p_0 \neq 0$, whereby the particle is deduced to be massive and we
can choose$^{10}$ $p_0 = (0, 0, 0, m)$. In the second case, $p_0\cdot p_0 = 0$, in which case
the particle is massless, and we can choose $p_0 = (p, 0, 0, p)$. We consider
these two cases separately.

The little group (really, the little Lie algebra) for $p_0 = (0, 0, 0, m)$ is
obtained by searching for those rotations that leave $p_0$ fixed. This is equivalent
to searching for $M_{ij}$'s that annihilate $(0, 0, 0, m)$, namely, the solutions to
\[
(M_{ij}p_0)_l = (M_{ij})_{lr}(p_0)^r = (M_{ij})_{l0}\,m = 0
\quad\Rightarrow\quad (M_{ij})_{l0} = 0.
\]
Since $(M_{ij})_{l0} = \eta_{i0}\eta_{jl} - \eta_{j0}\eta_{il}$, we conclude that $(M_{ij})_{l0} = 0$ if and only if
$i \neq 0$ and $j \neq 0$. Thus the little group is generated by $(M_{23}, M_{31}, M_{12})$ which
are the components of angular momentum. The reader may also verify
directly that when the 4-momentum has only a time component, the Casimir
operator $W^2$ reduces essentially to the total angular momentum operator.
Since we are dealing with a single particle, the total angular momentum can
only be spin. Therefore, we have the following theorem.


Theorem 30.3.11 In the absence of any interactions, a massive rel-
ativistic particle is specified by its mass m and its spin s, the former 
being any positive number, the latter taking on integer or half-odd- 
integer values. 
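The search for the little algebra of a massive particle can be mimicked by a small enumeration. The sketch below assumes that the listed components of $p_0$ are the covariant components ordered as (1, 2, 3, 0), that $\eta = \mathrm{diag}(-1,-1,-1,+1)$ in that order, and that the generators act as $(M_{ij}p)_k = \eta_{jk}p_i - \eta_{ik}p_j$, in line with the expression for $\delta p_k$ derived earlier. The names `act` and `little_algebra` are illustrative.

```python
from itertools import combinations

# Assumed metric: eta_00 = +1 (time), eta_11 = eta_22 = eta_33 = -1.
ETA = {0: 1, 1: -1, 2: -1, 3: -1}
ORDER = (1, 2, 3, 0)  # component order used in the text: (p_1, p_2, p_3, p_0)

def act(i, j, p):
    """Components (M_ij p)_k = eta_jk p_i - eta_ik p_j, listed in ORDER."""
    return [(ETA[k] if j == k else 0) * p[i] - (ETA[k] if i == k else 0) * p[j]
            for k in ORDER]

def little_algebra(p):
    """Index pairs (i, j), i < j, whose generator M_ij annihilates p."""
    return [(i, j) for i, j in combinations((0, 1, 2, 3), 2) if not any(act(i, j, p))]

# Massive case: only the time component is nonzero (m = 2.0 is an arbitrary choice).
p_massive = {1: 0.0, 2: 0.0, 3: 0.0, 0: 2.0}
```

Calling `little_algebra(p_massive)` returns the three purely spatial pairs, i.e. the rotation generators, while every boost $M_{0i}$ moves $p_0$.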


The case of the massless particle can be handled in the same way. We
seek those $M_{ij}$'s that annihilate $(p, 0, 0, p)$, namely, the solutions to
\[
(M_{ij}p_0)_k = (M_{ij})_{kr}(p_0)^r = (M_{ij})_{k0}\,p + (M_{ij})_{k1}\,p = 0.
\]
The reader may check that
\[
\begin{aligned}
(M_{01}p_0)_k &= \eta_{1k}p - \eta_{0k}p, & (M_{02}p_0)_k &= \eta_{2k}p, & (M_{03}p_0)_k &= \eta_{3k}p, \\
(M_{23}p_0)_k &= 0, & (M_{12}p_0)_k &= \eta_{2k}p, & (M_{13}p_0)_k &= \eta_{3k}p.
\end{aligned}
\]


10We use units in which c = 1. 
Clearly, $M_{23}$ is one of the generators of the little group. Subtracting the
middle terms and the last terms of each line, we see that $M_{02} - M_{12}$ and
$M_{03} - M_{13}$ are the other two generators. These happen to be the components
of $W$. In fact, it is easily verified that


\[
W^0 = W^1 = 2pM_{23}, \qquad W^2 = 2p(M_{13} - M_{03}), \qquad
W^3 = 2p(M_{02} - M_{12}).
\tag{30.27}
\]


Therefore, the little group is generated by all the components of $W$.
Furthermore, $W^2$ has zero eigenvalue for $|\psi_{p_0}\rangle$ when $p_0 = (p, 0, 0, p)$. Since
both Casimir operators annihilate the state $|\psi_{p_0}\rangle$, we need to come up with
another way of labeling the states.
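The three generators of the massless little algebra can be checked in the same spirit. The sketch assumes, as before, that $(p, 0, 0, p)$ lists covariant components in the order (1, 2, 3, 0), that $\eta = \mathrm{diag}(-1,-1,-1,+1)$ in that order, and that $(M_{ij}p)_k = \eta_{jk}p_i - \eta_{ik}p_j$; the helper names are illustrative.

```python
# Assumed metric: eta_00 = +1 (time), eta_11 = eta_22 = eta_33 = -1.
ETA = {0: 1, 1: -1, 2: -1, 3: -1}
ORDER = (1, 2, 3, 0)  # component order used in the text: (p_1, p_2, p_3, p_0)

def act(i, j, p):
    """Components (M_ij p)_k = eta_jk p_i - eta_ik p_j, listed in ORDER."""
    return [(ETA[k] if j == k else 0) * p[i] - (ETA[k] if i == k else 0) * p[j]
            for k in ORDER]

def act_combo(terms, p):
    """Apply the linear combination sum of c * M_ij over (c, i, j) in terms."""
    out = [0.0] * 4
    for c, i, j in terms:
        out = [o + c * x for o, x in zip(out, act(i, j, p))]
    return out

# Massless case: p_1 = p_0 = p (p = 1.5 is an arbitrary choice).
p_massless = {1: 1.5, 2: 0.0, 3: 0.0, 0: 1.5}
```

Neither $M_{02}$ nor $M_{12}$ annihilates $p_0$ on its own; only their difference does, and likewise for $M_{03} - M_{13}$.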


Historical Notes 

Eugene Paul Wigner (1902-1995) was the second of three children born to Hungarian 
Jewish parents in Budapest. His father operated a large leather tannery and hoped that 
his son would follow him in that vocation, but the younger Wigner soon discovered both 
a taste and an aptitude for mathematics and physics. Although Wigner tried hard to ac- 
commodate his father’s wishes, he clearly heard his calling, and the world of physics is 
fortunate that he did. 

Wigner began his education in what he said “may have been the finest high school in 
the world.” He later studied chemical engineering and returned to Budapest to apply that 
training in his father’s tannery. He kept track of the seminal papers during the early years 
of quantum theory and, when the lure of physics became too strong, returned to Berlin to 
work in a crystallography lab. He lectured briefly at the University of Göttingen before
moving to America to escape the Nazis. 

Wigner accepted a visiting professorship to Princeton in 1930. When the appointment 
was not made permanent, the disappointed young professor moved to the University of 
Wisconsin, where he served happily until his new wife died suddenly of cancer only a few 
months after their marriage. As Wigner prepared, quite understandably, to leave Wiscon- 
sin, Princeton corrected its earlier mistake and offered him a permanent position. Except 
for occasional visiting appointments in America and abroad, he remained at Princeton 
until his death. 

Wigner’s contributions to mathematical physics began during his studies in Berlin, where 
his supervisor suggested a problem dealing with the symmetry of atoms in a crystal. 
John von Neumann, a fellow Hungarian physicist, pointed out the relevance of papers 
by Frobenius and Schur on representation theory. Wigner soon became enamored with 
the group theory inherent in the problem and began to apply that approach to quantum 
mechanical problems. Largely at the urging of Leo Szilard (another Hungarian physicist 
and Wigner’s best friend), Wigner collected many of his results into the classic textbook 
Group Theory and Its Application to the Quantum Mechanics of Atomic Spectra. 

The decades that followed were filled with important contributions to mathematical 
physics, with applications of group theory comprising a large share: angular momen- 
tum; nuclear physics and SU(4) or “supermultiplet” theory; parity; and studies of the 
Lorentz group and Wigner’s classic definition of an elementary particle. Other work in- 
cluded early efforts in many-body theory and a paper on level spacings derived from the 
properties of Hermitian matrices that later proved useful to workers in quantum chaos. 
As with most famous figures, Wigner’s personality became as well known as his profes- 
sional accomplishments. His insistence on “reasonable” behavior, for instance, made him 
refuse to pay a relative’s hospital bill until after the patient was released—it was obviously 
unreasonable to hold a sick person hostage. His gentleness is exemplified in an anecdote 
in which on getting into an argument about a tip with a New York City cab driver, Wigner 
loses his patience, stamps his foot, and says, “Oh, go to hell, ... please!” 

He held others’ feelings in such high regard that it was said to be impossible to follow 
Wigner through a door. He was light-hearted and fun-loving, but also devoted to his family 
and concerned about the future of the planet. This combination of exceptional skill and 
laudable humanity ensures Wigner’s place among the most highly regarded of his field. 
(Taken from E. Vogt, Phys. Today 48 (12) (1995) 40-44.) 
Define the new quantities
\[
H_\pm = \frac{1}{2}(W_2 \pm iW_3), \qquad H_0 = \frac{1}{2p}W_0
\]
and the corresponding operators acting on the carrier space. From Eq. (30.27),
it follows that $[W_2, W_3] = 0$, $W^2 = -4H_+H_-$, and that
\[
[H_+, H_0] = -H_+, \qquad [H_-, H_0] = H_-, \qquad [H_+, H_-] = 0.
\]
Denote the eigenstates of $W^2$ and $H_0$ by $|\alpha, \beta\rangle$:
\[
W^2|\alpha, \beta\rangle = \alpha|\alpha, \beta\rangle, \qquad
H_0|\alpha, \beta\rangle = \beta|\alpha, \beta\rangle.
\]
Then the reader may check that $H_\pm|\alpha, \beta\rangle$ has eigenvalues $\alpha$ and $\beta \pm 1$. By
applying $H_\pm$ repeatedly, we can generate all eigenvalues of $H_0$ and note that
they are of the form
\[
\beta = r + n, \quad\text{where } n = 0, \pm1, \pm2, \ldots \text{ and } 1 > r \geq 0.
\]
Since $H_0 = M_{23}$, $H_0$ is recognized as an angular momentum operator whose
eigenvalues are integer (for bosons) and half-odd integer (for fermions).
Therefore, $r = 0$ for bosons and $r = \frac{1}{2}$ for fermions.

Now, within an irreducible representation, only those $|\alpha, \beta\rangle$'s can occur
that have the same $\alpha$. Therefore, if we relabel the $\beta$ values by integers, then
\[
\langle\alpha, n|H_0|\alpha, m\rangle = (r + n)\delta_{nm}.
\]
Similarly,
\[
\langle\alpha, n|H_+|\alpha, m\rangle = a_n\delta_{n,m+1}, \qquad
\langle\alpha, n|H_-|\alpha, m\rangle = b_n\delta_{n,m-1},
\]
where $a_n$ and $b_n$ are some constants. It follows that
\[
\begin{aligned}
\alpha = \langle\alpha, n|W^2|\alpha, n\rangle &= \langle\alpha, n|H_+H_-|\alpha, n\rangle \\
&= \langle\alpha, n|H_+|\alpha, n-1\rangle\langle\alpha, n-1|H_-|\alpha, n\rangle \\
&= a_nb_{n-1}.
\end{aligned}
\]
If we assume that the representation is unitary, then all $W_i$'s will be
hermitian, $(H_+)^\dagger = H_-$, so $a_n = b_{n-1}^*$ and $\alpha = |a_n|^2 \geq 0$.

If $\alpha = 0$, then $a_n = 0$ and $b_n = 0$ for all $n$. Consequently, $H_+ = 0 = H_-$,
i.e., there are no raising or lowering operators. It follows that there
are only two spin states, corresponding to the maximum and the minimum
eigenvalues of $H_0$. A natural axis for the projection of spin is the direction
of motion of the particle. Then the projection of spin is called helicity. We
summarize our discussion in the following theorem.



Theorem 30.3.12 In the absence of any interactions, a massless
relativistic particle is specified by its spin and its helicity, the former
taking on integer or half-odd-integer values $s$, the latter having
values $+s$ and $-s$.


Theorems 30.3.11 and 30.3.12 are beautiful examples of the fruitfulness 
of the interplay between mathematics and physics. Physics has provided 
mathematics with a group, the Poincaré group, and mathematics, through 
its theory of group representation, has provided physics with the deep result 
that all particles must have a spin that takes on a specific value, and none 
other; that massive particles are allowed to have 2s + 1 different values for 
the projection of their spin; and that massless particles are allowed to have 
only two values for their spin projection. Such far-reaching results that are 
both universal and specific make physics unique among all other sciences.
It also provides impetus for the development of mathematics as the only di- 
alect through which nature seems to communicate to us her deepest secrets. 

If $\alpha > 0$, then the resulting representations will have continuous spin
variables. Such representations do not correspond to particles found in na- 
ture; therefore, we shall not pursue them any further. 


30.4 Problems 


30.1 Show that the operation on a compact group defined by
\[
(u|v) = \int_G \langle T_gu|T_gv\rangle\,d\mu_g
\]
is an inner product.
30.2 Show that the Weyl operator $K_u$ is hermitian.
30.3 Derive Eqs. (30.5) and (30.6). Hint: Follow the finite-group analogy. 


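30.4 Suppose that a Lie group $G$ acts on a Euclidean space $\mathbb{R}^n$ as well as
on the space of (square-integrable) functions $\mathcal{L}(\mathbb{R}^n)$. Let $\phi^{(\alpha)}_i$ transform as
the $i$th row of the $\alpha$th irreducible representation. Verify that the relation
\[
T_g\phi^{(\alpha)}_i(x) = \sum_{j=1}^{n_\alpha} T^{(\alpha)}_{ji}(g)\,\phi^{(\alpha)}_j(x\cdot g^{-1})
\]
defines a representation of $G$.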


30.5 Show that $GL(V)$ is not a compact group. Hint: Find a continuous
function $GL(V) \to \mathbb{C}$ whose image is not compact.

30.6 Suppose that $T : G \to GL(V)$ is a representation, and let
\[
V^{\otimes r} = \underbrace{V \otimes\cdots\otimes V}_{r\ \text{times}}
\]
be the $r$-fold tensor product of $V$. Show that $T^{\otimes r} : G \to GL(V^{\otimes r})$, given by
\[
T^{\otimes r}_g(v_1, \ldots, v_r) = T_g(v_1) \otimes\cdots\otimes T_g(v_r),
\]
is also a representation.


30.7 Suppose that in Example 30.2.2, we set kj = 2 for our treatment of 
n=2,r = 3. Show that Yo (ex, ® ex, ® ex, ) does not produce any new vector 
beyond what we obtained for k; = 1. 


30.8 Show that g/ g°"cixs is antisymmetric in j and r. 


30.9 Operate $L_+$ on
\[
|00\rangle = \sum_{m=-l}^{l} C(ll; 0|m,-m; 0)\,|lm; l,-m\rangle
\]
and use $L_+|00\rangle = 0$ to find a recursive relation among $C(ll; 0|m,-m; 0)$.
Use normalization and the convention that $C(ll; 0|m,-m; 0) > 0$ to show
that
\[
C(ll; 0|m,-m; 0) = (-1)^{l-m}/\sqrt{2l+1}
\]
(see Sect. 13.3).
30.10 Show that the generators of $\mathfrak{so}(3,1)$,
\[
\mathbf{M} = (M_1, M_2, M_3) = (M_{23}, M_{31}, M_{12}), \qquad
\mathbf{N} = (N_1, N_2, N_3) = (M_{01}, M_{02}, M_{03}),
\]
satisfy the commutation relations
\[
[M_i, M_j] = -\epsilon_{ijk}M_k, \qquad
[N_i, N_j] = \epsilon_{ijk}M_k, \qquad
[M_i, N_j] = -\epsilon_{ijk}N_k,
\]
and that $\mathbf{M}^2 - \mathbf{N}^2$ and $\mathbf{M}\cdot\mathbf{N}$ commute with all the $M$'s and the $N$'s.


30.11 Let the double-indexed “metric” of the Poincaré algebra be defined
as
\[
g_{ij,kl} = C^{mn}_{ij,rs}C^{rs}_{kl,mn} + C^{m}_{ij,r}C^{r}_{kl,m},
\]
where the structure constants are given in Eq. (30.21). Show that
\[
g_{ij,kl} = 2(n-1)(\eta_{jk}\eta_{il} - \eta_{ik}\eta_{jl}).
\]
30.12 Show that $[M^2, M^{ij}] = 0$, and
\[
[M^2, P_k] = 4M_{kj}P^j + 2(n-1)P_k.
\]
30.13 Show that the vector operator
\[
C_i = M_{ij}P^j = \eta^{jk}M_{ij}P_k
\]
satisfies the following commutation relations:
\[
[C_i, P_j] = \eta_{ij}P^2 - P_iP_j, \qquad
[C_i, M_{jk}] = \eta_{ik}C_j - \eta_{ij}C_k, \qquad
[C_i, C_j] = M_{ij}P^2.
\]
Show also that $[C^2, M_{jk}] = 0$, $C^iP_i = 0$, and
\[
P^iC_i = -(n-1)P^2, \qquad [C^2, P_i] = \{2C_i + (n-1)P_i\}P^2.
\]


30.14 Derive Eq. (30.24) and show that $W^{i_1\ldots i_{n-3}}$ commutes with all the
$P_j$'s.


30.15 Let $\hat{e}_x = (x_1, \ldots, x_n)$ be any unit vector in $\mathbb{R}^n$.


(a) Show that a matrix is $\eta$-orthogonal, i.e., it satisfies Eq. (29.38), if and
only if its columns are $\eta$-orthogonal.

(b) Show that there exists an $A \in O(p, n-p)$ such that $\hat{e}_x = A\hat{e}_1$ where
$\hat{e}_1 = (1, 0, \ldots, 0)$. Hint: Find the first column of $A$ and use (a).

(c) Conclude that $O(p, n-p)$ is transitive in its action on the collection
of all vectors of the same length.


30.16 Verify directly that when the 4-momentum has only a time component,
the Casimir operator $W^2 = W\cdot W$ reduces essentially to the total
angular momentum operator.


30.17 Verify that for the case of a massless particle, when $p_0 = (p, 0, 0, p)$,
\[
W^0 = W^1 = 2pM_{23}, \qquad W^2 = 2p(M_{13} - M_{03}), \qquad W^3 = 2p(M_{02} - M_{12}),
\]
and that $W^2 = W\cdot W$ annihilates $|\psi_{p_0}\rangle$.
In Sect. 4.5, we discussed the representation of an algebra and its
significance in physical applications. This significance is doubled in the case of the
Clifford algebras because of their relation with the Dirac equation, which
describes a relativistic spin-$\frac{1}{2}$ fundamental particle such as a lepton or a quark.
With the representation of Lie groups and Lie algebras behind us, we can
now tackle the important topic of the representation of Clifford algebras.


31.1 The Clifford Group 


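Let $C_V$ be a Clifford algebra. Denote by $C_V^\times$ the set of invertible elements of
$C_V$, which obviously form a group. Define a map $\mathrm{ad} : C_V^\times \to GL(C_V)$ by
\[
\mathrm{ad}(a)x = \omega_V(a) \vee x \vee a^{-1},
\tag{31.1}
\]
where $\omega_V$ is the degree involution of $C_V$, and note that
\[
\begin{aligned}
\mathrm{ad}(a_1 \vee a_2)x &= \omega_V(a_1 \vee a_2) \vee x \vee (a_1 \vee a_2)^{-1} \\
&= \omega_V(a_1) \vee [\omega_V(a_2) \vee x \vee a_2^{-1}] \vee a_1^{-1} \\
&= \omega_V(a_1) \vee y \vee a_1^{-1} = \mathrm{ad}(a_1)y = \mathrm{ad}(a_1)\,\mathrm{ad}(a_2)x,
\end{aligned}
\]
where $y = \mathrm{ad}(a_2)x$. It follows that $\mathrm{ad}(a_1 \vee a_2) = \mathrm{ad}(a_1) \circ \mathrm{ad}(a_2)$, i.e., that ad is a
group homomorphism, thus a group representation.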


Definition 31.1.1 The representation defined in Eq. (31.1) is called the 
twisted adjoint representation. 


As shown in Problem 31.1, two immediate consequences of Eq. (31.1)
are
\[
\mathrm{ad}(\omega_V(a)) = \omega_V \circ \mathrm{ad}(a) \circ \omega_V
\tag{31.2}
\]
and
\[
\mathrm{ad}(y)x = x - 2\,\frac{g(x, y)}{g(y, y)}\,y, \qquad x \in V,\ y \in V \cap C_V^\times,
\tag{31.3}
\]
as in Eq. (26.35); i.e., that $\mathrm{ad}(y)$ is a reflection operator in $V$. Equation (31.2)
follows from $\omega_V \circ \omega_V = 1$, and (31.3) from Eq. (27.14).


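Proposition 31.1.2 $\ker\mathrm{ad} = \lambda 1$, where $0 \neq \lambda \in \mathbb{C}$.


Proof That $\lambda 1 \in \ker\mathrm{ad}$ is trivial to show. To prove the converse, assume that
$\mathrm{ad}(a) = 1$. Then $\mathrm{ad}(a)x = 1x = x$, and therefore,
\[
x = \omega_V(a) \vee x \vee a^{-1} \quad\text{or}\quad \omega_V(a) \vee x = x \vee a,
\quad\text{for all } x \in C_V.
\tag{31.4}
\]
In particular, with $x = 1$, we get $\omega_V(a) = a$. Substituting this back in (31.4)
yields $a \vee x = x \vee a$ for all $x \in C_V$. This, together with $\omega_V(a) = a$, implies
that $a$ belongs to the even part of the center of $C_V$. It follows from
Proposition 27.2.13 that $a \in \mathrm{Span}\{1\}$ or $a = \lambda 1$.
Since $a$ is invertible, $\lambda \neq 0$.


Definition 31.1.3 Let $\Gamma_V$ be the set of $a \in C_V^\times$ such that $\mathrm{ad}(a)v \in V$
for all $v \in V$. $\Gamma_V$ is a subgroup of $C_V^\times$, and is called the Clifford group
of $V$.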


Example 31.1.4 It is instructive to find the Clifford group of the simple
example of complex numbers from first principles to get a handle on the more
general case. We have $V = \mathbb{R}$ and $\{1, e\}$ is a basis of the algebra.$^{1}$ In order
to apply Eq. (31.1), we need $a^{-1}$. Of course, we know from our knowledge
of complex numbers what the inverse is. But, since we are starting from first
principles, we find $a^{-1}$ from scratch.

Let $a = \alpha 1 + \beta e$. We are looking for $a' = \alpha' 1 + \beta' e$ such that $a' \vee a = 1$.
This translates into
\[
(\alpha' 1 + \beta' e) \vee (\alpha 1 + \beta e) = 1,
\]
or, since $e \vee e = -1$, into
\[
(\alpha\alpha' - \beta\beta')1 + (\alpha\beta' + \alpha'\beta)e = 1.
\]
It follows that $\alpha\alpha' - \beta\beta' = 1$ and $\alpha\beta' + \alpha'\beta = 0$. At least one of the
components of $a$, say $\alpha$, is nonzero. Hence, the second equation gives $\beta' = -\alpha'\beta/\alpha$.
Substituting in the first equation, we get
\[
\alpha\alpha' + \frac{\alpha'\beta^2}{\alpha} = 1 \quad\text{or}\quad \alpha' = \frac{\alpha}{\alpha^2 + \beta^2},
\]
from which we obtain $\beta' = -\beta/(\alpha^2 + \beta^2)$, and
\[
a^{-1} = \frac{\alpha 1 - \beta e}{\alpha^2 + \beta^2},
\]


1We avoid using the normal notation for complex numbers to be more in tune with the
notation of the Clifford algebras. So, instead of $z$, we use $a$ and instead of $i$ we use $e$.
a familiar result from complex number theory. The uniqueness of the inverse
ensures that had we chosen $\beta$ not to be zero, we would have obtained the
same result.

Now we try to find the elements of $\Gamma_V$, which by definition are those
which leave $V$ invariant. Let $\eta e$ be an arbitrary element of $V$ and recall that
$\omega_V(1) = 1$ and $\omega_V(e) = -e$. Equation (31.1) now reads
\[
\mathrm{ad}(a)(\eta e) = (\alpha 1 - \beta e) \vee (\eta e) \vee \left(\frac{\alpha 1 - \beta e}{\alpha^2 + \beta^2}\right),
\]
which, after multiplying through, yields
\[
\left(\frac{2\alpha\beta}{\alpha^2 + \beta^2}\,1 + \frac{\alpha^2 - \beta^2}{\alpha^2 + \beta^2}\,e\right)\eta.
\]
This will be in $V$ (i.e., $a$ will be in $\Gamma_V$) for all values of $\eta$ if either $\alpha = 0$ or
$\beta = 0$, i.e., if $a$ is real or pure imaginary.
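The conclusion of this example is easy to reproduce with ordinary complex arithmetic. In the sketch below, $e$ is modelled by Python's `1j`, so that $\omega_V(a)$ becomes complex conjugation and $V$ is the imaginary axis; the function names are illustrative.

```python
def twisted_adjoint(a, eta):
    """ad(a)(eta e) = omega_V(a) * (eta e) * a^{-1}, with e modelled as 1j."""
    return a.conjugate() * (eta * 1j) / a

def closed_form(a, eta):
    """The expanded result (2 alpha beta + (alpha^2 - beta^2) e) eta / (alpha^2 + beta^2)."""
    alpha, beta = a.real, a.imag
    n2 = alpha ** 2 + beta ** 2
    return (2 * alpha * beta / n2 + (alpha ** 2 - beta ** 2) / n2 * 1j) * eta

def in_V(z, tol=1e-12):
    """True if z has no component along 1, i.e. z lies in V = span{e}."""
    return abs(z.real) < tol
```

For $a = 1 + e$ the image of $\eta e$ is $\eta\,1$, which lies outside $V$, so this $a$ is not in the Clifford group.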


Equation (31.3) shows that any non-null (or non-isotropic) $y \in V$ (i.e., a
vector for which $g(y, y) \neq 0$) is contained in the Clifford group of $V$. We
shall see later that $\Gamma_V$ is indeed generated by such elements.


Proposition 31.1.5 $\Gamma_V$ is stable under the two involutions $\omega_V$ and $\sigma_V$, and
thus under the conjugation $a \mapsto \bar{a}$, where $\bar{a} = \sigma_V\omega_V(a)$.


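Proof Let $a \in \Gamma_V$, i.e., $\mathrm{ad}(a)x \in V$ if $x \in V$. Using (31.2), we have
\[
\mathrm{ad}(\omega_Va)x = \omega_V \circ \mathrm{ad}(a) \circ \omega_Vx = -\omega_V \circ \mathrm{ad}(a)x = \mathrm{ad}(a)x \in V,
\]
because $\omega_Vv = -v$ for all $v \in V$. Hence, $\omega_V(a) \in \Gamma_V$.
Next we show that $\sigma_Va \in \Gamma_V$. We have
\[
\mathrm{ad}(\sigma_Va)x = (\omega_V\sigma_Va) \vee x \vee (\sigma_Va)^{-1} = (\sigma_V\omega_Va) \vee x \vee (\sigma_Va^{-1}),
\]
where we used the fact that $\omega_V$ and $\sigma_V$ commute and that $(\sigma_Va)^{-1} =
\sigma_V(a^{-1})$, which is true for all involutions. Since $\sigma_Vx = x$, the right-hand
side can be written as
\[
\text{RHS} = (\sigma_V\omega_Va) \vee \sigma_Vx \vee \sigma_V(a^{-1}) = \sigma_V[a^{-1} \vee x \vee \omega_Va],
\]
using Eq. (27.22). But
\[
a^{-1} = \omega_V(\omega_Va^{-1}) = \omega_V(\omega_Va)^{-1},
\]
and denoting $(\omega_Va)^{-1}$ by $b \in \Gamma_V$, we have
\[
\text{RHS} = \sigma_V[\omega_Vb \vee x \vee b^{-1}] = \sigma_V[\mathrm{ad}(b)x] = \mathrm{ad}(b)x \in V.
\]
It follows that $\mathrm{ad}(\sigma_Va)x \in V$, i.e., $\sigma_Va \in \Gamma_V$.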


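We have defined conjugation for any element of $C_V$. Our experience with
complex and quaternion conjugation leads us to believe that $\bar{a} \vee a$ is a
non-negative real number. This does not hold for general $a$ in a general Clifford
algebra. However, we have the following:

Proposition 31.1.6 Let $\theta : C_V \to C_V$ be defined by $\theta(a) = \bar{a} \vee a$. If $a \in \Gamma_V$,
then
\[
\theta(a) = \bar{a} \vee a = \lambda_a 1, \qquad \lambda_a \neq 0.
\]

Proof By Proposition 31.1.5, $\bar{a} \in \Gamma_V$. Let $x$ be an arbitrary vector in $V$, and
set $y = \mathrm{ad}(a)x$. Then $y \in V$ and hence $\sigma_Vy = y$. Spelling out this identity
yields
\[
\sigma_V(\omega_V(a) \vee x \vee a^{-1}) = \omega_V(a) \vee x \vee a^{-1}
\]
or
\[
\sigma_V(a^{-1}) \vee x \vee \sigma_V(\omega_Va) = (\sigma_Va)^{-1} \vee x \vee \sigma_V(\omega_Va) = \omega_V(a) \vee x \vee a^{-1}.
\]
This yields
\[
x \vee \sigma_V(\omega_Va) \vee a = \sigma_V(a) \vee \omega_V(a) \vee x.
\]
From $\bar{a} = \sigma_V\omega_V(a)$, the fact that $\sigma_V$ and $\omega_V$ commute, and that both are
involutions, we get
\[
\sigma_V\omega_V(a) = \bar{a} \quad\text{and}\quad \sigma_V(a) = \omega_V(\bar{a}).
\]
Using these in the previous equation yields
\[
x \vee \bar{a} \vee a = \omega_V(\bar{a} \vee a) \vee x, \qquad x \in V.
\]
Thus, setting $b = \bar{a} \vee a$, we have
\[
x \vee b = \omega_V(b) \vee x, \qquad x \in V.
\]
Now write $b = b_0 + b_1$, with $b_0 \in C_V^0$ and $b_1 \in C_V^1$. Then the equation
above becomes
\[
x \vee b_0 + x \vee b_1 = (b_0 - b_1) \vee x,
\]
and setting the odd and even parts of both sides equal yields
\[
x \vee b_0 = b_0 \vee x \quad\text{and}\quad x \vee b_1 = -b_1 \vee x, \qquad x \in V.
\]
Thus, $b_0$ belongs to the even part of the center and $b_1$ to the odd part of the
anticenter of $C_V$. From Propositions 27.2.10 and 27.2.13, we
now conclude that $b_0 = \lambda_a 1$ and $b_1 = 0$, whence $b = \bar{a} \vee a = \lambda_a 1$ and so
$\theta(a) = \lambda_a 1$. Finally, since $a$ is invertible, $\lambda_a$ cannot be zero.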


Example 31.1.7 In Example 31.1.4, we constructed the Clifford group of $\mathbb{C}$. Although the construction seemed trivial, it featured most of the procedure used in the general case. As a slightly more complicated example, let us find the Clifford group of $\mathbb{H}$, the quaternions. In this case, $V = \mathbb{R}^2$ with basis $\{e_1, e_2\}$.

To simplify the writing, we sometimes remove the Clifford multiplication sign $\vee$, and use juxtaposition for product. We also use $e_{12}$ for $e_1 \vee e_2$, in which case the basis of $\mathbb{H}$ is written as $\{\mathbf{1}, e_1, e_2, e_{12}\}$, of which the first and the last are even under the degree involution $\omega_V$, and the middle two, odd.


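For $a = \alpha_0\mathbf{1} + \alpha_1 e_1 + \alpha_2 e_2 + \alpha_3 e_{12}$ and $v = \beta_1 e_1 + \beta_2 e_2$, Eq. (31.1) becomes

$$\omega_V(a) \vee v \vee a^{-1} = (\alpha_0\mathbf{1} - \alpha_1 e_1 - \alpha_2 e_2 + \alpha_3 e_{12})(\beta_1 e_1 + \beta_2 e_2)\,\frac{\alpha_0\mathbf{1} - \alpha_1 e_1 - \alpha_2 e_2 - \alpha_3 e_{12}}{\alpha_0^2 + \alpha_1^2 + \alpha_2^2 + \alpha_3^2},$$

where we used Example 3.1.16 to come up with $a^{-1}$, although we could have calculated it (after a straightforward, but tedious labor) using a technique similar to the one used in Example 31.1.4. Using the notation $|a|^2 = \alpha_0^2 + \alpha_1^2 + \alpha_2^2 + \alpha_3^2$ and multiplying out the equation above (another tedious calculation), we obtain

$$\omega_V(a) \vee v \vee a^{-1} = \frac{1}{|a|^2}\Bigl\{2\bigl[(\alpha_0\alpha_1 + \alpha_2\alpha_3)\beta_1 + (\alpha_0\alpha_2 - \alpha_1\alpha_3)\beta_2\bigr]\mathbf{1}$$
$$\qquad + \bigl[(\alpha_0^2 - \alpha_1^2 + \alpha_2^2 - \alpha_3^2)\beta_1 - 2(\alpha_1\alpha_2 + \alpha_0\alpha_3)\beta_2\bigr]e_1$$
$$\qquad + \bigl[2(\alpha_0\alpha_3 - \alpha_1\alpha_2)\beta_1 + (\alpha_0^2 + \alpha_1^2 - \alpha_2^2 - \alpha_3^2)\beta_2\bigr]e_2\Bigr\}.$$

For this to be in $V$ for arbitrary $v$, the coefficient of $\mathbf{1}$ must vanish for arbitrary $\beta_1$ and $\beta_2$. This will happen iff

$$\alpha_0\alpha_1 + \alpha_2\alpha_3 = 0 \quad\text{and}\quad \alpha_0\alpha_2 - \alpha_1\alpha_3 = 0. \qquad (31.5)$$

At least one of the $\alpha$'s, say $\alpha_1$, is not zero. Then $\alpha_0 = -\alpha_2\alpha_3/\alpha_1$ from the first equation, which upon substitution in the second equation of (31.5) yields $\alpha_3(\alpha_1^2 + \alpha_2^2) = 0$, whose only solution is $\alpha_3 = 0$, and so $\alpha_0 = 0$ as well. If we assume that $\alpha_0$ is not zero, the solution will be $\alpha_1 = 0 = \alpha_2$. Hence, $\Gamma_V$ consists of algebra members of the form

$$a_1 = \alpha_1 e_1 + \alpha_2 e_2 \quad\text{or}\quad a_2 = \alpha_0\mathbf{1} + \alpha_3 e_{12}.$$

Note that $\omega_V(a_1) = -a_1$, while $\omega_V(a_2) = a_2$. This even-oddness was also true for the two choices of Example 31.1.4.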
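The conclusion of this example can be checked numerically. The following is a minimal sketch, assuming the identification $e_1 = i$, $e_2 = j$, $e_{12} = k$ of $\mathbf{C}(0,2)$ with the quaternions; the function names (`qmul`, `twisted_adjoint`, `maps_V_to_V`) and the sample coefficients are illustrative choices, not notation from the text.

```python
# Quaternion model of C(0, 2): the tuple (c0, c1, c2, c3) stands for
# c0*1 + c1*e1 + c2*e2 + c3*e12, with e1^2 = e2^2 = e12^2 = -1.

def qmul(p, q):
    """Hamilton product, assuming e1 = i, e2 = j, e12 = k."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return (
        p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,
        p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,
        p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,
        p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0,
    )

def degree_involution(a):
    """omega_V changes the sign of the odd part (the e1 and e2 components)."""
    return (a[0], -a[1], -a[2], a[3])

def inverse(a):
    """Inverse of a nonzero quaternion: conjugate divided by |a|^2."""
    n2 = sum(c * c for c in a)
    return (a[0] / n2, -a[1] / n2, -a[2] / n2, -a[3] / n2)

def twisted_adjoint(a, v):
    """ad(a)v = omega_V(a) v a^{-1}, as in Eq. (31.1)."""
    return qmul(qmul(degree_involution(a), v), inverse(a))

def maps_V_to_V(a, tol=1e-12):
    """True if ad(a) sends both basis vectors e1 and e2 back into V."""
    for v in [(0.0, 1.0, 0.0, 0.0), (0.0, 0.0, 1.0, 0.0)]:
        w = twisted_adjoint(a, v)
        if abs(w[0]) > tol or abs(w[3]) > tol:
            return False
    return True

print(maps_V_to_V((0.0, 2.0, -3.0, 0.0)))  # a_1 = 2 e1 - 3 e2 (odd)
print(maps_V_to_V((1.5, 0.0, 0.0, 0.5)))   # a_2 = 1.5 + 0.5 e12 (even)
print(maps_V_to_V((1.0, 1.0, 0.0, 0.0)))   # mixed element, violates (31.5)
```

Only the purely odd and purely even sample elements pass the test; the mixed element has $\alpha_0\alpha_1 \neq 0$ and therefore produces a nonzero coefficient of $\mathbf{1}$.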


Corollary 31.1.8 The map $\theta$ of Proposition 31.1.6 satisfies

$$\theta(\omega_V a) = \theta(a), \qquad a \in \Gamma_V.$$


Proof See Problem 31.2. 


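Proposition 31.1.9 The map $\lambda_V : \Gamma_V \to \mathbb{C}^\times$ given by $\lambda_V(a) = \lambda_a$ is a homomorphism from the Clifford group to $\mathbb{C}^\times$ (the multiplicative group of complex numbers).

Proof To prove the homomorphism of $\lambda_V$, note that on the one hand, $\theta(a \vee b) = \lambda_V(a \vee b)\mathbf{1}$. On the other hand,

$$\theta(a \vee b) = a \vee b \vee \overline{a \vee b} = a \vee (b \vee \bar b) \vee \bar a$$
$$= a \vee (\lambda_V(b)\mathbf{1}) \vee \bar a = a \vee \bar a \vee (\lambda_V(b)\mathbf{1})$$
$$= (\lambda_V(a)\mathbf{1}) \vee (\lambda_V(b)\mathbf{1}) = \lambda_V(a)\lambda_V(b)\mathbf{1}.$$

It follows that

$$\lambda_V(a \vee b) = \lambda_V(a)\lambda_V(b),$$

i.e., that $\lambda_V$ is a homomorphism.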
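As an illustration of the homomorphism property, here is a small sketch for the quaternion case of Example 31.1.7. It assumes the identification $e_1 = i$, $e_2 = j$, $e_{12} = k$, under which the conjugation $a \mapsto \bar a$ becomes ordinary quaternion conjugation; the helper names and sample elements are hypothetical. In $\mathbb{H}$ the product $a \vee \bar a$ is a multiple of $\mathbf{1}$ for every quaternion, which is special to this algebra and not a feature of the general case.

```python
def qmul(p, q):
    """Hamilton product of quaternions (c0, c1, c2, c3) = c0 + c1 i + c2 j + c3 k."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return (
        p0 * q0 - p1 * q1 - p2 * q2 - p3 * q3,
        p0 * q1 + p1 * q0 + p2 * q3 - p3 * q2,
        p0 * q2 - p1 * q3 + p2 * q0 + p3 * q1,
        p0 * q3 + p1 * q2 - p2 * q1 + p3 * q0,
    )

def conj(a):
    """Conjugation a -> sigma_V omega_V (a); for C(0, 2) this is quaternion conjugation."""
    return (a[0], -a[1], -a[2], -a[3])

def lam(a, tol=1e-12):
    """Return lambda_a from theta(a) = a * conj(a) = lambda_a * 1."""
    t = qmul(a, conj(a))
    # theta(a) must have no e1, e2, e12 components
    assert all(abs(c) < tol for c in t[1:])
    return t[0]

a = (0.0, 2.0, -3.0, 0.0)   # odd element of the Clifford group
b = (1.5, 0.0, 0.0, 0.5)    # even element of the Clifford group

print(lam(a), lam(b), lam(qmul(a, b)))
print(abs(lam(qmul(a, b)) - lam(a) * lam(b)) < 1e-12)
```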


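Corollary 31.1.10 For $0 \neq \beta \in \mathbb{C}$ and $a \in \Gamma_V$, the homomorphism $\lambda_V$ of Proposition 31.1.9 satisfies

1. $\lambda_V(\beta a) = \beta^2\lambda_V(a)$.
2. $\lambda_V(\omega_V a) = \lambda_V(a)$.
3. $\lambda_V(\mathrm{ad}(b)a) = \lambda_V(a)$.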


Proof The second relation follows from Corollary 31.1.8. The proof of the 
first and third relations is left as Problem 31.3. 


Equation (31.3) shows that ad(y) is the reflection operator in the plane 
normal to y and, therefore, an isometry. Is ad(a) an isometry for general a? 
The answer is yes, if a is restricted to the Clifford group. 


Proposition 31.1.11 Fix $a \in \Gamma_V$ and let $\tau_a$ be the restriction of $\mathrm{ad}(a)$ to $V$. Then $\tau_a$ is an isometry of $V$.

Proof Since $\bar x = -x$ for $x \in V$, Eq. (27.14) gives

$$\theta(x) = -g(x, x)\mathbf{1},$$

from which it follows that

$$\lambda_V(x) = -g(x, x). \qquad (31.6)$$

Now, the third relation in Corollary 31.1.10 yields

$$g(\tau_a x, \tau_a x) = g(\mathrm{ad}(a)x, \mathrm{ad}(a)x) = -\lambda_V(\mathrm{ad}(a)x) = -\lambda_V(x) = g(x, x),$$

showing that $\tau_a$ is an isometry of $V$.


Since ad is a representation, we can establish a group homomorphism between $\Gamma_V$ and $O(V)$, the group of isometries of $V$. In fact, we have the following:

Proposition 31.1.12 Let $\Phi_V : \Gamma_V \to O(V)$ be defined by $\Phi_V(a) = \tau_a$. Then $\Phi_V$ is a surjective homomorphism.

Proof That $\Phi_V$ is a homomorphism is immediate. By Theorem 26.5.17, every isometry is a product of reflections. Therefore, surjection is proved if we can show that for any reflection in $O(V)$, there is an element in $\Gamma_V$ which is mapped by $\Phi_V$ to that reflection. But any reflection $R_y$ in $V$ is just $\mathrm{ad}(y)$ with $y$ being non-null, as observed in (31.3). Therefore, $\Phi_V(y) = \tau_y = \mathrm{ad}(y) = R_y$, proving that $\Phi_V$ is surjective.


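Theorem 31.1.13 The Clifford group $\Gamma_V$ is generated by non-null vectors $y \in V$.

Proof Let $a \in \Gamma_V$ and set $\tau = \Phi_V(a)$. By Theorem 26.5.17 and the fact that $\tau_y = R_y$ for $y \in V$,

$$\tau = R_{y_1} \circ \cdots \circ R_{y_r} = \tau_{y_1} \circ \cdots \circ \tau_{y_r} = \Phi_V(y_1 \vee \cdots \vee y_r), \qquad y_i \in V,$$

where $g(y_i, y_i) \neq 0$. It follows that $\Phi_V(a^{-1} \vee y_1 \vee \cdots \vee y_r) = \tau^{-1} \circ \tau$, the identity map of $V$. But $\Phi_V(b) = \mathrm{ad}(b)$. Hence, by Proposition 31.1.2,

$$a^{-1} \vee y_1 \vee \cdots \vee y_r = \lambda\mathbf{1}, \qquad \lambda \neq 0,$$

and

$$a = (\lambda^{-1}y_1) \vee y_2 \vee \cdots \vee y_r, \qquad y_i \in V.$$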


This completes the proof. 


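Example 31.1.14 Example 31.1.7 showed that the elements of $\Gamma_V$ for quaternions were of two kinds:

$$a_1 = \alpha_1 e_1 + \alpha_2 e_2 \quad\text{or}\quad a_2 = \alpha_0\mathbf{1} + \alpha_3 e_{12}.$$

The first one is already in $\mathbb{R}^2$. To show that the second one is generated by vectors in $\mathbb{R}^2$, take two vectors $u = \eta_1 e_1 + \eta_2 e_2$ and $v = \xi_1 e_1 + \xi_2 e_2$ and note that

$$u \vee v = -(\eta_1\xi_1 + \eta_2\xi_2)\mathbf{1} + (\eta_1\xi_2 - \eta_2\xi_1)e_{12}.$$

We want this to be equal to $a_2$. For that to happen, we should have

$$\eta_1\xi_1 + \eta_2\xi_2 = -\alpha_0,$$
$$\eta_1\xi_2 - \eta_2\xi_1 = \alpha_3.$$

Clearly, there are infinitely many solutions. One simple solution is $\eta_1 = 1$, $\eta_2 = 0$, $\xi_1 = -\alpha_0$, and $\xi_2 = \alpha_3$. Then

$$u = e_1 \quad\text{and}\quad v = -\alpha_0 e_1 + \alpha_3 e_2$$

are a pair of desired vectors.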
Corollary 31.1.15 The homomorphism $\Phi_V$ satisfies

$$\det\Phi_V(a) \cdot a = \omega_V(a), \qquad a \in \Gamma_V.$$

Proof Define the map $\varphi : \Gamma_V \to \Gamma_V$ given by $\varphi(a) = \det\Phi_V(a) \cdot a$. It is not difficult to show that $\varphi$ is a homomorphism and $\varphi(x) = \omega_V(x)$ for $x \in V$. Now apply Theorem 31.1.13. The details are left for Problem 31.4.


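Example 31.1.16 In this example, we find $\tau_{a_1}$ and $\tau_{a_2}$ associated with the elements of $\Gamma_V$ for $\mathbb{H}$ as calculated in Example 31.1.7. For $x = \xi_1 e_1 + \xi_2 e_2$, we have

$$\tau_{a_1}x = \mathrm{ad}(a_1)x = \omega_V(a_1) \vee x \vee a_1^{-1} = -a_1 \vee x \vee a_1^{-1}$$
$$= -(\alpha_1 e_1 + \alpha_2 e_2)(\xi_1 e_1 + \xi_2 e_2)\left(\frac{-\alpha_1 e_1 - \alpha_2 e_2}{\alpha_1^2 + \alpha_2^2}\right)$$
$$= \frac{1}{\alpha_1^2 + \alpha_2^2}\bigl[-(\alpha_1^2\xi_1 + 2\alpha_1\alpha_2\xi_2 - \alpha_2^2\xi_1)e_1 + (\alpha_1^2\xi_2 - 2\alpha_1\alpha_2\xi_1 - \alpha_2^2\xi_2)e_2\bigr]. \qquad (31.7)$$

If we represent $x$ as a column vector, i.e., if we represent $e_1$ and $e_2$ by

$$e_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad e_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$

then $\tau_{a_1}$ can be represented as a $2 \times 2$ matrix whose entries can be read off from the last line of Eq. (31.7):

$$\tau_{a_1} = \begin{pmatrix} -\dfrac{\alpha_1^2 - \alpha_2^2}{\alpha_1^2 + \alpha_2^2} & -\dfrac{2\alpha_1\alpha_2}{\alpha_1^2 + \alpha_2^2} \\[2ex] -\dfrac{2\alpha_1\alpha_2}{\alpha_1^2 + \alpha_2^2} & \dfrac{\alpha_1^2 - \alpha_2^2}{\alpha_1^2 + \alpha_2^2} \end{pmatrix}.$$

Similarly,

$$\tau_{a_2}x = (\alpha_0\mathbf{1} + \alpha_3 e_{12})(\xi_1 e_1 + \xi_2 e_2)\left(\frac{\alpha_0\mathbf{1} - \alpha_3 e_{12}}{\alpha_0^2 + \alpha_3^2}\right),$$

from which we obtain

$$\tau_{a_2} = \begin{pmatrix} \dfrac{\alpha_0^2 - \alpha_3^2}{\alpha_0^2 + \alpha_3^2} & -\dfrac{2\alpha_0\alpha_3}{\alpha_0^2 + \alpha_3^2} \\[2ex] \dfrac{2\alpha_0\alpha_3}{\alpha_0^2 + \alpha_3^2} & \dfrac{\alpha_0^2 - \alpha_3^2}{\alpha_0^2 + \alpha_3^2} \end{pmatrix}.$$

It is straightforward to show that $\det\tau_{a_1} = -1$ and $\det\tau_{a_2} = 1$. This, together with $\omega_V(a_1) = -a_1$ and $\omega_V(a_2) = a_2$, verifies the assertion of Corollary 31.1.15.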
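The determinant and isometry claims of this example can be spot-checked numerically. The sketch below simply codes the two matrices $\tau_{a_1}$ and $\tau_{a_2}$ as functions of the coefficients; the function names and the sample values are illustrative assumptions. Since the metric of $\mathbf{C}(0,2)$ is $g = -\delta$ on $\mathbb{R}^2$, being an isometry is the same as being an orthogonal matrix in the usual sense.

```python
def tau_a1(al1, al2):
    """Matrix of ad(a_1) on V for a_1 = al1*e1 + al2*e2."""
    n = al1 ** 2 + al2 ** 2
    return [[-(al1 ** 2 - al2 ** 2) / n, -2 * al1 * al2 / n],
            [-2 * al1 * al2 / n, (al1 ** 2 - al2 ** 2) / n]]

def tau_a2(al0, al3):
    """Matrix of ad(a_2) on V for a_2 = al0*1 + al3*e12."""
    n = al0 ** 2 + al3 ** 2
    return [[(al0 ** 2 - al3 ** 2) / n, -2 * al0 * al3 / n],
            [2 * al0 * al3 / n, (al0 ** 2 - al3 ** 2) / n]]

def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def is_orthogonal(m, tol=1e-12):
    """Check that m^T m is the identity matrix."""
    for i in range(2):
        for j in range(2):
            s = sum(m[k][i] * m[k][j] for k in range(2))
            if abs(s - (1.0 if i == j else 0.0)) > tol:
                return False
    return True

t1 = tau_a1(2.0, -3.0)
t2 = tau_a2(1.5, 0.5)
print(det2(t1), is_orthogonal(t1))  # reflection: determinant close to -1
print(det2(t2), is_orthogonal(t2))  # rotation: determinant close to +1
```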
31.2 Spinors


We now return to the Clifford algebra $\mathbf{C}_\mu^\nu(\mathbb{R})$. In order not to clutter subscripts and superscripts, we shall sometimes use the notation $\mathbf{C}(\mu, \nu)$ instead of $\mathbf{C}_\mu^\nu(\mathbb{R})$, it being understood that the underlying field is $\mathbb{R}$. Let $V = \mathbb{R}_\mu^\nu$ and denote by $\Gamma(\mu, \nu)$ the Clifford group of $\mathbb{R}_\mu^\nu$. Recall from Proposition 31.1.9 the homomorphism $\lambda_V : \Gamma(\mu, \nu) \to \mathbb{R}^\times$. Obviously $\ker\lambda_V = \{a \in \Gamma(\mu, \nu) \mid \lambda_V(a) = 1\}$ is a subgroup of $\Gamma(\mu, \nu)$. It turns out that, for applications in physics, a larger subgroup is more appropriate.


Definition 31.2.1 The group $\mathrm{Pin}(\mu, \nu)$ is the subgroup of $\Gamma(\mu, \nu)$ consisting of elements $a$ satisfying $\lambda_V(a) = \pm 1$.

Clearly, $\ker\lambda_V \subset \mathrm{Pin}(\mu, \nu)$, so $\mathbf{1} \in \mathrm{Pin}(\mu, \nu)$, as it should. Now let $x$ be any non-null vector in $V$. Then

$$\lambda_V(-\mathbf{1}) = \lambda_V\!\left(-\frac{x \vee x}{g(x, x)}\right) = \left(\frac{-1}{g(x, x)}\right)^{2}\lambda_V(x \vee x) = \left(\frac{-g(x, x)}{g(x, x)}\right)^{2} = 1,$$

where we used the first relation of Corollary 31.1.10 and Eq. (31.6). Therefore, $-\mathbf{1}$ is contained in the kernel of $\lambda_V$ and thus also in $\mathrm{Pin}(\mu, \nu)$.
From Eq. (31.6) it follows that all vectors $x \in V$ for which $g(x, x) = \pm 1$ are contained in $\mathrm{Pin}(\mu, \nu)$. Theorem 31.1.13 and the fact that $\lambda_V$ is a homomorphism give

Proposition 31.2.2 The group $\mathrm{Pin}(\mu, \nu)$ is generated by $x \in V$ for which $g(x, x) = \pm 1$.


Let $O(\mu, \nu)$ be the group of isometries of $\mathbb{R}_\mu^\nu$ and consider the homomorphism $\Phi_V : \Gamma(\mu, \nu) \to O(\mu, \nu)$ introduced in Proposition 31.1.12. Let $\Phi$ denote the restriction of $\Phi_V$ to $\mathrm{Pin}(\mu, \nu)$. Since $\Phi_V$ is surjective, for any $\tau \in O(\mu, \nu)$, there exists $b \in \Gamma(\mu, \nu)$ such that $\Phi_V(b) = \tau$. Now let

$$a = \frac{b}{\sqrt{|\lambda_V(b)|}}$$

and use the first relation of Corollary 31.1.10 to obtain

$$\lambda_V(a) = \frac{\lambda_V(b)}{|\lambda_V(b)|} = \pm 1.$$

This shows that $a \in \mathrm{Pin}(\mu, \nu)$. Moreover, since $\mathrm{ad}(a) = \mathrm{ad}(\beta a)$ for any $\beta \in \mathbb{R}^\times$ (a fact that follows immediately from the definition of the twisted adjoint representation), we have $\Phi(a) = \Phi_V(a) = \Phi_V(b) = \tau$.

The equality $\Phi(a) = \mathrm{ad}(a)$ along with Proposition 31.1.2 implies that $a \in \ker\Phi$ if and only if $a = \lambda\mathbf{1}$. The first relation of Corollary 31.1.10 gives $\lambda_V(a) = \lambda^2$. Since $a \in \mathrm{Pin}(\mu, \nu)$, we must have $|\lambda_V(a)| = 1$, or $\lambda = \pm 1$ and $a = \pm\mathbf{1}$. The discussion above leads to

Theorem 31.2.3 The map $\Phi : \mathrm{Pin}(\mu, \nu) \to O(\mu, \nu)$ defined by $\Phi(a) = \tau_a$ is a surjective group homomorphism with $\ker\Phi = \{\mathbf{1}, -\mathbf{1}\}$.


Definition 31.2.4 The even elements of $\mathrm{Pin}(\mu, \nu)$ form a group denoted by $\mathrm{Spin}(\mu, \nu)$. In other words,

$$\mathrm{Spin}(\mu, \nu) = \mathrm{Pin}(\mu, \nu) \cap \mathbf{C}^0(\mu, \nu).$$

Since $\omega_V(a) = a$ for $a \in \mathrm{Spin}(\mu, \nu)$, Corollary 31.1.15 implies

Proposition 31.2.5 Any $a \in \Gamma(\mu, \nu)$ is in $\mathrm{Spin}(\mu, \nu)$ iff $\det\Phi_V(a) = 1$.

The surjective homomorphism $\Phi : \mathrm{Pin}(\mu, \nu) \to O(\mu, \nu)$ restricts to another homomorphism $\Psi : \mathrm{Spin}(\mu, \nu) \to SO(\mu, \nu)$:

Theorem 31.2.6 The map $\Psi : \mathrm{Spin}(\mu, \nu) \to SO(\mu, \nu)$ defined by $\Psi(a) = \tau_a$ is a surjective group homomorphism. Furthermore, $\ker\Psi = \{\mathbf{1}, -\mathbf{1}\}$.


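Example 31.2.7 We showed in Example 27.4.2 that $\mathbb{H} \cong \mathbf{C}(0, 2)$. Thus, with the new notation, we can write

$$\Gamma(0, 2) = \{a_1, a_2\} = \{\alpha_1 e_1 + \alpha_2 e_2,\ \alpha_0\mathbf{1} + \alpha_3 e_{12}\}.$$

We now want to find $\lambda_{a_1}$ and $\lambda_{a_2}$. From the definition of $\theta$ of Proposition 31.1.6, we obtain

$$\lambda_{a_1}\mathbf{1} = \theta(a_1) = a_1 \vee \bar a_1 = (\alpha_1 e_1 + \alpha_2 e_2)(-\alpha_1 e_1 - \alpha_2 e_2) = (\alpha_1^2 + \alpha_2^2)\mathbf{1}.$$

It follows that $\lambda_{a_1} = \alpha_1^2 + \alpha_2^2$. Similarly, $\lambda_{a_2} = \alpha_0^2 + \alpha_3^2$.

If we divide $a_i$ by its length $\sqrt{|\lambda_{a_i}|}$, we obtain the elements of $\mathrm{Pin}(0, 2)$. These are

$$b_1 = \frac{a_1}{\sqrt{|\lambda_{a_1}|}} = \frac{\alpha_1 e_1 + \alpha_2 e_2}{\sqrt{\alpha_1^2 + \alpha_2^2}} \quad\text{with } \lambda_{b_1} = 1,$$
$$b_2 = \frac{a_2}{\sqrt{|\lambda_{a_2}|}} = \frac{\alpha_0\mathbf{1} + \alpha_3 e_{12}}{\sqrt{\alpha_0^2 + \alpha_3^2}} \quad\text{with } \lambda_{b_2} = 1.$$

The group $\mathrm{Spin}(0, 2)$ consists of just $b_2$ since that is the only even element of $\mathrm{Pin}(0, 2)$. Those quaternions that are a linear combination of $\mathbf{1}$ and only one of the other three basis elements can be identified with $\mathbb{C}$. We therefore conclude that $\mathrm{Spin}(0, 2)$ is the set of complex numbers of unit length, i.e., $\mathrm{Spin}(0, 2) \cong U(1) = \{e^{i\varphi} \mid \varphi \in \mathbb{R}\}$.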
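The two-to-one nature of $\Psi$ can be seen concretely in this example. The sketch below assumes the parametrization $b_2 = \cos\varphi\,\mathbf{1} + \sin\varphi\,e_{12}$ of the unit-length even elements and reuses the matrix $\tau_{a_2}$ of Example 31.1.16; the function names are hypothetical. Under these assumptions $\Psi(b_2)$ is the rotation by $2\varphi$, and $b_2$ and $-b_2$ give the same rotation.

```python
import math

def tau_even(al0, al3):
    """Matrix of ad(a) on V for the even element a = al0*1 + al3*e12 (Example 31.1.16)."""
    n = al0 ** 2 + al3 ** 2
    return [[(al0 ** 2 - al3 ** 2) / n, -2 * al0 * al3 / n],
            [2 * al0 * al3 / n, (al0 ** 2 - al3 ** 2) / n]]

def rotation(angle):
    """Standard 2 x 2 rotation matrix."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

def close(m, n, tol=1e-12):
    """Entrywise comparison of two 2 x 2 matrices."""
    return all(abs(m[i][j] - n[i][j]) < tol for i in range(2) for j in range(2))

phi = 0.7
b2 = (math.cos(phi), math.sin(phi))        # unit element of Spin(0, 2)
minus_b2 = (-b2[0], -b2[1])

print(close(tau_even(*b2), rotation(2 * phi)))      # Psi(b2) rotates by 2*phi
print(close(tau_even(*minus_b2), tau_even(*b2)))    # b2 and -b2 have the same image
```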




31.2.1 Pauli Spin Matrices and Spinors 


We have already seen the connection between the Dirac equation of the rel- 
ativistic electron and Clifford algebras. We have also mentioned how the 
Dirac equation predicted the existence of the positron through the identifi- 
cation of two of the four components of the solution to the equation with 
the spin of the positron. The idea of spin came out very naturally from the 
Dirac equation. However, the concept was there before. Pauli had already 
developed (albeit in an ad hoc way) the theory of a spinning electron in non- 
relativistic quantum physics. He had represented the electron by a 2-column 
vector with complex entries, a spinor, to account for the two states (up and 
down) of an electron spin, and introduced certain 2 x 2 matrices—now bear- 
ing his name—in the equations that described the behavior of the electron 
in magnetic fields. In the following, we use Pauli’s matrices and spinors to 
motivate generalization to Clifford algebras. 

The fact that we are dealing with a complex 2-column points to the total matrix algebra $\mathcal{M}_2(\mathbb{C})$, which is isomorphic to $\mathbf{C}_3^0(\mathbb{R})$ as Table 27.2 shows. The generators of $\mathbf{C}_3^0(\mathbb{R})$ are the basis vectors $e_1$, $e_2$, and $e_3$ in $\mathbb{R}^3$, obeying the multiplication rule

$$e_i \vee e_j + e_j \vee e_i = e_i e_j + e_j e_i = e_{ij} + e_{ji} = 2\delta_{ij}\mathbf{1}.$$

It is therefore instructive to find three $2 \times 2$ matrices which represent $e_1$, $e_2$, and $e_3$, and therefore obey the same rule.

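Referring to Eq. (27.25) and Example 27.2.6 with $v = (\alpha, \beta, \gamma)$, we want to find a relation of the form

$$\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = (\alpha^2 + \beta^2 + \gamma^2)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

where $a_{ij}$ are now complex numbers. Inspired by the solution in Eq. (27.27), we try

$$\varphi(\alpha, \beta, \gamma) = \begin{pmatrix} \alpha & \beta - i\gamma \\ \beta + i\gamma & -\alpha \end{pmatrix} \qquad (31.8)$$

and verify that indeed

$$\begin{pmatrix} \alpha & \beta - i\gamma \\ \beta + i\gamma & -\alpha \end{pmatrix}\begin{pmatrix} \alpha & \beta - i\gamma \\ \beta + i\gamma & -\alpha \end{pmatrix} = (\alpha^2 + \beta^2 + \gamma^2)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

For the basis vectors, Eq. (31.8) yields

$$\varphi(1, 0, 0) = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad \varphi(0, 1, 0) = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \varphi(0, 0, 1) = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad (31.9)$$

which are the three Pauli spin matrices used in the non-relativistic quantum physics of electrons.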
One of the consequences of an isomorphism between two algebras is the equality of their dimensions. We know that the dimension of $\mathbf{C}_3^0(\mathbb{R})$ is 8. In fact, a basis is given by

$$\{\mathbf{1}, e_1, e_2, e_3, e_{12}, e_{13}, e_{23}, e_{123}\}. \qquad (31.10)$$

Labeling the three matrices of Eq. (31.9) by $\sigma_1$, $\sigma_2$, $\sigma_3$, respectively, we note that²

$$\sigma_1\sigma_2 = i\sigma_3, \quad \sigma_1\sigma_3 = -i\sigma_2, \quad \sigma_2\sigma_3 = i\sigma_1, \quad \sigma_1\sigma_2\sigma_3 = i\mathbf{1}.$$

Hence, as a real algebra, the matrices

$$\{\mathbf{1}, \sigma_1, \sigma_2, \sigma_3, i\mathbf{1}, i\sigma_1, i\sigma_2, i\sigma_3\}$$

form a basis of $\mathcal{M}_2(\mathbb{C})$, as the reader can verify directly by taking a linear combination of them with real coefficients, setting it equal to the zero $2 \times 2$ matrix, and showing that all coefficients are zero.
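The multiplication rule and the four product identities above can be confirmed with a few lines of code. This is a minimal sketch using plain nested lists for $2 \times 2$ complex matrices and the labeling of Eq. (31.9), i.e., $\sigma_1 = \mathrm{diag}(1, -1)$; the helper names are illustrative.

```python
I2 = [[1, 0], [0, 1]]
S1 = [[1, 0], [0, -1]]      # sigma_1 in the labeling of Eq. (31.9)
S2 = [[0, 1], [1, 0]]       # sigma_2
S3 = [[0, -1j], [1j, 0]]    # sigma_3
SIGMAS = [S1, S2, S3]

def matmul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def clifford_rule_holds():
    """Check sigma_i sigma_j + sigma_j sigma_i = 2 delta_ij 1 for all i, j."""
    for i in range(3):
        for j in range(3):
            lhs = add(matmul(SIGMAS[i], SIGMAS[j]), matmul(SIGMAS[j], SIGMAS[i]))
            if lhs != scale(2 if i == j else 0, I2):
                return False
    return True

print(clifford_rule_holds())
print(matmul(S1, S2) == scale(1j, S3))
print(matmul(S1, S3) == scale(-1j, S2))
print(matmul(S2, S3) == scale(1j, S1))
print(matmul(matmul(S1, S2), S3) == scale(1j, I2))
```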

Pauli spin matrices are matrices! And Chap. 30 showed us that matrices have very rich Lie group and Lie algebra structures. It is therefore natural to see if there is any relation between the Clifford algebra $\mathbf{C}_3^0(\mathbb{R})$ and a Lie algebra, and whether this relation can be extended to the associated Lie group and Clifford group.

First note that from the Clifford product $\sigma_j\sigma_k + \sigma_k\sigma_j = 2\delta_{jk}\mathbf{1}$, we obtain the Lie product

$$\sigma_1 \bullet \sigma_2 \equiv [\sigma_1, \sigma_2] = \sigma_1\sigma_2 - \sigma_2\sigma_1 = 2\sigma_1\sigma_2 = 2i\sigma_3,$$

plus cyclic permutation of the indices, which can be concisely written as

$$[\sigma_j, \sigma_k] = 2i\epsilon_{jkl}\sigma_l.$$

Now define the matrices $s_j = -i\sigma_j/2$ and note that

$$[s_j, s_k] = \epsilon_{jkl}s_l. \qquad (31.11)$$
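Equation (31.11) and the properties of the $s_j$ used in the next paragraph (traceless, anti-hermitian) can be verified directly. The following sketch uses the labeling of Eq. (31.9) and plain nested lists; the helper names are assumptions made for the illustration.

```python
SIGMAS = [
    [[1, 0], [0, -1]],      # sigma_1 in the labeling of Eq. (31.9)
    [[0, 1], [1, 0]],       # sigma_2
    [[0, -1j], [1j, 0]],    # sigma_3
]
# s_j = -i sigma_j / 2
S = [[[-0.5j * entry for entry in row] for row in sigma] for sigma in SIGMAS]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def is_traceless_antihermitian(m, tol=1e-12):
    """Check tr(m) = 0 and m^dagger = -m."""
    traceless = abs(m[0][0] + m[1][1]) < tol
    anti = all(abs(m[j][i].conjugate() + m[i][j]) < tol
               for i in range(2) for j in range(2))
    return traceless and anti

# cyclic instances of [s_j, s_k] = epsilon_jkl s_l
print(close(commutator(S[0], S[1]), S[2]))
print(close(commutator(S[1], S[2]), S[0]))
print(close(commutator(S[2], S[0]), S[1]))
print(all(is_traceless_antihermitian(s) for s in S))
```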


We thus have two kinds of connection. On the one hand, the three $s_j$, being traceless and anti-hermitian, are connected to the Lie algebra $\mathfrak{su}(2)$ of Box 29.1.20. Therefore, the Lie group obtained by exponentiating the $s_j$ is connected to the Lie group $SU(2)$. On the other hand, Problem 31.6 tells us that $\mathrm{Spin}(3, 0) \cong SU(2)$. It therefore appears that the Clifford algebra $\mathbf{C}_3^0(\mathbb{R})$ is related to the group $\mathrm{Spin}(3, 0)$ in the same way that the Lie algebra $\mathfrak{su}(2)$ is related to the Lie group $SU(2)$. The relation is further confirmed by the commutation relations between the components of angular momentum, which are the generators of the rotation group in three dimensions, i.e., $SO(3)$. The commutation relations are given in Example 29.1.35, and they are identical to those in Eq. (31.11). Hence, there seems to be the following chain of isomorphisms:

$$\mathrm{Spin}(3, 0) \cong SO(3) \cong SU(2).$$

² In physics literature, the matrices are actually labeled as $\sigma_3$, $\sigma_1$, $\sigma_2$, respectively.

The first link³ simply confirms Theorem 31.2.6.

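As mentioned earlier, the electron, due to its spin, is represented by a 2-column with complex entries. These are vectors on which Pauli spin matrices act. How is this translated into the Clifford algebra $\mathbf{C}_3^0(\mathbb{R})$? Recall from Theorem 3.3.1 that the minimal left ideals of the total matrix algebra are matrices with only one column nonzero, and that they are generated by a matrix with only one nonzero entry in that column. Thus, the first step is to identify the 2-column as the first column of a $2 \times 2$ matrix:⁴

$$\begin{pmatrix} \psi_1 \\ \psi_2 \end{pmatrix} \to \begin{pmatrix} \psi_1 & 0 \\ \psi_2 & 0 \end{pmatrix}.$$

Let $\mathbb{S}$ stand for the minimal left ideal generated by $P$, a $2 \times 2$ matrix with a 1 at the first position and zeros everywhere else:

$$\mathbb{S} \equiv \mathcal{M}_2(\mathbb{C})\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} = \mathbf{C}_3^0(\mathbb{R})P.$$

Using the common convention of labeling the Pauli spin matrices and identifying $\mathbf{C}_3^0(\mathbb{R})$ with $\mathcal{M}_2(\mathbb{C})$, we write

$$e_1 = \sigma_1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad e_2 = \sigma_2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad e_3 = \sigma_3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Then

$$P = \tfrac{1}{2}(\mathbf{1} + e_3) \quad\text{and}\quad P^2 = P,$$

i.e., $P$ is a (primitive) idempotent of $\mathbf{C}_3^0(\mathbb{R})$.
A basis of $\mathbb{S}$ can be found by left multiplying $P$ by all the basis vectors of $\mathbf{C}_3^0(\mathbb{R})$. The reader can easily show that we obtain the following four matrices:

$$P = \tfrac{1}{2}(\mathbf{1} + e_3) = \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \qquad e_1P = \tfrac{1}{2}(e_1 + e_1e_3) = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix},$$
$$e_2P = \tfrac{1}{2}(e_2 + e_2e_3) = \begin{pmatrix} 0 & 0 \\ i & 0 \end{pmatrix}, \qquad e_1e_2P = \tfrac{1}{2}(e_1e_2 + e_1e_2e_3) = \begin{pmatrix} i & 0 \\ 0 & 0 \end{pmatrix}. \qquad (31.12)$$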

By Theorem 27.3.2, all Clifford algebras are either simple or the direct sum of two identical simple algebras. By Theorem 3.5.27 and Proposition 3.5.22, a division algebra of a Clifford algebra can be obtained by right and left multiplying the algebra by a primitive idempotent. Since $P$ is such an idempotent, we can extract the division algebra of $\mathbf{C}_3^0(\mathbb{R})$ by right and left multiplying it by $P$:

$$\mathbb{D}_3^0 \equiv P\,\mathbf{C}_3^0(\mathbb{R})\,P.$$

Since we have already right multiplied $\mathbf{C}_3^0(\mathbb{R})$ by $P$, to get $\mathbb{D}_3^0$ we need to multiply the vectors in Eq. (31.12) by $P$ on the left. It then follows that

$$\mathbb{D}_3^0 = \{\alpha P + \beta e_1e_2P\} = \left\{\begin{pmatrix} \alpha + i\beta & 0 \\ 0 & 0 \end{pmatrix} \,\Big|\, \alpha, \beta \in \mathbb{R}\right\} \cong \mathbb{C},$$

which is the obvious statement that the (only) division algebra of $\mathcal{M}_2(\mathbb{C})$ is $\mathbb{C}$.

³ It is not exactly an isomorphism, but a homomorphism which is locally an isomorphism, but not globally. Because of the double-valuedness of the kernel of the homomorphism of Theorem 31.2.6, one can think of $\mathrm{Spin}(3, 0)$ as two identical copies of $SO(3)$.

⁴ Here we are ignoring the fact that the entries of the 2-column are functions rather than numbers. The full treatment of spinors whose entries are functions requires the tensor analysis of Clifford algebras, the so-called spin bundles, a topic which is beyond the scope of this book.

A general division algebra is only one step away from being a field: it 
has to be commutative as well. In the majority of applications in physics, 
R and C are the only two division algebras of interest. And since they are 
both commutative, the ordering of scalar multiplication of an algebra A is 
irrelevant: 


$$\alpha a = a\alpha \quad\text{for } a \in \mathcal{A},\ \alpha \in \mathbb{F},$$


where F is either R or C. To be as general as possible, we relax this condition 
of commutativity and distinguish between right and left multiplication by 
the elements of the division algebra. 

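Now note that with $\mathbb{S} \equiv \mathbf{C}_3^0(\mathbb{R})P$ and $\mathbb{D} \equiv \mathbb{D}_3^0 = P\,\mathbf{C}_3^0(\mathbb{R})\,P$, the natural order of multiplication of $\mathbb{S}$ by an element of $\mathbb{D}$ is right multiplication because

$$\mathbb{S}\mathbb{D} = \bigl(\mathbf{C}_3^0(\mathbb{R})P\bigr)\bigl(P\,\mathbf{C}_3^0(\mathbb{R})P\bigr) = \bigl(\mathbf{C}_3^0(\mathbb{R})P\bigr)\bigl(\mathbf{C}_3^0(\mathbb{R})P\bigr) = \mathbb{S}\mathbb{S} \subset \mathbb{S}.$$

The left multiplication of a left ideal of an algebra by its division algebra, in general, does not give back the left ideal. In our case here it does because the division algebra is just $\mathbb{C}$, which is commutative.

With a multiplication by a division algebra, $\mathbb{S}$ becomes a linear structure, a vector space over $\mathbb{D}$. We can thus construct the representation $\rho^{(\mathbb{S})}$, the regular representation of $\mathbf{C}_3^0(\mathbb{R})$ in $\mathbb{S}$ as given in Definition 4.5.8 and Theorem 4.5.9. We want to find a matrix representation of $\mathbf{C}_3^0(\mathbb{R})$ in $\mathbb{S}$. To do so, we need to pick a basis of $\mathbb{S}$ and represent elements of $\mathbf{C}_3^0(\mathbb{R})$ in that basis. As a complex vector space (recall that $\mathbb{D} \cong \mathbb{C}$) $\mathbb{S}$ is two-dimensional. Take $\{P_1, P_2\} \equiv \{P, e_1P\}$ as the basis of $\mathbb{S}$. To find the representation of an arbitrary element of $\mathbf{C}_3^0(\mathbb{R})$, it is sufficient to find matrices representing the generators of $\mathbf{C}_3^0(\mathbb{R})$. Using the basis in Eq. (31.10) and identifying $e_{123}$ as $i$ (because $e_{123}^2 = -\mathbf{1}$), we obtain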


$$e_1P_1 = e_1P = P_2 = 0 \cdot P_1 + 1 \cdot P_2,$$
$$e_1P_2 = e_1e_1P = P_1 = 1 \cdot P_1 + 0 \cdot P_2,$$

from which we deduce that the first basis vector is represented by the matrix $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. Similarly,

$$e_2P_1 = e_2P = iP_2 = 0 \cdot P_1 + i \cdot P_2,$$
$$e_2P_2 = e_2e_1P = -iP_1 = -i \cdot P_1 + 0 \cdot P_2,$$

and

$$e_3P_1 = e_3P = P = 1 \cdot P_1 + 0 \cdot P_2,$$
$$e_3P_2 = e_3e_1P = -P_2 = 0 \cdot P_1 - 1 \cdot P_2$$

give rise to matrices $\begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}$ and $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, respectively. We thus recover the Pauli spin matrices.
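The same bookkeeping can be carried out mechanically. The sketch below assumes the common labeling used above ($e_3 = \mathrm{diag}(1, -1)$, $P = \frac{1}{2}(\mathbf{1} + e_3)$) and reads the column of coefficients of $e_kP_j$ in the basis $\{P_1, P_2\} = \{P, e_1P\}$; the function name `spin_matrix` is a hypothetical helper. Because $\mathcal{M}_2(\mathbb{C})$ acts on its own first-column ideal, the result necessarily reproduces the matrices one started with, so the code is a consistency check of the procedure rather than an independent derivation.

```python
def matmul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

E1 = [[0, 1], [1, 0]]
E2 = [[0, -1j], [1j, 0]]
E3 = [[1, 0], [0, -1]]
P = [[1, 0], [0, 0]]                 # P = (1 + e3)/2
BASIS = [P, matmul(E1, P)]           # P_1 = P, P_2 = e_1 P

def spin_matrix(e):
    """Matrix of left multiplication by e on the ideal spanned by P_1, P_2."""
    columns = []
    for b in BASIS:
        m = matmul(e, b)
        # m has a vanishing second column, so m = m[0][0]*P_1 + m[1][0]*P_2
        assert m[0][1] == 0 and m[1][1] == 0
        columns.append([m[0][0], m[1][0]])
    # columns[j][i] is the coefficient of P_i in e P_j; transpose into a matrix
    return [[columns[j][i] for j in range(2)] for i in range(2)]

for name, e in [("e1", E1), ("e2", E2), ("e3", E3)]:
    print(name, spin_matrix(e))
```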


31.2.2 Spinors for $\mathbf{C}_\mu^\nu(\mathbb{R})$


All the foregoing observations regarding $\mathbf{C}_3^0(\mathbb{R})$ have their generalization to the Clifford algebras $\mathbf{C}_\mu^\nu(\mathbb{R})$. Central to the discussion was the existence of a primitive idempotent, because it gave rise to a minimal left ideal and the division algebra (basically the scalars), from which one could find the matrices representing the basis vectors of the algebra, and from those the matrix representation of the entire algebra.

The case of $\mathbf{C}_3^0(\mathbb{R})$, although illustrative, is not general enough. As will become apparent, this algebra, to within equivalence of some sort, has only one primitive idempotent, which we could guess from our knowledge of $2 \times 2$ matrices. In general, the task of finding the primitive idempotent by "inspection" is not easy. Fortunately, there is a procedure which routinely determines a primitive idempotent for any Clifford algebra $\mathbf{C}_\mu^\nu(\mathbb{R})$.

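The idea is to start with one of the multi-vectors in the standard basis

$$\{\mathbf{1}, e_{i_1 \dots i_k}\}, \qquad 1 \le i_1 < \cdots < i_k \le \mu + \nu, \qquad (31.13)$$

of $\mathbf{C}_\mu^\nu(\mathbb{R})$, label it $e_K$ and note that

$$P_K^+ = \tfrac{1}{2}(\mathbf{1} + e_K) \quad\text{and}\quad P_K^- = \tfrac{1}{2}(\mathbf{1} - e_K)$$

are two mutually orthogonal idempotents of $\mathbf{C}_\mu^\nu(\mathbb{R})$ which sum up to $\mathbf{1}$. Therefore, the algebra can be reduced to the (vector) direct sum of two subalgebras:

$$\mathbf{C}_\mu^\nu(\mathbb{R}) = \mathbf{C}_\mu^\nu(\mathbb{R})P_K^+ \oplus_V \mathbf{C}_\mu^\nu(\mathbb{R})P_K^-.$$

Note that both the idempotency and mutual orthogonality of $P_K^\pm$ are important in making sure that the above is indeed a vector direct sum. Here is why: if $x \in \mathbf{C}_\mu^\nu(\mathbb{R})P_K^+ \cap \mathbf{C}_\mu^\nu(\mathbb{R})P_K^-$, then it can be written as $x = aP_K^+ = bP_K^-$. Thus, multiplying both sides of the last equality on the right by $P_K^-$ and using both properties of $P_K^\pm$, we get

$$aP_K^+P_K^- = b(P_K^-)^2 \;\Rightarrow\; 0 = bP_K^- = x.$$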
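A concrete instance of such a pair of idempotents is easy to check by hand or by machine. The sketch below assumes the $2 \times 2$ matrix model of $\mathbf{C}_3^0(\mathbb{R})$ with $e_K = e_3 = \mathrm{diag}(1, -1)$, which squares to $\mathbf{1}$; exact fractions are used so that the equalities are tested without rounding. The variable names are illustrative.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
I2 = [[1, 0], [0, 1]]
E3 = [[1, 0], [0, -1]]       # a basis element e_K that squares to 1
ZERO = [[0, 0], [0, 0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def combine(sign):
    """Return (1 + sign * e_K) / 2."""
    return [[HALF * (I2[i][j] + sign * E3[i][j]) for j in range(2)]
            for i in range(2)]

P_PLUS, P_MINUS = combine(+1), combine(-1)

print(matmul(P_PLUS, P_PLUS) == P_PLUS)      # idempotent
print(matmul(P_MINUS, P_MINUS) == P_MINUS)   # idempotent
print(matmul(P_PLUS, P_MINUS) == ZERO)       # mutually orthogonal
print([[P_PLUS[i][j] + P_MINUS[i][j] for j in range(2)] for i in range(2)] == I2)
```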
Next we want to break up the two summands $\mathbf{C}_\mu^\nu(\mathbb{R})P_K^\pm$. To do so, we choose a new multi-vector $e_M$ and construct $P_M^\pm$ as before. Multiplying these on the left by $\mathbf{C}_\mu^\nu(\mathbb{R})P_K^\pm$, we obtain

$$\mathbf{C}_\mu^\nu(\mathbb{R}) = \mathbf{C}_\mu^\nu(\mathbb{R})P_K^+P_M^+ \oplus_V \mathbf{C}_\mu^\nu(\mathbb{R})P_K^+P_M^- \oplus_V \mathbf{C}_\mu^\nu(\mathbb{R})P_K^-P_M^+ \oplus_V \mathbf{C}_\mu^\nu(\mathbb{R})P_K^-P_M^-.$$

As before, we need to make sure that all the double idempotents are indeed idempotents and mutually orthogonal. From

$$(P_K^+P_M^+)^2 = P_K^+P_M^+P_K^+P_M^+ = P_K^+P_M^+$$

and

$$(P_K^-P_M^+)^2 = P_K^-P_M^+P_K^-P_M^+ = P_K^-P_M^+$$

we conclude that $P_M^+$ should commute with both $P_K^+$ and $P_K^-$. This commutation is achieved by demanding that $e_Ke_M = e_Me_K$, which also guarantees the commutativity of the remaining idempotents as well as the orthogonality of all four double idempotents.

Each time we multiply the components (which are, by the way, left ideals of $\mathbf{C}_\mu^\nu(\mathbb{R})$) of the direct vector sum by a new idempotent, we decrease the dimension by a factor of 2. It is therefore clear that eventually we will obtain a minimal ideal. The question is when should we stop? To answer this question, we first need


Definition 31.2.8 The Radon-Hurwitz number $r_i$ for $i \in \mathbb{Z}$ is given by

$$r_i = i \ \text{ for } i = 0, 1, 2; \qquad r_3 = 2;$$
$$r_i = 3 \ \text{ for } i = 4, 5, 6, 7;$$
$$r_{i+8} = r_i + 4.$$

Now for the theorem which gives a primitive idempotent for $\mathbf{C}_\mu^\nu(\mathbb{R})$, and which we state without proof:

Theorem 31.2.9 In the standard basis (31.13) of $\mathbf{C}_\mu^\nu(\mathbb{R})$ there are $k_\mu^\nu = \nu - r_{\nu-\mu}$ elements $e_{M_i}$, which commute with one another and square to $\mathbf{1}$. They generate $2^{k_\mu^\nu}$ idempotents which add up to $\mathbf{1}$, and the product

$$f = \prod_{i=1}^{k_\mu^\nu} \tfrac{1}{2}(\mathbf{1} + e_{M_i})$$

is primitive in $\mathbf{C}_\mu^\nu(\mathbb{R})$.

We now see why the construction of the spinors and Pauli spin matrices was so straightforward:

$$k_3^0 = 0 - r_{-3} = -(r_5 - 4) = 1;$$

and why the four idempotents of Eq. (27.57) worked in the construction of the Majorana representation of the Dirac matrices:

$$k_3^1 = 1 - r_{1-3} = 1 - r_{-2} = 1 - (r_6 - 4) = 1 - (-1) = 2,$$

and the number of idempotents is $2^2 = 4$.
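The Radon-Hurwitz numbers and the count $k_\mu^\nu = \nu - r_{\nu-\mu}$ are simple to tabulate. The following is a minimal sketch; the function names `radon_hurwitz` and `commuting_count` are illustrative, and the periodicity $r_{i+8} = r_i + 4$ is used to extend the table of values $r_0, \dots, r_7$ to negative indices.

```python
def radon_hurwitz(i):
    """Radon-Hurwitz number r_i for any integer i (Definition 31.2.8)."""
    base = [0, 1, 2, 2, 3, 3, 3, 3]      # r_0, ..., r_7
    q, r = divmod(i, 8)                  # floor division, so r is in 0..7 even for i < 0
    return base[r] + 4 * q               # periodicity r_{i+8} = r_i + 4

def commuting_count(mu, nu):
    """k = nu - r_{nu - mu}: number of commuting basis elements in Theorem 31.2.9."""
    return nu - radon_hurwitz(nu - mu)

for mu, nu in [(3, 0), (3, 1), (0, 2)]:
    k = commuting_count(mu, nu)
    print((mu, nu), "k =", k, "idempotents =", 2 ** k)
```

For $(\mu, \nu) = (3, 0)$ and $(3, 1)$ this reproduces the two values computed above.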


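Definition 31.2.10 The minimal left ideal $\mathbb{S}_\mu^\nu \equiv \mathbf{C}_\mu^\nu(\mathbb{R})f$, where $f$ is as in Theorem 31.2.9, together with the right multiplication by the division algebra $\mathbb{D} \equiv f\,\mathbf{C}_\mu^\nu(\mathbb{R})\,f$ (multiplication by a scalar), becomes a $\mathbb{D}$-linear structure called the spinor space of $\mathbf{C}_\mu^\nu(\mathbb{R})$.

As indicated before, for all $\mathbf{C}_\mu^\nu(\mathbb{R})$, the division algebra $\mathbb{D}$ is $\mathbb{R}$, $\mathbb{C}$, or $\mathbb{H}$. Since almost all cases of physical interest deal with $\mathbb{R}$ or $\mathbb{C}$, scalar multiplication can be on the left or right. With $\mathbb{S}_\mu^\nu$ a minimal left ideal of $\mathbf{C}_\mu^\nu(\mathbb{R})$, by Theorem 4.5.9 we have the following:

Definition 31.2.11 The irreducible representation $\rho : \mathbf{C}_\mu^\nu(\mathbb{R}) \to \mathrm{End}_{\mathbb{D}}(\mathbb{S}_\mu^\nu)$ given by

$$\rho(a)|s\rangle = as, \qquad a \in \mathbf{C}_\mu^\nu(\mathbb{R}), \quad |s\rangle \equiv s \in \mathbb{S}_\mu^\nu,$$

is called the spin representation of $\mathbf{C}_\mu^\nu(\mathbb{R})$. If $\mu - \nu \neq 1 \bmod 4$, $\mathbf{C}_\mu^\nu(\mathbb{R})$ is simple and therefore the spin representation is faithful.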


The procedure for finding the spin representations becomes fairly straightforward. First find the commuting multi-vectors. The easiest way is to pick one of the basis vectors of $\mathbb{R}_\mu^\nu$, say $e_i$ with $i$ one of the $\mu$ indices. Once this is picked, the rest of the choices have to be even multi-vectors not including $i$ in their indices so that $e_i$ commutes with them. So, next pick a bivector, say $e_{jk}$, with $j$ being one of the $\nu$ indices (why?) and $k$ one of the $\mu$ ones not equal to $i$. The next pick has to be even and not include $i$, $j$, or $k$. Continuing in this fashion, all the $k_\mu^\nu$ idempotents can be built.

After finding the idempotents, multiply them to get $f$. Then multiply $f$ on the left by all the basis vectors, and choose a set of linearly independent vectors from among the results to form a basis for $\mathbb{S}_\mu^\nu$. Left-multiply the basis vectors of $\mathbb{S}_\mu^\nu$ by $f$ to find the division algebra $\mathbb{D}$ of the representation. This usually results either in a one-dimensional algebra, which can be identified as $\mathbb{R}$, or in a two-dimensional algebra one of whose basis vectors squares to $-\mathbf{1}$, in which case the algebra can be identified as $\mathbb{C}$.

Now multiply the basis of $\mathbb{S}_\mu^\nu$ on the left by all the basis vectors of $\mathbf{C}_\mu^\nu(\mathbb{R})$. The result will be a $\mathbb{D}$-linear combination of the basis vectors of $\mathbb{S}_\mu^\nu$, the coefficients of which form the columns of the matrix representation of the basis vectors of $\mathbf{C}_\mu^\nu(\mathbb{R})$. The important case of $\mathbf{C}_3^1(\mathbb{R})$ illustrates all of these points.
31.2.3 $\mathbf{C}_3^1(\mathbb{R})$ Revisited


As noted earlier, $k_3^1 = 2$. So, we pick $e_1$ and $e_{02} \equiv e_0e_2 \equiv e_0 \vee e_2$ as the two commuting vectors out of which we can construct our idempotents.⁵ It follows that

$$f = \tfrac{1}{4}(\mathbf{1} + e_1)(\mathbf{1} + e_{02}).$$

Next, we apply all the basis vectors of $\mathbf{C}_3^1(\mathbb{R})$ to $f$ to generate a basis for $\mathbb{S}_3^1$. We quote the results and ask the reader to verify them:

$$\mathbf{1} \cdot f = f, \quad e_0 \cdot f \equiv f_2, \quad e_1 \cdot f = f, \quad e_2 \cdot f = -f_2,$$
$$e_3 \cdot f \equiv f_3, \quad e_{01} \cdot f = f_2, \quad e_{02} \cdot f = f, \quad e_{03} \cdot f \equiv f_4,$$
$$e_{12} \cdot f = f_2, \quad e_{13} \cdot f = -f_3, \quad e_{23} \cdot f = -f_4, \quad e_{012} \cdot f = -f,$$
$$e_{013} \cdot f = -f_4, \quad e_{023} \cdot f = f_3, \quad e_{123} \cdot f = -f_4, \quad e_{0123} \cdot f = f_3.$$

There are four different elements, $f$, $f_2$, $f_3$, and $f_4$. Different, of course, doesn't mean linearly independent, but by writing each out in terms of the basis vectors of $\mathbf{C}_3^1(\mathbb{R})$, the reader can verify that the four elements are indeed linearly independent.
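The table of products lends itself to a mechanical check. The sketch below is one possible way to do it, not the book's method: basis blades are encoded as bitmasks (bit $i$ set means $e_i$ is present), multivectors are dictionaries from bitmask to exact rational coefficient, and the signature $e_0^2 = -\mathbf{1}$, $e_1^2 = e_2^2 = e_3^2 = \mathbf{1}$ is assumed. All function names are illustrative.

```python
from fractions import Fraction

SQUARES = {0: -1, 1: 1, 2: 1, 3: 1}   # e0^2 = -1, e1^2 = e2^2 = e3^2 = +1

def blade_product(a, b):
    """Product of two basis blades given as bitmasks; returns (sign, bitmask)."""
    sign = 1
    t = a >> 1
    while t:                           # parity of swaps needed to sort the indices
        if bin(t & b).count("1") % 2:
            sign = -sign
        t >>= 1
    for i, sq in SQUARES.items():      # contract repeated generators
        if a & b & (1 << i):
            sign *= sq
    return sign, a ^ b

def mul(x, y):
    out = {}
    for ma, ca in x.items():
        for mb, cb in y.items():
            s, m = blade_product(ma, mb)
            out[m] = out.get(m, 0) + s * ca * cb
    return {m: c for m, c in out.items() if c != 0}

def add(x, y):
    out = dict(x)
    for m, c in y.items():
        out[m] = out.get(m, 0) + c
    return {m: c for m, c in out.items() if c != 0}

def scale(c, x):
    return {m: c * v for m, v in x.items() if c * v != 0}

def blade(*indices):
    """Basis blade e_{i1...ik} with indices in increasing order; blade() is 1."""
    mask = 0
    for i in indices:
        mask |= 1 << i
    return {mask: Fraction(1)}

ONE = blade()
f = scale(Fraction(1, 4), mul(add(ONE, blade(1)), add(ONE, blade(0, 2))))
f2, f3, f4 = mul(blade(0), f), mul(blade(3), f), mul(blade(0, 3), f)

print(mul(f, f) == f)                                # f is idempotent
print(mul(blade(2), f) == scale(-1, f2))             # e2 f = -f2
print(mul(blade(1, 2), f) == f2)                     # e12 f = f2
print(mul(blade(0, 1, 2, 3), f) == f3)               # e0123 f = f3
print(mul(f, f2) == {} and mul(f, f3) == {} and mul(f, f4) == {})
```

The last line also anticipates the computation of the division algebra: left multiplication by $f$ annihilates $f_2$, $f_3$, and $f_4$.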

Now we find $\mathbb{D}$ by multiplying $\mathbb{S}_3^1$ on the left by $f$. This means multiplying the vectors in its basis on the left and collecting all the linearly independent vectors we obtain. We illustrate one such calculation and let the reader show that we get zeros except when we multiply $f$ by itself. As an illustration we show that $f \cdot f_4 = 0$:

$$f \cdot f_4 = fe_0e_3f = \tfrac{1}{16}(\mathbf{1} + e_1)(\mathbf{1} + e_{02})\,e_0e_3\,(\mathbf{1} + e_1)(\mathbf{1} + e_{02})$$
$$= \tfrac{1}{16}\,e_0(\mathbf{1} - e_1)(\mathbf{1} - e_{02})(\mathbf{1} - e_1)(\mathbf{1} + e_{02})\,e_3$$
$$= \tfrac{1}{16}\,e_0(\mathbf{1} - e_1)^2\underbrace{(\mathbf{1} - e_{02})(\mathbf{1} + e_{02})}_{=0}\,e_3 = 0.$$

Thus, $\mathbb{D}$ is one-dimensional, i.e., it is isomorphic to $\mathbb{R}$. Therefore, $\mathbb{S}_3^1$ provides a real representation of $\mathbf{C}_3^1(\mathbb{R})$.

Our next task is to find the matrices representing the generators of $\mathbf{C}_3^1(\mathbb{R})$, namely $e_0$, $e_1$, $e_2$, and $e_3$. We show how to construct the zeroth matrix, leaving the rest of the matrices to the reader. To find the matrix corresponding to $e_0$, multiply the $i$th basis vector of $\mathbb{S}_3^1$ by $e_0$, write it as a linear combination of the basis vectors, and note that the coefficients form the $i$th column of the desired matrix. Labeling $f$, $f_2$, $f_3$, and $f_4$ as 1, 2, 3, and 4, we obtain

$$e_0f = f_2 = 0 \cdot f + 1 \cdot f_2 + 0 \cdot f_3 + 0 \cdot f_4,$$

yielding $(0, 1, 0, 0)$ as the entries of the first column. Multiplying the other basis vectors, we get

$$e_0f_2 = e_0e_0f = -f = -1 \cdot f + 0 \cdot f_2 + 0 \cdot f_3 + 0 \cdot f_4,$$
$$e_0f_3 = e_0e_3f = f_4 = 0 \cdot f + 0 \cdot f_2 + 0 \cdot f_3 + 1 \cdot f_4,$$
$$e_0f_4 = e_0e_0e_3f = -e_3f = -f_3 = 0 \cdot f + 0 \cdot f_2 - 1 \cdot f_3 + 0 \cdot f_4.$$

⁵ Here, we are again using the physicists' convention of setting $e_4 = e_0$, with $e_0^2 = -\mathbf{1}$.

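Collecting the columns and denoting the matrix by $E_0$, we get

$$E_0 = \begin{pmatrix} 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{pmatrix}.$$

The matrices representing the other basis vectors can be found similarly. We collect all the four matrices in the following equation:

$$E_0 = \begin{pmatrix} 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \qquad E_1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \\ 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$
$$E_2 = \begin{pmatrix} 0 & -1 & 0 & 0 \\ -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & -1 & 0 \end{pmatrix}, \qquad E_3 = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & -1 \\ 1 & 0 & 0 & 0 \\ 0 & -1 & 0 & 0 \end{pmatrix}. \qquad (31.14)$$

The reader may check that these matrices satisfy the Clifford algebra rule,

$$E_\mu E_\nu + E_\nu E_\mu = 2\eta_{\mu\nu}\mathbf{1},$$

as they should.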
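The check suggested to the reader can be automated. The sketch below assumes the metric $\eta = \mathrm{diag}(-1, 1, 1, 1)$, consistent with $e_0^2 = -\mathbf{1}$, and takes the four matrices of Eq. (31.14) as listed above; the helper names are illustrative.

```python
ETA = [-1, 1, 1, 1]    # eta_00 = -1, eta_11 = eta_22 = eta_33 = +1
E = [
    [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 0, -1], [0, 0, 1, 0]],     # E_0
    [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, 1]],     # E_1
    [[0, -1, 0, 0], [-1, 0, 0, 0], [0, 0, 0, -1], [0, 0, -1, 0]],   # E_2
    [[0, 0, 1, 0], [0, 0, 0, -1], [1, 0, 0, 0], [0, -1, 0, 0]],     # E_3
]

def matmul(a, b):
    """Product of two 4 x 4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def anticommutator(a, b):
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] + ba[i][j] for j in range(4)] for i in range(4)]

def satisfies_clifford_rule():
    """Check E_mu E_nu + E_nu E_mu = 2 eta_{mu nu} 1 for all mu, nu."""
    for m in range(4):
        for n in range(4):
            expected = [[(2 * ETA[m] if (m == n and i == j) else 0)
                         for j in range(4)] for i in range(4)]
            if anticommutator(E[m], E[n]) != expected:
                return False
    return True

print(satisfies_clifford_rule())
```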

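These matrices are one representation of $\mathbf{C}_3^1(\mathbb{R})$. Section 27.4.3 gave us another. Are these two representations equivalent? If they are, then by Definition 4.5.6, there should be an invertible linear transformation connecting the two. So, we try to find an invertible matrix $A$ such that $A\gamma_\mu A^{-1} = E_\mu$, or $A\gamma_\mu = E_\mu A$. Assume a general $A$ and choose its elements so that these matrix equalities hold. For example, when $\mu = 0$, we have, on the one hand,

$$A\gamma_0 = \begin{pmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ a_{10} & a_{11} & a_{12} & a_{13} \\ a_{20} & a_{21} & a_{22} & a_{23} \\ a_{30} & a_{31} & a_{32} & a_{33} \end{pmatrix}\begin{pmatrix} 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} a_{03} & -a_{02} & a_{01} & -a_{00} \\ a_{13} & -a_{12} & a_{11} & -a_{10} \\ a_{23} & -a_{22} & a_{21} & -a_{20} \\ a_{33} & -a_{32} & a_{31} & -a_{30} \end{pmatrix}$$

and on the other hand

$$E_0A = \begin{pmatrix} 0 & -1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & -1 \\ 0 & 0 & 1 & 0 \end{pmatrix}\begin{pmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ a_{10} & a_{11} & a_{12} & a_{13} \\ a_{20} & a_{21} & a_{22} & a_{23} \\ a_{30} & a_{31} & a_{32} & a_{33} \end{pmatrix}.$$

For these two matrices to be equal, we must have

$$a_{10} = -a_{03}, \quad a_{11} = a_{02}, \quad a_{12} = -a_{01}, \quad a_{13} = a_{00},$$
$$a_{30} = -a_{23}, \quad a_{31} = a_{22}, \quad a_{32} = -a_{21}, \quad a_{33} = a_{20}. \qquad (31.15)$$

These conditions eliminate some of the elements of $A$, so that it now becomes

$$A = \begin{pmatrix} a_{00} & a_{01} & a_{02} & a_{03} \\ -a_{03} & a_{02} & -a_{01} & a_{00} \\ a_{20} & a_{21} & a_{22} & a_{23} \\ -a_{23} & a_{22} & -a_{21} & a_{20} \end{pmatrix}.$$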

The straightforward calculation of determining the rest of the elements of A 
is left to the reader. We simply quote the result: 


ooroe 
g 
S 
So 


31.3 Problems 


31.1 (a) Use Eq. (31.1) to prove (31.2).
(b) Use the invertible vector $y$ with $g(y, y) \neq 0$ in Eq. (27.14) to show that

$$y = g(y, y)\,y^{-1}.$$

(c) Multiply both sides of Eq. (27.14) written for $x$ and $y$ by $y^{-1}$ and use (b) to show that $\mathrm{ad}(y)x = R_yx$.


31.2 Prove Corollary 31.1.8. Hint: Show that $\overline{\omega_V(a)} = \omega_V(\bar a)$.
31.3 Prove the first and third relations of Corollary 31.1.10. 


31.4 Provide the details of the proof of Corollary 31.1.15. 




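31.5 Consider the Clifford algebra $\mathbf{C}(2, 0)$. Let $a = \alpha_0\mathbf{1} + \alpha_1 e_1 + \alpha_2 e_2 + \alpha_3 e_{12}$ be an element of this algebra.

(a) With $b = \beta_0\mathbf{1} + \beta_1 e_1 + \beta_2 e_2 + \beta_3 e_{12}$, set $a \vee b = \mathbf{1}$, find $\beta_i$ in terms of $\alpha_i$s to show that

$$a^{-1} = \frac{\alpha_0\mathbf{1} - \alpha_1 e_1 - \alpha_2 e_2 - \alpha_3 e_{12}}{\alpha_0^2 - \alpha_1^2 - \alpha_2^2 + \alpha_3^2}.$$

(b) Show that for Eq. (31.1) to hold, we must have

$$\alpha_0\alpha_1 + \alpha_2\alpha_3 = 0 \quad\text{and}\quad \alpha_0\alpha_2 - \alpha_1\alpha_3 = 0.$$

Therefore, $\Gamma(2, 0) = \{\alpha_1 e_1 + \alpha_2 e_2,\ \alpha_0\mathbf{1} + \alpha_3 e_{12}\}$.
(c) Now show that

$$\mathrm{Spin}(2, 0) = \{z \in \mathbb{C} \mid |z| = 1\} \cong U(1) = \{e^{i\theta} \mid \theta \in \mathbb{R}\}.$$

31.6 Using the procedure of Problem 31.5 show that

$$\mathrm{Spin}(3, 0) = \{q \in \mathbb{H} \mid |q| = 1\} \cong SU(2).$$

Warning: You may have to use a computer algebra software!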


31.7 Show that $\{\mathbf{1}, \sigma_1, \sigma_2, \sigma_3, i\mathbf{1}, i\sigma_1, i\sigma_2, i\sigma_3\}$, with $\sigma_i$ given by Eq. (31.9), are eight linearly independent vectors of a real vector space.


31.8 Find ke for 4 =1,...,8. For 4. =6, 7, 8 find the corresponding max- 
imum number of idempotents. 


31.9 Show that the maximal number of idempotents $k_\mu^\nu$ satisfies the periodicity condition

$$k_{\mu+8}^\nu = k_\mu^\nu + 4.$$


31.10 Go through the same calculation as done in the book for Eo to find 
the other matrices of Eq. (31.14). 


31.11 Choose two different multi-vectors for $\mathbf{C}_3^1(\mathbb{R})$; find the corresponding $f$, $\mathbb{S}_3^1$, $\mathbb{D}$, and the matrices representing the basis vectors $e_0$, $e_1$, $e_2$, and $e_3$.
Lie groups and Lie algebras, because of their manifold (and therefore
differentiability) structure, find very natural applications in areas of physics
and mathematics in which symmetry and differentiability play important 
roles. Lie himself started the subject by analyzing the symmetry of differen- 
tial equations in the hope that a systematic method of solving them could be 
discovered. Later, Emmy Noether applied the same idea to variational prob- 
lems involving symmetries and obtained one of the most beautiful pieces 
of mathematical physics: the relation between symmetries and conservation 
laws. More recently, generalizing the gauge invariance of electromagnetism, 
Yang and Mills have considered nonabelian gauge theories in which gauge 
invariance is governed by a nonabelian Lie group. Such theories have been 
successfully built for three of the four fundamental interactions: electromag- 
netism, weak nuclear, and strong nuclear. Furthermore, it has been possible 
to cast the fourth interaction, gravity—as described by Einstein’s general 
theory of relativity—in a language very similar to the other three interac- 
tions with the promise of unifying all four interactions into a single force. 
This chapter is devoted to a treatment of the first topic, application of Lie 
groups to DEs. The second topic, the calculus of variations and conserva- 
tion laws, will be discussed in the next chapter. The third topic, that of gauge 
theories, is treated under the more general setting of fiber bundles in the last 
part of the book. 


32.1 Symmetries of Algebraic Equations 


The symmetry group of a system of DEs is a transformation group that acts 
on both the independent and dependent variables and transforms solutions 
of the system to other solutions. In order to understand this symmetry group, 
we shall first tackle the simpler question of the symmetries of a system of 
algebraic equations. 


Definition 32.1.1 Let G be a local Lie group of transformations acting on
a manifold M. A subset $S \subset M$ is called G-invariant and G is called a
symmetry group of S if whenever $g \cdot P$ is defined for $P \in S$ and $g \in G$,
then $g \cdot P \in S$.

Example 32.1.2 Let $M = \mathbb{R}^2$.

(a) Let $G = \mathbb{R}^+$ be the abelian multiplicative group of positive real numbers. Let
it act on M componentwise: $r \cdot (x, y) = (rx, ry)$. Then any line going
through the origin is a G-invariant subset of $\mathbb{R}^2$.

(b) If G = SO(2) and it acts on M as usual, then any circle centered at the
origin is a G-invariant subset of $\mathbb{R}^2$.


A system of algebraic equations is a system of equations

$$F_\nu(x) = 0, \qquad \nu = 1, 2, \dots, n,$$

in which $F_\nu : M \to \mathbb{R}$ is smooth. A solution is a point $x \in M$ such that
$F_\nu(x) = 0$ for $\nu = 1, \dots, n$. The solution set of the system is the collection
of all solutions. A Lie group G is called a symmetry group of the system
if the solution set is G-invariant.


Definition 32.1.3 Let G be a local Lie group of transformations acting on
a manifold M. A map $F : M \to N$, where N is another manifold, is called a
G-invariant map if for all $P \in M$ and all $g \in G$ such that $g \cdot P$ is defined,
$F(g \cdot P) = F(P)$. A real-valued G-invariant function is called simply an
invariant.


The crucial property of Lie group theory is that locally the group and its 
algebra “look alike”. This allows the complicated nonlinear conditions of 
invariance of subsets and functions to be replaced by the simpler linear con- 
ditions of invariance under infinitesimal actions. From Definition 29.1.30, 
we obtain the following proposition. 


Proposition 32.1.4 Let G be a local group of transformations acting on a
manifold M. A smooth real-valued function $f : M \to \mathbb{R}$ is G-invariant if
and only if

$$\xi_M|_P(f) = 0 \quad \text{for all } P \in M \tag{32.1}$$

and for every infinitesimal generator $\xi \in \mathfrak{g}$.


Example 32.1.5 The infinitesimal generator for SO(2) is $\xi_M = x\partial_y - y\partial_x$.
Any function of the form $f(x^2 + y^2)$ is an SO(2)-invariant. To see this, we
apply Proposition 32.1.4:

$$(x\partial_y - y\partial_x) f(x^2 + y^2) = x(2y)f' - y(2x)f' = 0,$$

where $f'$ is the derivative of f.
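The infinitesimal criterion of Proposition 32.1.4 is easy to probe numerically. The following is a minimal sketch, not part of the original text: it applies the SO(2) generator $x\partial_y - y\partial_x$ by central differences to one function of $x^2 + y^2$ and to one function that is not of that form. The helper names, the two sample functions, and the sample points are illustrative assumptions.

```python
import math

def generator_applied(f, x, y, h=1e-6):
    """Apply xi = x d/dy - y d/dx to f at (x, y) using central differences."""
    df_dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    df_dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return x * df_dy - y * df_dx

def invariant(x, y):
    # Depends on (x, y) only through x^2 + y^2, so it should be SO(2)-invariant.
    return math.exp(-(x * x + y * y))

def not_invariant(x, y):
    # Not a function of x^2 + y^2; xi applied to it gives 2xy - y.
    return x + y * y

points = [(0.3, -1.2), (1.0, 2.0), (-0.7, 0.4)]
max_inv = max(abs(generator_applied(invariant, x, y)) for x, y in points)
max_non = max(abs(generator_applied(not_invariant, x, y)) for x, y in points)
```

Up to finite-difference error, `max_inv` vanishes while `max_non` does not, which is the content of Eq. (32.1) for this group.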


The criterion for the invariance of the solution set of a system of equa- 
tions is a little more complicated, because now we are not dealing with func- 
tions themselves, but with their solutions. The following theorem gives such 
a criterion (for a proof, see [Olve 86, pp. 82—83]): 
Theorem 32.1.6 Let G be a local Lie group of transformations acting on
an m-dimensional manifold M. Let $F : M \to \mathbb{R}^n$, $n \le m$, define a system
of algebraic equations $\{F_\nu(x) = 0\}_{\nu=1}^n$, and assume that the Jacobian
matrix $(\partial F_\nu/\partial x^k)$ is of rank n at every solution x of the system. Then G is a
symmetry group of the system if and only if

$$\xi_M|_x(F_\nu) = 0 \quad \forall \nu \quad \text{whenever } F(x) = 0 \tag{32.2}$$

for every infinitesimal generator $\xi \in \mathfrak{g}$.

Note that Eq. (32.2) is required to hold only for solutions x of the system.


Example 32.1.7 Let $M = \mathbb{R}^2$ and G = SO(2). Consider $F : M \to \mathbb{R}$
defined by $F(x, y) = x^2 + y^2 - 1$. The Jacobian matrix is simply the gradient,

$$(\partial F/\partial x, \partial F/\partial y) = (2x, 2y),$$

and is of rank 1 for all points of the solution set, because it never vanishes
at the points where F(x, y) = 0, i.e., the unit circle. It follows from
Theorem 32.1.6 that G is a symmetry group of the equation F(x, y) = 0 if and
only if $\xi_M|_{\mathbf{r}}(F) = 0$ whenever $\mathbf{r} \in S^1$. But

$$\xi_M|_{\mathbf{r}}(F) = (x\partial_y - y\partial_x)F\big|_{\mathbf{r}} = 2xy - 2yx = 0.$$

This is a proof of the obvious fact that SO(2) takes points of $S^1$ to other
points of $S^1$.

As a less trivial example, consider the function $F : \mathbb{R}^2 \to \mathbb{R}$ given by

$$F(x, y) = x^2y^2 + y^4 + 2x^2 + y^2 - 2.$$

The infinitesimal action of the group yields

$$\xi_M(F) = (x\partial_y - y\partial_x)F = 2x^3y + 2xy^3 - 2xy = 2xy(x^2 + y^2 - 1).$$

The reader may check that $\xi_M(F) = 0$ whenever F(x, y) = 0. The Jacobian
matrix of the "system" of equations is the gradient

$$\nabla F = (2xy^2 + 4x,\ 2x^2y + 4y^3 + 2y),$$

which vanishes only when x = 0 = y, which does not belong to the solution
set. Therefore, the rank of the Jacobian matrix is 1. We conclude that the
solution set of F(x, y) = 0 is a rotationally invariant subset of $\mathbb{R}^2$. Indeed,
we have

$$F(x, y) = x^2y^2 + y^4 + 2x^2 + y^2 - 2 = (y^2 + 2)(x^2 + y^2 - 1),$$

and the solution set is just the unit circle. Note that although the solution set
of F(x, y) = 0 is G-invariant, the function itself is not.
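The distinction between an invariant solution set and an invariant function can be checked directly. The sketch below is an illustration under stated assumptions (the sample angles and the test point (1, 1) are arbitrary choices). It evaluates $(x\partial_y - y\partial_x)F$ for $F = x^2y^2 + y^4 + 2x^2 + y^2 - 2$ with the exact partial derivatives, on the unit circle and at one point off it.

```python
import math

def F(x, y):
    return x**2 * y**2 + y**4 + 2 * x**2 + y**2 - 2

def xi_F(x, y):
    """(x d/dy - y d/dx) F, using the exact partial derivatives of F."""
    dF_dx = 2 * x * y**2 + 4 * x
    dF_dy = 2 * x**2 * y + 4 * y**3 + 2 * y
    return x * dF_dy - y * dF_dx

# Points on the unit circle, i.e. on the solution set of F = 0.
on_circle = [(math.cos(t), math.sin(t)) for t in (0.1, 1.0, 2.5, 4.0)]
residual_F = max(abs(F(x, y)) for x, y in on_circle)
residual_xi = max(abs(xi_F(x, y)) for x, y in on_circle)

# Off the solution set the generator does not annihilate F:
# 2*x*y*(x^2 + y^2 - 1) evaluated at (1, 1) equals 2.
off_circle_value = xi_F(1.0, 1.0)
```

Condition (32.2) holds on the solution set (`residual_xi` is zero up to rounding), while `off_circle_value` is nonzero, so F itself is not an invariant.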
We now discuss how to find invariants of a given group action. Start with
a one-parameter group and write

$$v \equiv \xi_M = X^i(x)\frac{\partial}{\partial x^i}$$

for the infinitesimal generator of the group in some local coordinates. A
local invariant F(x) of the group is a solution of the linear, homogeneous
PDE

$$v(F) = X^1(x)\frac{\partial F}{\partial x^1} + \cdots + X^n(x)\frac{\partial F}{\partial x^n} = 0. \tag{32.3}$$

It follows that the gradient of F is perpendicular to the vector v. Since the
gradient of F is the normal to the hypersurface of constant F, we may
consider the solution of Eq. (32.3) as a surface F(x) = c whose normal is
perpendicular to v. Each normal determines one hypersurface, and since there
are n − 1 linearly independent vectors perpendicular to v, there must be n − 1
different hypersurfaces that solve (32.3). Let us write these hypersurfaces as

$$F^j(x^1, \dots, x^n) = c_j, \qquad j = 1, 2, \dots, n - 1, \tag{32.4}$$

and note that

$$\Delta F^j = \sum_{i=1}^n \frac{\partial F^j}{\partial x^i}\,\Delta x^i = 0, \qquad j = 1, 2, \dots, n - 1.$$

A solution to this equation is suggested by (32.3):

$$\Delta x^i = \alpha X^i \quad\Rightarrow\quad \frac{\Delta x^i}{X^i} = \alpha.$$

For $\Delta x^i \to dx^i$, we obtain the following set of ODEs, called the
characteristic system of the original PDE,

$$\frac{dx^1}{X^1(x)} = \frac{dx^2}{X^2(x)} = \cdots = \frac{dx^n}{X^n(x)}, \tag{32.5}$$

whose solutions determine $\{F^j(x)\}_{j=1}^{n-1}$. To find these solutions,


Box 32.1.8 Take the equalities of (32.5) one at a time, solve the first
order DE, write the solution in the form of (32.4), and read off the
functions.


The reader may check that any function of the $F^j$'s is also a solution of
the PDE. In fact, it can be shown that any solution of the PDE is a function
of these $F^j$'s (see [Olve 86, pp. 86-90]).


Example 32.1.9 Once again, let us consider SO(2), whose infinitesimal
generator is $-y\partial_x + x\partial_y$. The characteristic "system" of equations is

$$\frac{dx}{-y} = \frac{dy}{x} \quad\Rightarrow\quad x\,dx + y\,dy = 0 \quad\Rightarrow\quad x^2 + y^2 = c.$$

Thus, $F(x, y) = x^2 + y^2$, or any function thereof, is an invariant of the
rotation group in two dimensions.

As a less trivial example, consider the vector field

$$v = -y\frac{\partial}{\partial x} + x\frac{\partial}{\partial y} + \sqrt{a^2 - z^2}\,\frac{\partial}{\partial z},$$

where a is a constant. The characteristic system of ODEs is

$$\frac{dx}{-y} = \frac{dy}{x} = \frac{dz}{\sqrt{a^2 - z^2}}.$$

The first equation was solved above, giving the invariant
$F_1(x, y, z) = \sqrt{x^2 + y^2} = r$. To find the other invariant, solve for x and substitute in the
second equation to obtain

$$\frac{dy}{\sqrt{r^2 - y^2}} = \frac{dz}{\sqrt{a^2 - z^2}}.$$

The solution to this DE is

$$\arcsin\frac{y}{r} = \arcsin\frac{z}{a} + C \quad\Rightarrow\quad \underbrace{\arcsin\frac{y}{r}}_{\alpha} - \underbrace{\arcsin\frac{z}{a}}_{\beta} = C.$$

Hence, $F_2(x, y, z) = \arcsin(y/r) - \arcsin(z/a)$ is a second invariant. By
taking the sine of $F_2$, we can come up with an invariant that is algebraic
(rather than trigonometric) in form:

$$s = \sin F_2 = \sin(\alpha - \beta) = \sin\alpha\cos\beta - \cos\alpha\sin\beta
= \frac{y}{r}\sqrt{1 - \frac{z^2}{a^2}} - \sqrt{1 - \frac{y^2}{r^2}}\,\frac{z}{a} = \frac{y\sqrt{a^2 - z^2} - xz}{ra}.$$

Any function of r and s is also an invariant.
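Invariants obtained from the characteristic system must stay constant along the integral curves of the vector field. As a hedged numerical illustration (not from the original text), the sketch below integrates the flow of $-y\partial_x + x\partial_y + \sqrt{a^2 - z^2}\,\partial_z$ with a hand-written fourth-order Runge-Kutta step and monitors $r = \sqrt{x^2 + y^2}$ and $s = (y\sqrt{a^2 - z^2} - xz)/(ra)$. The value a = 2, the starting point, and the step size are assumptions chosen so that |z| stays below a.

```python
import math

A = 2.0  # the constant a of the vector field (illustrative value)

def field(p):
    """Components of v = -y d/dx + x d/dy + sqrt(a^2 - z^2) d/dz at p."""
    x, y, z = p
    return (-y, x, math.sqrt(A * A - z * z))

def rk4_step(p, dt):
    """One classical Runge-Kutta step along the flow of the field."""
    def shifted(base, k, scale):
        return tuple(b + scale * ki for b, ki in zip(base, k))
    k1 = field(p)
    k2 = field(shifted(p, k1, dt / 2))
    k3 = field(shifted(p, k2, dt / 2))
    k4 = field(shifted(p, k3, dt))
    return tuple(pi + dt / 6 * (a + 2 * b + 2 * c + d)
                 for pi, a, b, c, d in zip(p, k1, k2, k3, k4))

def r_inv(p):
    return math.hypot(p[0], p[1])

def s_inv(p):
    x, y, z = p
    return (y * math.sqrt(A * A - z * z) - x * z) / (r_inv(p) * A)

p = (1.0, 0.5, 0.2)
r0, s0 = r_inv(p), s_inv(p)
for _ in range(1000):          # integrate from t = 0 to t = 1
    p = rk4_step(p, 1e-3)
drift_r = abs(r_inv(p) - r0)
drift_s = abs(s_inv(p) - s0)
```

Both drifts remain at the level of the integration error, even though the point itself moves a finite distance along the curve.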


When the dimension of the Lie group is larger than one, the computation
of the invariants can be very complicated. If $\{v_k\}_{k=1}^r$ form a basis for the
infinitesimal generators, then the invariants are the joint solutions of the
system of first order PDEs

$$v_k(F) = \sum_{j=1}^n X_k^j(x)\frac{\partial F}{\partial x^j} = 0, \qquad k = 1, \dots, r.$$

To find such a solution, one solves the first equation and finds all its
invariants. Since any function of these invariants is also an invariant, it is natural
to express F as a function of the invariants of $v_1$. One then writes the
remaining equations in terms of these new variables and proceeds inductively.


Example 32.1.10 Consider the vector fields

$$u = -y\partial_x + x\partial_y,$$

$$v = \left(\frac{x^3z + xy^2z + a^3x}{\sqrt{x^2 + y^2}}\right)\partial_x + \left(\frac{x^2yz + y^3z + a^3y}{\sqrt{x^2 + y^2}}\right)\partial_y - \left(\sqrt{x^2 + y^2}\,z^2 + b^3\right)\partial_z,$$

where a and b are constants. The invariants of u are functions of
$r = \sqrt{x^2 + y^2}$ and z. If we are to have a nontrivial solution, the invariant of
v as well as its PDE should be expressible in terms of r and z.
The reader may verify that

$$v(F) = (r^2z + a^3)\frac{\partial F}{\partial r} - (rz^2 + b^3)\frac{\partial F}{\partial z} = 0$$

with the characteristic equation

$$\frac{dr}{r^2z + a^3} = -\frac{dz}{rz^2 + b^3}.$$

This is an exact first-order DE whose solutions are given by

$$\tfrac{1}{2}r^2z^2 + a^3z + b^3r = c$$

with c an arbitrary constant. Therefore, $F = \tfrac{1}{2}r^2z^2 + a^3z + b^3r$, or

$$F(x, y, z) = \tfrac{1}{2}(x^2 + y^2)z^2 + a^3z + b^3\sqrt{x^2 + y^2},$$

or a function thereof, is the single invariant of this group.
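The joint invariance of F under both generators can be spot-checked numerically. The sketch below is an illustration only: the values chosen for a and b, the sample points, and the finite-difference helper are assumptions. The v field is written as $(r^2z + a^3)\partial_r - (rz^2 + b^3)\partial_z$ expressed in Cartesian components, which is how the reduced PDE of this example reads.

```python
import math

A, B = 1.3, 0.7  # the constants a and b (illustrative values)

def F(x, y, z):
    r = math.hypot(x, y)
    return 0.5 * r**2 * z**2 + A**3 * z + B**3 * r

def u_field(x, y, z):
    # Generator of rotations about the z-axis.
    return (-y, x, 0.0)

def v_field(x, y, z):
    r = math.hypot(x, y)
    radial = r * r * z + A**3            # coefficient of d/dr
    return (x * radial / r, y * radial / r, -(r * z * z + B**3))

def apply_field(field, func, point, h=1e-6):
    """Derivative of func along field at point, by central differences."""
    total = 0.0
    for i, component in enumerate(field(*point)):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        total += component * (func(*plus) - func(*minus)) / (2 * h)
    return total

points = [(1.0, 0.5, 0.3), (-0.4, 1.1, -0.8), (2.0, -1.0, 0.6)]
max_u = max(abs(apply_field(u_field, F, p)) for p in points)
max_v = max(abs(apply_field(v_field, F, p)) for p in points)
```

Both `max_u` and `max_v` vanish to within the finite-difference error, consistent with F being a common invariant of u and v.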


32.2 Symmetry Groups of Differential Equations 


Let S be a system of partial differential equations involving p independent
variables $x = (x^1, \dots, x^p)$, and q dependent variables $u = (u^1, \dots, u^q)$.
The solutions of the system are of the form u = f(x), or, in component
form, $u^\alpha = f^\alpha(x^1, \dots, x^p)$, $\alpha = 1, \dots, q$. Let $X = \mathbb{R}^p$ and $U = \mathbb{R}^q$ be the
spaces of independent and dependent variables with coordinates $\{x^i\}$ and
$\{u^\alpha\}$, respectively. Roughly speaking, a symmetry group of the system S
will be a local group of transformations that map solutions of S into
solutions of S.


Historical Notes 

Marius Sophus Lie (1842-1899) was the youngest son of a Lutheran pastor in Norway.
He studied mathematics and science at Christiania (which became Kristiania, then Oslo
in 1925) University where he attended Sylow's lectures on group theory. There followed
a few years when he could not decide what career to follow. He tutored privately after his
graduation and even dabbled a bit in astronomy and mechanics.

A turning point came in 1868 when he read papers on geometry by Poncelet and Plücker
from which originated the inspiration in the topic of creating geometries by using
elements other than points in space, and provided the seed for the rest of Lie's career,
prompting him to call himself a student of Plücker, even though the two had never met.
Lie's first publication won him a scholarship to work in Berlin, where he met Klein, who
had also been influenced by Plücker's papers. The two had quite different styles (Lie
always pursuing the broadest generalization, while Klein could become absorbed in a
charming special case) but collaborated effectively for many years. However, in 1892
the lifelong friendship between Lie and Klein broke down, and the following year Lie
publicly attacked Klein, saying, "I am no pupil of Klein, nor is the opposite the case,
although this might be closer to the truth." Lie and Klein spent a summer in Paris, then
parted for some time before resuming their collaboration in Germany. While in Paris,
Lie discovered the contact transformation, which, for instance, maps lines into spheres.
During the Franco-Prussian War, Lie decided to hike to Italy. On the way, however, he was
arrested as a German spy and his mathematics notes were assumed to be coded messages.
Only after the intervention of Darboux was Lie released, and he decided to return to
Christiania. In 1871 Lie became an assistant at Christiania and obtained his doctorate.
After a short stay in Germany, he again returned to Christiania University, where a chair
of mathematics was created for him. Several years later Lie succeeded Klein at Leipzig,
where he was stricken with a condition, then called neurasthenia, resulting in fatigue and
memory loss and once thought to result from exhaustion of the nervous system. Although
treatment in a mental hospital nominally restored his health, the once robust and happy
Lie became ill-tempered and suspicious, despite the recognition he received for his work.
To lure him back to Norway, his friends at Christiania created another special chair for
him, and Lie returned in the fall of 1898. He died of anemia a few months later. Lie had
started examining partial differential equations, hoping that he could find a theory that
was analogous to Galois's theory of equations. He examined his contact transformations
considering how they affected a process due to Jacobi of generating further solutions
from a given one. This led to combining the transformations in a way that Lie called a
group, but is today called a Lie algebra. At this point he left his original intention of
examining partial differential equations and examined Lie algebras. Killing was to examine
Lie algebras quite independently of Lie, and Cartan was to publish the classification of
semisimple Lie algebras in 1900. Much of the work on transformation groups for which
Lie is best known was collected with the aid of a postdoctoral student sent to Christiania
by Klein in 1884. The student, F. Engel, remained nine months with Lie and was
instrumental in the production of the three volume work Theorie der Transformationsgruppen,
which appeared between 1888 and 1893. A similar effort to collect Lie's work in
contact transformations and partial differential equations was sidetracked as Lie's coworker,
F. Hausdorff, pursued other topics.

The transformation groups now known as Lie groups provided a very fertile area for
research for decades to come, although perhaps not at first. When Killing tried to classify
the simple Lie groups, Lie considered his efforts so poor that he admonished one of his
departing students with these words: "Farewell, and if ever you meet that s.o.b., kill him."
Lie's work was continued (somewhat in isolation) by Cartan, but it was the papers of
Weyl in the early 1920s that sparked the renewal of strong interest in Lie groups. Much
of the foundation of the quantum theory of fundamental processes is built on Lie groups.
In 1939, Wigner showed that application of Lie algebras to the Lorentz transformation
required that all particles have the intrinsic properties of mass and spin.


To make precise the above statement, we have to clarify the meaning
of the action of G on a function u = f(x). We start with identifying the
function f (i.e., a map) with its graph (see Chap. 1),

$$\Gamma_f \equiv \{(x, f(x)) \mid x \in \Omega\} \subset X \times U,$$

where $\Omega \subset X$ is the domain of definition of f. If the action of $g \in G$ on $\Gamma_f$
is defined, then the transform of $\Gamma_f$ by g is

$$g \cdot \Gamma_f \equiv \{(\tilde{x}, \tilde{u}) = g \cdot (x, u) \mid (x, u) \in \Gamma_f\}.$$

In general, $g \cdot \Gamma_f$ may not be the graph of a function, because it may fail
to be single-valued. However, by choosing g close to the identity
of G and shrinking the size of $\Omega$, we can ensure that $g \cdot \Gamma_f = \Gamma_{\tilde{f}}$, i.e., that
$g \cdot \Gamma_f$ is indeed the graph of a function $\tilde{u} = \tilde{f}(\tilde{x})$. We write $\tilde{f} = g \cdot f$ and
call $\tilde{f}$ the transform of f by g.


Example 32.2.1 Let $X = \mathbb{R} = U$, so that we are dealing with an ODE. Let
G = SO(2) be the rotation group acting on $X \times U = \mathbb{R}^2$. The action is given
by

$$(\tilde{x}, \tilde{u}) = \theta \cdot (x, u) = (x\cos\theta - u\sin\theta,\ x\sin\theta + u\cos\theta). \tag{32.6}$$

If u = f(x) is a function, the group SO(2) acts on its graph $\Gamma_f$ by rotating
it. This process can lead to a rotated graph $\theta \cdot \Gamma_f$, which may not be the
graph of a single-valued function. However, if we restrict the interval of
definition of f, and make $\theta$ small enough, then $\theta \cdot \Gamma_f$ will be the graph of a
well-defined function $\tilde{u} = \tilde{f}(\tilde{x})$ with $\Gamma_{\tilde{f}} = \theta \cdot \Gamma_f$. If we substitute f(x) for
u, we obtain

$$(\tilde{x}, \tilde{u}) = \theta \cdot (x, f(x)) = (x\cos\theta - f(x)\sin\theta,\ x\sin\theta + f(x)\cos\theta),$$

or

$$\tilde{x} = x\cos\theta - f(x)\sin\theta, \qquad \tilde{u} = x\sin\theta + f(x)\cos\theta. \tag{32.7}$$

Eliminating x from these two equations yields $\tilde{u}$ in terms of $\tilde{x}$, from which
the function $\tilde{f}$ can be deduced.

As a specific example, consider $f(x) = kx^2$. Then, the first equation of
(32.7) gives

$$(k\sin\theta)x^2 - (\cos\theta)x + \tilde{x} = 0 \quad\Rightarrow\quad x = \frac{\cos\theta - \sqrt{\cos^2\theta - 4k\tilde{x}\sin\theta}}{2k\sin\theta},$$

where we kept the root of the quadratic equation that gives a finite answer
in the limit $\theta \to 0$. Inserting this in the second equation of (32.7) and
simplifying yields

$$\tilde{u} = \tilde{f}(\tilde{x}) = \frac{\cos\theta - \sqrt{\cos^2\theta - 4k\tilde{x}\sin\theta}}{2k\sin^2\theta} - \tilde{x}\cot\theta.$$

We write this as

$$\tilde{f}(x) \equiv \theta \cdot f(x) = \frac{\cos\theta - \sqrt{\cos^2\theta - 4kx\sin\theta}}{2k\sin^2\theta} - x\cot\theta.$$

This equation defines the function $\tilde{f} = \theta \cdot f$.
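The closed form for the rotated function can be compared against the parametric description of the rotated graph. The following is a small sketch under stated assumptions: it takes the parabola $f(x) = kx^2$ (the choice to which the quadratic equation for x in the example corresponds), and the values of $\theta$, k, and the sample abscissas are arbitrary small numbers for which the chosen root is the correct branch.

```python
import math

def rotate(theta, x, u):
    """The SO(2) action of Eq. (32.6) on a point (x, u)."""
    return (x * math.cos(theta) - u * math.sin(theta),
            x * math.sin(theta) + u * math.cos(theta))

def f_tilde(theta, k, xt):
    """Closed form of the rotated parabola, valid for small theta and xt."""
    c, s = math.cos(theta), math.sin(theta)
    root = math.sqrt(c * c - 4 * k * xt * s)
    return (c - root) / (2 * k * s * s) - xt * c / s

theta, k = 0.2, 1.5
max_err = 0.0
for x in (-0.3, -0.1, 0.05, 0.2, 0.4):
    xt, ut = rotate(theta, x, k * x * x)     # a point on the rotated graph
    max_err = max(max_err, abs(f_tilde(theta, k, xt) - ut))
```

Each rotated point $(\tilde{x}, \tilde{u})$ satisfies $\tilde{u} = \tilde{f}(\tilde{x})$ to rounding accuracy, which is what it means for the rotated graph to be the graph of $\tilde{f}$.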


The foregoing example illustrates the general procedure for finding the
transformed function $\tilde{f} = g \cdot f$:


Box 32.2.2 If the rule of transformation of $g \in G$ is given by

$$(\tilde{x}, \tilde{u}) = g \cdot (x, u) = (\Psi_g(x, u), \Phi_g(x, u)),$$

then the graph $\Gamma_{\tilde{f}} = g \cdot \Gamma_f$ of $g \cdot f$ is given parametrically by

$$\tilde{x} = \Psi_g(x, f(x)), \qquad \tilde{u} = \Phi_g(x, f(x)). \tag{32.8}$$


In principle, we can solve the first equation for x in terms of $\tilde{x}$ and
substitute in the second equation to find $\tilde{u}$ in terms of $\tilde{x}$, and consequently $\tilde{f}$.

For some special but important cases, the transformed functions can be
obtained explicitly. If G is projectable, i.e., if the action of G on x does
not depend on u, then Eq. (32.8) takes the special form $\tilde{x} = \Psi_g(x)$ and
$\tilde{u} = \Phi_g(x, f(x))$ in which $\Psi_g$ is a diffeomorphism of X with inverse $\Psi_g^{-1}$. If
$\Gamma_f$ is the graph of a function f, then its transform $g \cdot \Gamma_f$ is always the graph
of some function. In fact,

$$\tilde{u} = \tilde{f}(\tilde{x}) = \Phi_g(x, f(x)) = \Phi_g\big(\Psi_g^{-1}(\tilde{x}), f(\Psi_g^{-1}(\tilde{x}))\big). \tag{32.9}$$

In particular, if G transforms only the independent variables, then

$$\tilde{u} = \tilde{f}(\tilde{x}) = f(x) = f(\Psi_g^{-1}(\tilde{x})) \quad\Rightarrow\quad \tilde{f} = f \circ \Psi_g^{-1}. \tag{32.10}$$

For example, if G is the group of translations $x \mapsto x + a$, then the transform
of f will be defined by $\tilde{f}(\tilde{x}) = f(\tilde{x} - a)$.


Definition 32.2.3 A symmetry group of a system of DEs S is a local group
of transformations G acting on an open subset M of $X \times U$ with the property
that whenever u = f(x) is a solution of S and $\tilde{f} = g \cdot f$ is defined for $g \in G$,
then $u = \tilde{f}(x)$ is also a solution of S.


The importance of knowing the symmetry group of a system of DEs lies
in the property that from one solution we may be able to obtain a family
of other solutions by applying the group elements to the given solution. To
find such symmetry groups, we have to be able to "prolong" the action of
a group to derivatives of the dependent variables as well. This is necessary
because to test a symmetry, we have to substitute not only the transformed
function $u = \tilde{f}(x)$, but also its derivatives in the DE to verify that it satisfies
the DE.


32.2.1 Prolongation of Functions 


Given a function $f : \mathbb{R}^p \supset X \to \mathbb{R}$, there are

$$p_k \equiv \binom{p + k - 1}{k}$$

different derivatives of order k of f. We use the multi-index notation

$$\partial_{J^{(k)}} f(x) \equiv \frac{\partial^k f(x)}{\partial x^{j_1}\,\partial x^{j_2} \cdots \partial x^{j_k}}$$

for these derivatives, where $J^{(k)} \equiv (j_1, \dots, j_k) \in \mathbb{N}^k$ is an unordered
k-tuple of integers with $1 \le j_i \le p$ (see also Sect. 21.1).¹ The
order of the multi-index J, denoted by |J|, is the number of its
entries and indicates the order of differentiation. So, in the derivative
above, $|J^{(k)}| = k$. For a smooth map $f : X \to U$, we have
$f(x) = (f^1(x), \dots, f^q(x))$, so that we need $q \cdot p_k$ numbers to represent all
k-th order derivatives $\partial_{J^{(k)}} f^\alpha(x)$ of all components of f. We include the
case of k = 0, in which case $\partial_0 f^\alpha(x) = f^\alpha(x)$.

To geometrize the treatment of DEs (and thus facilitate the study of their
invariance), we need to construct a space in which derivatives of all orders
up to a certain number n participate. Since derivatives need functions to
act on, we arrive at the space of functions whose derivatives share certain
common properties. To be specific, we make the following definition.


Definition 32.2.4 Let f and h be functions defined on a neighborhood of
a point $a \in X$ with values in U. We say that f and h are n-equivalent at
a if $\partial_{J^{(k)}} f^\alpha(a) = \partial_{J^{(k)}} h^\alpha(a)$ for all $\alpha$ and $k = 0, 1, \dots, n$. The collection of
all U-valued functions defined on a neighborhood of a will be denoted by
$\Gamma_a(X \times U)$, and all functions n-equivalent to f at a by $j_a^n f$.


A convenient representative of such equivalent functions is the Taylor
polynomial of order n (the terms in the Taylor series up to nth order) about a.
Now collect all $j_a^n f$ for all a and f, and denote the result by $J^n(X \times U)$,
so that

$$J^n(X \times U) \equiv \{\, j_a^n f \mid \forall a \in X \text{ and } \forall f \in \Gamma_a(X \times U) \,\}. \tag{32.11}$$

$J^n(X \times U)$ is called the nth prolongation of U, or the nth jet space of U.
It turns out that $J^n(X \times U)$ is a manifold (see [Saun 89, pp. 98 and 199]).


Theorem 32.2.5 $J^n(X \times U)$ is a manifold with natural coordinate
functions $(x^i, \{u^\alpha_{J^{(k)}}\}_{k=0}^n) \equiv (x^i, u^\alpha_J)$ defined by

$$x^i(j_a^n f) = a^i, \qquad u^\alpha_{J^{(k)}}(j_a^n f) = \partial_{J^{(k)}} f^\alpha(a), \qquad k = 0, 1, \dots, n.$$


The natural coordinate functions allow us to identify the space of the
derivatives with various powers of $\mathbb{R}$. Let $U_k \equiv \mathbb{R}^{q p_k}$ denote the set of
coordinates $u^\alpha_{J^{(k)}}$, and let $U^{(n)} \equiv U \times U_1 \times \cdots \times U_n$ be the Cartesian product
space² whose coordinates represent all the derivatives $u^\alpha_{J^{(k)}}$ of all orders
from 0 to n. The dimension of $U^{(n)}$ is

$$q + qp_1 + \cdots + qp_n = q\binom{p + n}{n} \equiv q p^{(n)}.$$

A typical point in $U^{(n)}$ is denoted by $u^{(n)}$, which has $q p^{(n)}$ different
components $\{u^\alpha_J\}_{\alpha=1}^q$, where $J^{(k)}$ runs over all unordered multi-indices
$J^{(k)} = (j_1, \dots, j_k)$ with $1 \le j_i \le p$ and $0 \le k \le n$. The nth jet space $J^n(X \times U)$
can now be identified with $X \times U^{(n)}$. From now on, we shall use $X \times U^{(n)}$
in place of $J^n(X \times U)$.


¹We shall usually omit the superscript (k) in $J^{(k)}$ when it is understood that all orders of
J up to a certain given number are involved.

²Note that U, identified with the space of zeroth derivative, is a factor in $U^{(n)}$.


Example 32.2.6 Let p = 3 and q = 1, i.e., $X = \mathbb{R}^3$ and $U = \mathbb{R}$. The
coordinates of X are (x, y, z) and that of U is u. The coordinates of $U_1$ are
$(u_x, u_y, u_z)$, where the subscript denotes the variable of differentiation.
Similarly, the coordinates of $U_2$ are

$$(u_{xx}, u_{xy}, u_{xz}, u_{yy}, u_{yz}, u_{zz})$$

and those of $U^{(2)} = U \times U_1 \times U_2$ are

$$(u;\ u_x, u_y, u_z;\ u_{xx}, u_{xy}, u_{xz}, u_{yy}, u_{yz}, u_{zz}),$$

which shows that $U^{(2)}$ is 10-dimensional.
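The counting behind $p_k$ and the dimension of $U^{(n)}$ can be reproduced by listing the unordered multi-indices explicitly. The sketch below is illustrative: the function names and the choice of coordinate labels are assumptions, and it relies on Python's `itertools.combinations_with_replacement` to enumerate unordered k-tuples with entries between 1 and p.

```python
from itertools import combinations_with_replacement
from math import comb

def multi_indices(p, k):
    """All unordered k-tuples (j1 <= ... <= jk) with entries in 1..p."""
    return list(combinations_with_replacement(range(1, p + 1), k))

def jet_fiber_dimension(p, q, n):
    """dim U^(n) = q * (p_0 + p_1 + ... + p_n), counted by enumeration."""
    return q * sum(len(multi_indices(p, k)) for k in range(n + 1))

p, q, n = 3, 1, 2
names = "xyz"
# Coordinate labels of U_2 for p = 3: one label per unordered pair of variables.
second_order = ["u_" + "".join(names[j - 1] for j in J)
                for J in multi_indices(p, 2)]
dim = jet_fiber_dimension(p, q, n)
```

For p = 3, q = 1, n = 2 the enumeration gives the six second-order coordinates listed in the example and a total dimension equal to $q\binom{p+n}{n}$.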


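Definition 32.2.7 Given a smooth map $f : X \supset \Omega \to U$, we define a map
$\mathrm{pr}^{(n)} f : \Omega \to U^{(n)}$ whose components $(\mathrm{pr}^{(n)} f)^\alpha_J$ are given by

$$(\mathrm{pr}^{(n)} f)^\alpha_J(x) = \partial_J f^\alpha(x) \equiv \big(\{\partial_{J^{(k)}} f^\alpha(x)\}_{k=0}^n\big).$$

This map is called the nth prolongation of f.


Thus, for each $x \in X$, $\mathrm{pr}^{(n)} f(x)$ is a vector in $\mathbb{R}^{q p^{(n)}}$ whose components
are the values of f and all its derivatives up to order n at the point x. For
example, in the case of p = 3, q = 1 discussed above, $\mathrm{pr}^{(2)} f(x, y, z)$ has
components

$$\left(f;\ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z};\ \frac{\partial^2 f}{\partial x^2}, \frac{\partial^2 f}{\partial x\,\partial y}, \frac{\partial^2 f}{\partial x\,\partial z}, \frac{\partial^2 f}{\partial y^2}, \frac{\partial^2 f}{\partial y\,\partial z}, \frac{\partial^2 f}{\partial z^2}\right).$$

When the underlying space is an open subset M of $X \times U$, its corresponding
jet space is

$$M^{(n)} \equiv M \times U_1 \times \cdots \times U_n,$$

which is a subspace of $X \times U^{(n)} = J^n(X \times U)$. If the graph of $f : X \to U$
lies in M, the graph of $\mathrm{pr}^{(n)} f$ lies in $M^{(n)}$.

Prolongation allows us to turn a system of DEs into a system of algebraic
equations: Given a system of l DEs

$$\Delta_\nu\big(\{x^i\}, \{u^\alpha\}, \{\partial_i u^\alpha\}, \{\partial_i\partial_j u^\alpha\}, \dots, \{\partial_{i_1} \cdots \partial_{i_n} u^\alpha\}\big) = 0, \qquad \nu = 1, \dots, l,$$

one can define a map $\Delta : M^{(n)} \to \mathbb{R}^l$ and identify the system of DEs with

$$S_\Delta \equiv \{(x, u^{(n)}) \in M^{(n)} \mid \Delta(x, u^{(n)}) = 0\}.$$

By identifying the system of DEs with the subset $S_\Delta$ of the jet space, we
have translated the abstract relations among the derivatives of u into a
geometrical object $S_\Delta$, which is more amenable to symmetry operations.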


Definition 32.2.8 Let $\Omega$ be a subset of X and $f : \Omega \to U$ a smooth map.
Then f is called a solution of the system of DEs $S_\Delta$ if

$$\Delta(x, \mathrm{pr}^{(n)} f(x)) = 0 \quad \forall x \in \Omega.$$


Just as we identified a function with its graph, we can identify the solution
of a system of DEs with the graph of its prolongation $\mathrm{pr}^{(n)} f$. This graph,
which is denoted by $\Gamma_f^{(n)}$, will clearly be a subset of $S_\Delta$:

$$\Gamma_f^{(n)} \equiv \{(x, \mathrm{pr}^{(n)} f(x))\} \subset S_\Delta.$$


Box 32.2.9 An nth order system of differential equations is taken to
be a subset $S_\Delta$ of the jet space $J^n(X \times U)$, and a solution to be a
smooth map $f : \Omega \to U$ the graph of whose nth prolongation
$\mathrm{pr}^{(n)} f$ is contained in $S_\Delta$.


Example 32.2.10 Consider Laplace's equation

$$\nabla^2 u = u_{xx} + u_{yy} + u_{zz} = 0$$

with p = 3, q = 1, and n = 2. The total jet space is the 13-dimensional
Euclidean space $X \times U^{(2)}$, whose coordinates are taken to be

$$(x, y, z;\ u;\ u_x, u_y, u_z;\ u_{xx}, u_{xy}, u_{xz}, u_{yy}, u_{yz}, u_{zz}).$$

In this 13-dimensional Euclidean space, Laplace's equation defines a
12-dimensional subspace $S_\Delta$ consisting of all points in the jet space whose
eighth, eleventh, and thirteenth coordinates add up to zero. A solution
$f : \mathbb{R}^3 \supset \Omega \to U \subset \mathbb{R}$ must satisfy

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} = 0 \quad \forall (x, y, z) \in \Omega.$$

This is the same as requiring the graph $\Gamma_f^{(2)}$ to lie in $S_\Delta$. For example, if

$$f(\mathbf{r}) \equiv f(x, y, z) = x^3yz - y^3xz + y^3 - 3yz^2,$$

then, collecting each section of $\mathrm{pr}^{(2)} f$ with fixed $\alpha$ (see Definition 32.2.7)
separately, we have

$$(\mathrm{pr}^{(2)} f)^1(\mathbf{r}) = x^3yz - y^3xz + y^3 - 3yz^2,$$

$$(\mathrm{pr}^{(2)} f)^2(\mathbf{r}) = \big(3x^2yz - y^3z,\ x^3z - 3y^2xz + 3y^2 - 3z^2,\ x^3y - y^3x - 6yz\big),$$

$$(\mathrm{pr}^{(2)} f)^3(\mathbf{r}) = \big(6xyz,\ 3x^2z - 3y^2z,\ 3x^2y - y^3,\ -6xyz + 6y,\ x^3 - 3xy^2 - 6z,\ -6y\big),$$

which lies in $S_\Delta$ because the sum of the eighth, the eleventh, and the
thirteenth coordinates of $(x, y, z; \mathrm{pr}^{(2)} f(x, y, z))$ is $6xyz - 6xyz + 6y - 6y = 0$.
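The identification of a differential equation with an algebraic condition on the jet space can be made concrete. In the sketch below (an illustration, with function names chosen here and a sample point picked arbitrarily), the second prolongation of $f = x^3yz - y^3xz + y^3 - 3yz^2$ is written out from its exact derivatives, the 13-component point $(x, y, z; \mathrm{pr}^{(2)} f)$ is assembled, and the Laplace condition is evaluated as a plain sum of three coordinates.

```python
def pr2_f(x, y, z):
    """Second prolongation of f = x^3 y z - y^3 x z + y^3 - 3 y z^2."""
    f = x**3 * y * z - y**3 * x * z + y**3 - 3 * y * z**2
    first = (3 * x**2 * y * z - y**3 * z,                          # u_x
             x**3 * z - 3 * y**2 * x * z + 3 * y**2 - 3 * z**2,    # u_y
             x**3 * y - y**3 * x - 6 * y * z)                      # u_z
    second = (6 * x * y * z,                   # u_xx
              3 * x**2 * z - 3 * y**2 * z,     # u_xy
              3 * x**2 * y - y**3,             # u_xz
              -6 * x * y * z + 6 * y,          # u_yy
              x**3 - 3 * x * y**2 - 6 * z,     # u_yz
              -6 * y)                          # u_zz
    return (f,) + first + second

def laplace_delta(jet_point):
    """Delta on X x U^(2): sum of the 8th, 11th and 13th coordinates."""
    return jet_point[7] + jet_point[10] + jet_point[12]

point = (2, -1, 3)
jet_point = point + pr2_f(*point)   # (x, y, z; u; u_x, ..., u_zz)
residual = laplace_delta(jet_point)
```

With integer inputs the arithmetic is exact, so `residual` is exactly zero: the point of the prolonged graph lies in $S_\Delta$.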


32.2.2 Prolongation of Groups 


Suppose G is a group of transformations acting on $M \subset X \times U$. It is
possible to prolong this action to the n-jet space $M^{(n)}$. The resulting group that
acts on $M^{(n)}$ is called the nth prolongation of G and denoted by $\mathrm{pr}^{(n)} G$,
with group elements $\mathrm{pr}^{(n)} g$ for $g \in G$. This prolongation is defined
naturally: The derivatives of a function f with respect to x are transformed into
derivatives of $\tilde{f} = g \cdot f$ with respect to $\tilde{x}$. More precisely,

$$\mathrm{pr}^{(n)} g \cdot (j_x^n f) \equiv \big(\tilde{x}, \mathrm{pr}^{(n)} \tilde{f}(\tilde{x})\big) = \big(\tilde{x}, \mathrm{pr}^{(n)}(g \cdot f)(\tilde{x})\big). \tag{32.12}$$

For n = 0, Eq. (32.12) reduces to the action of G on M as given by
Eq. (32.8). That the outcome of the action of the prolongation of G in the
defining equation (32.12) is independent of the representative function f
follows from the chain rule and the fact that only derivatives up to the nth
order are involved. The following example illustrates this.


Example 32.2.11 Let X, U, and G be as in Example 32.2.1. In this case,
we have

$$\mathrm{pr}^{(1)}\theta \cdot (j_x^1 f) \equiv \mathrm{pr}^{(1)}\theta \cdot (x, u, u_1) \equiv (\tilde{x}, \tilde{u}, \tilde{u}_1).$$

We calculated $\tilde{x}$ and $\tilde{u}$ in that example. They are

$$\tilde{x} = x\cos\theta - u\sin\theta, \qquad \tilde{u} = x\sin\theta + u\cos\theta. \tag{32.13}$$

To find $\tilde{u}_1$, we need to differentiate the second equation with respect to $\tilde{x}$
and express the result in terms of the original variables. Thus

$$\tilde{u}_1 \equiv \frac{d\tilde{u}}{d\tilde{x}} = \frac{d\tilde{u}}{dx}\frac{dx}{d\tilde{x}} = \left(\sin\theta + \frac{du}{dx}\cos\theta\right)\frac{dx}{d\tilde{x}} = (\sin\theta + u_1\cos\theta)\frac{dx}{d\tilde{x}}.$$

$dx/d\tilde{x}$ is obtained by differentiating the first equation of (32.13):

$$1 = \frac{dx}{d\tilde{x}}\cos\theta - \frac{du}{d\tilde{x}}\sin\theta = \frac{dx}{d\tilde{x}}\cos\theta - \frac{du}{dx}\frac{dx}{d\tilde{x}}\sin\theta = (\cos\theta - u_1\sin\theta)\frac{dx}{d\tilde{x}},$$

or

$$\frac{dx}{d\tilde{x}} = \frac{1}{\cos\theta - u_1\sin\theta}.$$

It therefore follows that

$$\tilde{u}_1 = \frac{\sin\theta + u_1\cos\theta}{\cos\theta - u_1\sin\theta}$$

and

$$\mathrm{pr}^{(1)}\theta \cdot (x, u, u_1) = \left(x\cos\theta - u\sin\theta,\ x\sin\theta + u\cos\theta,\ \frac{\sin\theta + u_1\cos\theta}{\cos\theta - u_1\sin\theta}\right).$$

We note that the RHS involves derivatives up to order one. Therefore, the
transformation is independent of the representative function. So, if we had
chosen $j_x^1 h$ where h is 1-equivalent to f, we would have obtained the same
result. This holds for derivatives of all orders. Therefore, the prolongation
of the action of the group G is well-defined.
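The independence of the prolonged action from the representative function can be illustrated numerically. In the following sketch (an illustration only, with $f = \sin$, the point $x_0 = 0.4$, and the angle $\theta = 0.3$ chosen arbitrarily), the slope of the rotated graph is estimated by finite differences for f and for its first-order Taylor polynomial h at $x_0$, which is 1-equivalent to f there, and both are compared with the formula for $\tilde{u}_1$.

```python
import math

def pr1_rotation(theta, x, u, u1):
    """First prolongation of the rotation acting on (x, u, u1)."""
    c, s = math.cos(theta), math.sin(theta)
    return (x * c - u * s, x * s + u * c, (s + u1 * c) / (c - u1 * s))

def rotated_point(theta, f, x):
    c, s = math.cos(theta), math.sin(theta)
    return (x * c - f(x) * s, x * s + f(x) * c)

def slope_of_rotated_graph(theta, f, x, h=1e-6):
    """d(u~)/d(x~) along the rotated graph, via a central difference in x."""
    xa, ua = rotated_point(theta, f, x - h)
    xb, ub = rotated_point(theta, f, x + h)
    return (ub - ua) / (xb - xa)

theta, x0 = 0.3, 0.4
f = math.sin                      # sample function; its derivative is cos

def h_equiv(x):
    # First-order Taylor polynomial of f at x0, hence 1-equivalent to f at x0.
    return math.sin(x0) + math.cos(x0) * (x - x0)

_, _, u1_formula = pr1_rotation(theta, x0, f(x0), math.cos(x0))
u1_from_f = slope_of_rotated_graph(theta, f, x0)
u1_from_h = slope_of_rotated_graph(theta, h_equiv, x0)
```

Both numerical slopes agree with `u1_formula`. The formula is also the tangent addition rule: the slope angle of the graph is increased by $\theta$.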


In many cases, it is convenient to choose the nth-order Taylor polynomial
as the representative of the class of n-equivalent functions, and, if possible,
write the transformed function $\tilde{f}$ explicitly in terms of $\tilde{x}$, and differentiate
it to obtain the transformed derivatives (see Problem 32.3).

Example 32.2.11 illustrates an important property of the prolongation
of G. We note that the first prolongation $\mathrm{pr}^{(1)} G$ acts on the original
coordinates (x, u) in exactly the same way that G does. This holds in general:


Box 32.2.12 The effect of the nth prolongation $\mathrm{pr}^{(n)} G$ to derivatives
up to order $m \le n$ is exactly the same as the effect of $\mathrm{pr}^{(m)} G$. If we
already know the action of the mth-order prolonged group $\mathrm{pr}^{(m)} G$, then
to compute $\mathrm{pr}^{(n)} G$ we need only find how the derivatives $u^\alpha_J$ of order
higher than m transform, because the lower-order action is already
determined.


32.2.3 Prolongation of Vector Fields 


The geometrization of a system of DEs makes it possible to use the
machinery of differentiable manifolds, Lie groups, and Lie algebras to
unravel the symmetries of the system. At the heart of this machinery are
the infinitesimal transformations, which are directly connected to vector
fields. Therefore, it is necessary to find out how a vector field defined in
$M \subset X \times U$ is prolonged. The most natural way to prolong a vector field
is to prolong its integral curve (which is a one-parameter group of
transformations of M) to a curve in $M^{(n)}$ and then calculate the tangent to the
latter curve.


Definition 32.2.13 Let M be an open subset of $X \times U$ and $\mathbf{X} \in \mathfrak{X}(M)$. The
nth prolongation of $\mathbf{X}$, denoted by $\mathrm{pr}^{(n)}\mathbf{X}$, is a vector field on the nth jet
space $M^{(n)}$ defined by

$$\mathrm{pr}^{(n)}\mathbf{X}\big|_{(x, u^{(n)})} = \frac{d}{dt}\,\mathrm{pr}^{(n)}[\exp(t\mathbf{X})] \cdot (x, u^{(n)})\Big|_{t=0}$$

for any $(x, u^{(n)}) \in M^{(n)}$.


Since $(x, u^{(n)}) \in M^{(n)}$ form a coordinate system on $M^{(n)}$, any vector
field in $M^{(n)}$ can be written as a linear combination of $\partial/\partial x^i$ and $\partial/\partial u^\alpha_J$
with coefficients being, in general, functions of all coordinates $x^k$ and $u^\alpha_J$.
For a prolonged vector, however, we have

$$\mathrm{pr}^{(n)}\mathbf{X} = \sum_{i=1}^p X^i\frac{\partial}{\partial x^i} + \sum_{\alpha=1}^q \sum_J X^\alpha_J\frac{\partial}{\partial u^\alpha_J}, \tag{32.14}$$

where $X^i$ and $X^\alpha_0$ are functions only of $x^k$ and u. This is due to the remark
made in Box 32.2.12. For the same reason, the coefficients $X^\alpha_{J^{(m)}}$
corresponding to derivatives of order m will be independent of coordinates $u^\alpha_{J^{(k)}}$
for k > m. Thus, it is possible to construct various prolongations of a given
vector field recursively.


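Example 32.2.14 Let us consider our recurrent example of $X = U = \mathbb{R}$,
$G = SO(2)$. Given the infinitesimal generator
$\mathbf{v} = -u\,\partial_x + x\,\partial_u$, one can solve the DE of its
integral curve to obtain³

\[
\exp(t\mathbf{v})\cdot(x,u) = (x\cos t - u\sin t,\; x\sin t + u\cos t).
\]

Example 32.2.11 calculated the first prolongation of $SO(2)$. So

\[
\mathrm{pr}^{(1)}\exp(t\mathbf{v})\cdot(x,u,u_1)
  = \Bigl(x\cos t - u\sin t,\; x\sin t + u\cos t,\;
    \frac{\sin t + u_1\cos t}{\cos t - u_1\sin t}\Bigr).
\]

Differentiating the components with respect to $t$ at $t = 0$ gives

\[
\begin{aligned}
\frac{\partial}{\partial t}(x\cos t - u\sin t)\Big|_{t=0} &= -u,\\
\frac{\partial}{\partial t}(x\sin t + u\cos t)\Big|_{t=0} &= x,\\
\frac{\partial}{\partial t}\Bigl(\frac{\sin t + u_1\cos t}{\cos t - u_1\sin t}\Bigr)\Big|_{t=0} &= 1 + u_1^2.
\end{aligned}
\]

Therefore,

\[
\mathrm{pr}^{(1)}\mathbf{v}
  = -u\frac{\partial}{\partial x} + x\frac{\partial}{\partial u}
  + (1 + u_1^2)\frac{\partial}{\partial u_1}.
\]

Note that the first two terms in $\mathrm{pr}^{(1)}\mathbf{v}$ are the same as
those in $\mathbf{v}$ itself, in agreement with Box 32.2.12.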
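
The differentiation in Example 32.2.14 can be checked numerically. The sketch
below is only an illustration, not part of the text's derivation: it evaluates
the prolonged $SO(2)$ action at $t = \pm h$ and forms a central difference,
which should reproduce the coefficients $(-u,\, x,\, 1 + u_1^2)$. The function
names and the step size `h` are choices made here.

```python
import math

def prolonged_rotation(t, x, u, u1):
    """First prolongation of the SO(2) action on the point (x, u, u1)."""
    c, s = math.cos(t), math.sin(t)
    return (x * c - u * s, x * s + u * c, (s + u1 * c) / (c - u1 * s))

def generator_by_finite_difference(x, u, u1, h=1e-6):
    """Central difference in t at t = 0 of the prolonged action."""
    plus = prolonged_rotation(h, x, u, u1)
    minus = prolonged_rotation(-h, x, u, u1)
    return tuple((p - m) / (2 * h) for p, m in zip(plus, minus))

def generator_exact(x, u, u1):
    """Coefficients of d/dx, d/du, d/du1 in pr^(1) v from the example."""
    return (-u, x, 1 + u1 ** 2)

if __name__ == "__main__":
    point = (0.7, -1.3, 0.4)  # an arbitrary test point (x, u, u1)
    print(generator_by_finite_difference(*point))
    print(generator_exact(*point))
```

The two printed triples should agree up to the truncation and rounding error
of the finite difference.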


³ One can, of course, also write the finite group element directly.


32.3 The Central Theorems


We are now in a position to state the first central theorem of the application 
of Lie groups to the solution of DEs. This theorem is the exact replica of 
Theorem 32.1.6 in the language of prolongations: 


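Theorem 32.3.1 Let $\{\Delta_\nu(x,u^{(n)}) = 0\}_{\nu=1}^{l}$ be a system of
DEs defined on $M \subset X \times U$ whose Jacobian matrix

\[
J_{\Delta_\nu}(x,u^{(n)}) \equiv
  \Bigl(\frac{\partial\Delta_\nu}{\partial x^i},\;
        \frac{\partial\Delta_\nu}{\partial u^\alpha_J}\Bigr)
\]

has rank $l$ for all $(x,u^{(n)}) \in S_\Delta$. If $G$ is a local Lie group of
transformations acting on $M$, and

\[
\mathrm{pr}^{(n)}\boldsymbol{\xi}_M(\Delta_\nu) = 0, \quad \nu = 1,\dots,l,
\quad\text{whenever } \Delta(x,u^{(n)}) = 0
\]

for every infinitesimal generator $\boldsymbol{\xi}$ of $G$, then $G$ is the
symmetry group of the system.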


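Example 32.3.2 Consider the first order (so, $n = 1$) ordinary DE

\[
\Delta(x,u,u_1) \equiv (u - x^3 - u^2 x)\,u_1 + x + x^2 u + u^3 = 0,
\]

so that $X = \mathbb{R}$ and $U = \mathbb{R}$. We first note that

\[
J_\Delta = \Bigl(\frac{\partial\Delta}{\partial x},\,
                 \frac{\partial\Delta}{\partial u},\,
                 \frac{\partial\Delta}{\partial u_1}\Bigr)
 = \bigl((-3x^2 - u^2)u_1 + 1 + 2xu,\;
         (1 - 2ux)u_1 + x^2 + 3u^2,\;
         u - x^3 - u^2 x\bigr),
\]

which is of rank 1 everywhere. Now let us apply the first prolongation of the
generator of $SO(2)$, calculated in Example 32.2.14, to $\Delta$. We have

\[
\begin{aligned}
\mathrm{pr}^{(1)}\mathbf{v}(\Delta)
 &= -u\frac{\partial\Delta}{\partial x} + x\frac{\partial\Delta}{\partial u}
    + (1 + u_1^2)\frac{\partial\Delta}{\partial u_1}\\
 &= -u\bigl[(-3x^2 - u^2)u_1 + 1 + 2xu\bigr]
    + x\bigl[(1 - 2ux)u_1 + x^2 + 3u^2\bigr]
    + (1 + u_1^2)(u - x^3 - u^2 x)\\
 &= u_1\bigl[(u - x^3 - u^2 x)u_1 + x + x^2 u + u^3\bigr] = u_1\Delta.
\end{aligned}
\]

It follows that $\mathrm{pr}^{(1)}\mathbf{v}(\Delta) = 0$ whenever
$\Delta = 0$, and that $SO(2)$ is a symmetry group of the DE. Thus, rotations
will change solutions of the DE into other solutions. In fact, the reader may
verify that in polar coordinates, the DE can be written in the incredibly
simple form

\[
\frac{dr}{d\theta} = r^3,
\]

and the symmetry of the DE conveys the fact that adding a constant to $\theta$
does not change the polar form of the DE.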
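
The identity $\mathrm{pr}^{(1)}\mathbf{v}(\Delta) = u_1\Delta$ of Example 32.3.2
is easy to spot-check with plain arithmetic. The following sketch (function
names are chosen here for illustration) codes $\Delta$, its three partial
derivatives as listed in the Jacobian, and the prolonged generator, and then
compares both sides at a few arbitrary points.

```python
def delta(x, u, u1):
    """The left-hand side of the ODE of Example 32.3.2."""
    return (u - x**3 - u**2 * x) * u1 + x + x**2 * u + u**3

def pr_v_delta(x, u, u1):
    """Apply pr^(1) v = -u d/dx + x d/du + (1 + u1^2) d/du1 to delta."""
    d_x = (-3 * x**2 - u**2) * u1 + 1 + 2 * x * u   # partial of delta w.r.t. x
    d_u = (1 - 2 * u * x) * u1 + x**2 + 3 * u**2    # partial of delta w.r.t. u
    d_u1 = u - x**3 - u**2 * x                      # partial of delta w.r.t. u1
    return -u * d_x + x * d_u + (1 + u1**2) * d_u1

if __name__ == "__main__":
    for point in [(0.3, -0.8, 1.7), (1.2, 0.5, -0.4), (-2.0, 1.5, 0.9)]:
        lhs = pr_v_delta(*point)
        rhs = point[2] * delta(*point)
        print(point, lhs, rhs)
```

At every point the two printed numbers should coincide up to rounding, so the
prolonged generator annihilates $\Delta$ on the solution set $\Delta = 0$.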
Theorem 32.3.1 reduces the invariance of a system of DEs to a criterion
involving the prolongation of the infinitesimal generators of the symmetry
group. The urgent task in front of us is therefore to construct an explicit
formula for the prolongation of a vector field. In order to gain insight into
this construction, we first look at the simpler case of $U = \mathbb{R}$ and a
group $G$ that transforms only the independent variables. Furthermore, we
restrict ourselves to the first prolongation. An infinitesimal generator of
such a group will be of the form

\[
\mathbf{v} = \sum_{i=1}^{p} X^i(x)\frac{\partial}{\partial x^i},
\]

which is assumed to act on the space $M \subset X \times U$. The integral curve
of this vector field is $\exp(t\mathbf{v})$, which acts on points of $M$ as
follows:

\[
(\tilde x, \tilde u) = \exp(t\mathbf{v})\cdot(x,u)
  = (\Psi_t(x), u) \equiv (\Psi^i_t(x), u).
\]

By the construction of the integral curves in general, we have

\[
\frac{d\Psi^i_t(x)}{dt}\Big|_{t=0} = X^i(x). \tag{32.15}
\]

Denote the coordinates of the first jet space $M^{(1)}$ by $(x^i, u, u_k)$,
where $u_k(j^1_p f) \equiv \partial f/\partial x^k$. By the definition of the
action of the prolonged group,

\[
\mathrm{pr}^{(1)}\exp(t\mathbf{v})\cdot(j^1_p f)
  = (\Psi^i_t(x),\, u,\, \tilde u_j), \tag{32.16}
\]

where $u = f(x)$ and $\tilde u_j = \partial\tilde f/\partial\tilde x^j$. Once we
find $\tilde u_j$, we can differentiate Eq. (32.16) with respect to $t$ at
$t = 0$ to obtain the prolonged vector field. Using Eq. (32.10) and commas to
indicate differentiation,⁴ we obtain

\[
\tilde u_j = \tilde f_{,j}(\tilde x) = (f\circ\Psi_{-t})_{,j}(\tilde x)
  = \sum_{i=1}^{p} f_{,i}\bigl(\Psi_{-t}(\tilde x)\bigr)\,\Psi^i_{-t,j}(\tilde x)
  = \sum_{i=1}^{p} \Psi^i_{-t,j}(\tilde x)\,u_i.
\]

⁴ We have found it exceedingly convenient to use commas to indicate
differentiation with respect to the $x$'s. The alternative, i.e., the use of
partials, makes it almost impossible to find one's way in the maze of
derivatives involving $x^i$, $\tilde x^j$, and $t$, with $\tilde x^j$ depending
on $t$. The reader will recall that the index after the comma is to be thought
of as a "position holder", and the argument as a substitution. Thus, for
example,

\[
f_{,i}(\tilde x)
 = \frac{\partial f(r_1,\dots,r_p)}{\partial r_i}\Big|_{r=\tilde x}
 = \frac{\partial f(s_1,\dots,s_p)}{\partial s_i}\Big|_{s=\tilde x}
 = \frac{\partial f(\eta_1,\dots,\eta_p)}{\partial \eta_i}\Big|_{\eta=\tilde x}.
\]


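Since $\mathbf{v}$ does not have any component along $\partial/\partial u$, its
prolongation will not have such components either. The components $U_j$ along
$\partial/\partial u_j$ are obtained by differentiating $\tilde u_j$ with
respect to $t$:

\[
\begin{aligned}
U_j(x,u,u_k) &= \sum_{i=1}^{p}\frac{d}{dt}\Psi^i_{-t,j}(\tilde x)\Big|_{t=0} u_i\\
 &= \sum_{i=1}^{p}\frac{\partial\Psi^i_{-t,j}}{\partial t}(\tilde x)\Big|_{t=0} u_i
  + \sum_{i=1}^{p}\sum_{k=1}^{p} u_i\Bigl(\Psi^i_{-t,jk}(\tilde x)\,
      \frac{d\tilde x^k}{dt}\Bigr)\Big|_{t=0}.
\end{aligned}
\tag{32.17}
\]

The derivative of the first term in the sum can be evaluated as follows:

\[
\begin{aligned}
\frac{\partial\Psi^i_{-t,j}}{\partial t}\bigl(\tilde x(t)\bigr)\Big|_{t=0}
 &= -\frac{\partial\Psi^i_{s,j}}{\partial s}\bigl(\tilde x(s)\bigr)\Big|_{s=0}\\
 &= -\frac{\partial\Psi^i_{s,j}}{\partial s}\bigl(\tilde x(0)\bigr)\Big|_{s=0}
  = -\frac{\partial}{\partial s}\bigl(\Psi^i_{s,j}(x)\bigr)\Big|_{s=0}\\
 &= -\Bigl(\frac{\partial\Psi^i_s}{\partial s}(x)\Big|_{s=0}\Bigr)_{,j}
  = -X^i_{,j}(x),
\end{aligned}
\]

where we have emphasized the dependence of $\tilde x$ on $t$ (or $s$), treated
$s$ as the first independent variable, and in the second line substituted
$s = 0$ in all $\tilde x$'s before differentiation. This is possible because we
are taking the partial derivative with respect to the first variable holding
all others constant. The derivative of the second term in Eq. (32.17) can be
calculated similarly:

\[
\Psi^i_{-t,jk}\bigl(\tilde x(t)\bigr)\Big|_{t=0} = \Psi^i_{0,jk}\bigl(\tilde x(0)\bigr)
  = \frac{\partial^2 x^i}{\partial x^j\,\partial x^k} = 0
\]

because $\Psi^i_0(x) = x^i$. We therefore have

\[
U_j(x,u,u_k) = -\sum_{i=1}^{p}\frac{\partial X^i}{\partial x^j}\,u_i \tag{32.18}
\]

and

\[
\mathrm{pr}^{(1)}\mathbf{v}
  = \mathbf{v} + \sum_{j=1}^{p} U_j\frac{\partial}{\partial u_j}
  = \mathbf{v} - \sum_{j=1}^{p}\sum_{k=1}^{p}
      \frac{\partial X^k}{\partial x^j}\,u_k\,\frac{\partial}{\partial u_j}.
\tag{32.19}
\]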

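It is also instructive to consider the case in which $U$ is still
$\mathbb{R}$, but $G$ acts only on the dependent variable. Then
$\mathbf{v} = U(x,u)\,\partial_u$, and

\[
(\tilde x, \tilde u) = \bigl(x, \Phi_t(x,u)\bigr), \quad\text{with}\quad
U(x,u) = \frac{d\Phi_t(x,u)}{dt}\Big|_{t=0}.
\]

The reader may check that in this case, the prolongation of $\mathbf{v}$ is
given by

\[
\mathrm{pr}^{(1)}\mathbf{v} = \mathbf{v}
  + \sum_{j=1}^{p} U_j(x,u^{(1)})\frac{\partial}{\partial u_j},
\qquad
U_j(x,u^{(1)}) = \frac{\partial U}{\partial x^j} + u_j\frac{\partial U}{\partial u}.
\tag{32.20}
\]

The second equation in (32.20) can also be written as

\[
U_j\bigl(x, \mathrm{pr}^{(1)} f(x)\bigr)
  = \frac{\partial U}{\partial x^j}
  + \frac{\partial U}{\partial u}\frac{\partial f}{\partial x^j}
  = \frac{\partial}{\partial x^j}\bigl[U\bigl(x, f(x)\bigr)\bigr].
\]

In other words, $U_j(x,u^{(1)})$ is obtained from $U(x,u)$ by differentiation
with respect to $x^j$, while treating $u$ as a function of $x$. This leads us
to the definition of the total derivative.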


Definition 32.3.3 Let $S: M^{(n)} \to \mathbb{R}$ be a smooth function of $x$,
$u$, and all derivatives of $u$ up to nth order. The total derivative of $S$
with respect to $x^i$, denoted by $D_i S$, is a smooth function
$D_i S : M^{(n+1)} \to \mathbb{R}$ defined by

\[
D_i S\bigl(j^{n+1}_p f\bigr)
  = \frac{\partial}{\partial x^i}\bigl[S\bigl(x, \mathrm{pr}^{(n)} f(x)\bigr)\bigr],
\]

i.e., $D_i S$ is obtained from $S$ by differentiating $S$ with respect to
$x^i$, treating $u$ and all the $u^\alpha_J$'s as functions of $x$.


The following proposition, whose proof is a straightforward application 
of the chain rule, gives the explicit formula for calculating the total deriva- 
tive: 


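Proposition 32.3.4 The ith total derivative of $S: M^{(n)} \to \mathbb{R}$ is
of the form

\[
D_i S = \frac{\partial S}{\partial x^i}
  + \sum_{\alpha=1}^{q}\sum_{J} u^\alpha_{J,i}\,\frac{\partial S}{\partial u^\alpha_J},
\]

where, for $J = (j_1,\dots,j_k)$,

\[
u^\alpha_{J,i} = \frac{\partial u^\alpha_J}{\partial x^i}
  = \frac{\partial^{k+1}u^\alpha}{\partial x^i\,\partial x^{j_1}\cdots\partial x^{j_k}},
\]

and the sum over $J$ includes derivatives of all orders from 0 to $n$.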
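
As a small numerical illustration of Proposition 32.3.4 (a sketch under
assumptions made here, not an example from the text), take $p = q = 1$ and the
hypothetical function $S(x,u,u_x) = x\,u\,u_x + u_x^2$. The proposition gives
$D_x S = S_x + u_x S_u + u_{xx} S_{u_x}$, which should equal the ordinary
derivative of $x \mapsto S(x, f(x), f'(x))$ for any smooth $f$; below
$f = \sin$ is used.

```python
import math

def S(x, u, ux):
    """A sample function on the first jet space (chosen for illustration)."""
    return x * u * ux + ux ** 2

def total_derivative_S(x, u, ux, uxx):
    """D_x S from the chain-rule formula of Proposition 32.3.4."""
    dS_dx = u * ux            # partial derivative with respect to x
    dS_du = x * ux            # partial derivative with respect to u
    dS_dux = x * u + 2 * ux   # partial derivative with respect to u_x
    return dS_dx + ux * dS_du + uxx * dS_dux

def derivative_along_section(x, h=1e-5):
    """Ordinary derivative of S(x, f(x), f'(x)) for f = sin, by central difference."""
    g = lambda z: S(z, math.sin(z), math.cos(z))
    return (g(x + h) - g(x - h)) / (2 * h)

if __name__ == "__main__":
    x = 0.9
    print(total_derivative_S(x, math.sin(x), math.cos(x), -math.sin(x)))
    print(derivative_along_section(x))
```

The two values should agree to within the accuracy of the finite difference.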


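An immediate consequence of this proposition is

\[
D_i u^\alpha_J = u^\alpha_{J,i} = \frac{\partial u^\alpha_J}{\partial x^i}
  \quad \forall\, J, \alpha, \qquad
D_i(ST) = T\,D_i S + S\,D_i T. \tag{32.21}
\]

Higher-order total derivatives are defined in analogy with partial derivatives:
If $J$ is a multi-index of the form $J = (i_1,\dots,i_k)$, then the $J$th total
derivative is

\[
D_J S = D_{i_1} D_{i_2}\cdots D_{i_k} S. \tag{32.22}
\]

As in the case of partial derivatives, the order of differentiation is
immaterial. We are now ready to state the second central theorem of the
application of Lie groups to the solution of DEs (for a proof, see
[Olve 86, pp. 113-115]).


Theorem 32.3.5 Let

\[
\mathbf{v} = \sum_{i=1}^{p} X^i(x,u)\frac{\partial}{\partial x^i}
  + \sum_{\alpha=1}^{q} U^\alpha(x,u)\frac{\partial}{\partial u^\alpha}
\]

be a vector field on an open subset $M \subset X \times U$. The nth
prolongation of $\mathbf{v}$, i.e.,
$\mathrm{pr}^{(n)}\mathbf{v} \in \mathfrak{X}(M^{(n)})$, is

\[
\mathrm{pr}^{(n)}\mathbf{v} = \mathbf{v}
  + \sum_{\alpha=1}^{q}\sum_{J} U^\alpha_J(x,u^{(n)})\frac{\partial}{\partial u^\alpha_J},
\]

where for $J = (j_1,\dots,j_k)$, the inner sum extends over
$1 \le |J| \le n$ and the coefficients $U^\alpha_J$ are given by

\[
U^\alpha_J(x,u^{(n)})
  = D_J\Bigl(U^\alpha - \sum_{i=1}^{p} X^i\frac{\partial u^\alpha}{\partial x^i}\Bigr)
  + \sum_{i=1}^{p} X^i u^\alpha_{J,i},
\]

and the higher-order derivative $D_J$ is as given in Eq. (32.22).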
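Example 32.3.6 Let $p = 2$, $q = 1$, and consider the case in which $G$ acts
only on the independent variables $(x, y)$. The general vector field for this
situation is

\[
\mathbf{v} = \xi(x,y)\frac{\partial}{\partial x} + \eta(x,y)\frac{\partial}{\partial y}.
\]

We are interested in the first prolongation of this vector field. Thus,
$n = 1$, $X^1 = \xi$, $X^2 = \eta$, and $J$ has only one component, which we
denote by $j$ (also written as $x$ or $y$). Theorem 32.3.5 gives

\[
U_j = D_j\Bigl(-\sum_{i=1}^{2} X^i\frac{\partial u}{\partial x^i}\Bigr)
  + \sum_{i=1}^{2} X^i u_{j,i}
  = -\sum_{i=1}^{2}\frac{\partial X^i}{\partial x^j}\frac{\partial u}{\partial x^i},
\]

and using the notation $u_x = \partial u/\partial x$ and
$u_y = \partial u/\partial y$, we obtain

\[
\mathrm{pr}^{(1)}\mathbf{v} = \mathbf{v}
  + U_x\frac{\partial}{\partial u_x} + U_y\frac{\partial}{\partial u_y},
\tag{32.23}
\]

where

\[
U_x = -\frac{\partial\xi}{\partial x}u_x - \frac{\partial\eta}{\partial x}u_y
\quad\text{and}\quad
U_y = -\frac{\partial\xi}{\partial y}u_x - \frac{\partial\eta}{\partial y}u_y.
\]

In particular, if $G = SO(2)$, so that
$\mathbf{v} = -y\,\partial_x + x\,\partial_y$, then $U_x = -u_y$ and
$U_y = u_x$. It then follows that

\[
\mathrm{pr}^{(1)}\mathbf{v}
  = -y\frac{\partial}{\partial x} + x\frac{\partial}{\partial y}
    - u_y\frac{\partial}{\partial u_x} + u_x\frac{\partial}{\partial u_y}.
\]

Example 32.3.7 Let $p = 1$, $q = 1$, and $G = SO(2)$. The general vector field
for this situation is

\[
\mathbf{v} = -u\frac{\partial}{\partial x} + x\frac{\partial}{\partial u}.
\]

For the first prolongation of this vector field $n = 1$, $X^1 = -u$, $U = x$,
and $J$ has only one component, which we denote by $x$. Theorem 32.3.5 gives

\[
U_x = D_x(x - X^1 u_x) + X^1 u_{xx} = 1 - u_x\,D_x X^1 = 1 + u_x^2.
\]

It follows that

\[
\mathrm{pr}^{(1)}\mathbf{v}
  = -u\frac{\partial}{\partial x} + x\frac{\partial}{\partial u}
    + (1 + u_x^2)\frac{\partial}{\partial u_x},
\]

which is the result obtained in Example 32.2.14.
The second prolongation can be obtained as well. Once again we use
Theorem 32.3.5 with obvious change of notation:

\[
U_{xx} = D_x D_x(x - X^1 u_x) + X^1 u_{xxx}
  = D_x(1 + u_x^2 + u\,u_{xx}) - u\,u_{xxx} = 3u_x u_{xx}.
\]

Then

\[
\mathrm{pr}^{(2)}\mathbf{v}
  = -u\frac{\partial}{\partial x} + x\frac{\partial}{\partial u}
    + (1 + u_x^2)\frac{\partial}{\partial u_x}
    + 3u_x u_{xx}\frac{\partial}{\partial u_{xx}}.
\]

Using Theorem 32.3.1, we note that the DE $u_{xx} = 0$ has $SO(2)$ as a
symmetry group, because with $\Delta(x,u,u_x,u_{xx}) \equiv u_{xx}$,

\[
\mathrm{pr}^{(2)}\mathbf{v}(\Delta)
  = -u\frac{\partial\Delta}{\partial x} + x\frac{\partial\Delta}{\partial u}
    + (1 + u_x^2)\frac{\partial\Delta}{\partial u_x}
    + 3u_x u_{xx}\frac{\partial\Delta}{\partial u_{xx}}
  = 3u_x u_{xx},
\]

which vanishes whenever $\Delta(x,u,u_x,u_{xx})$ vanishes. This is the
statement that rotations take straight lines to straight lines.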
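
The coefficient $3u_x u_{xx}$ can be cross-checked against the finite group
action. Differentiating the first-prolongation formula of Example 32.2.14 once
more along the rotated graph gives, by the chain rule,
$\tilde u_{xx} = u_{xx}/(\cos t - u_x\sin t)^3$. This closed form is derived
here for the purpose of the check and is not stated in the text. The sketch
below differentiates it numerically at $t = 0$.

```python
import math

def rotated_second_derivative(t, ux, uxx):
    """Second derivative of the rotated graph (chain-rule formula assumed above)."""
    return uxx / (math.cos(t) - ux * math.sin(t)) ** 3

def coefficient_numeric(ux, uxx, h=1e-5):
    """d/dt of the transformed u_xx at t = 0, by central difference."""
    plus = rotated_second_derivative(h, ux, uxx)
    minus = rotated_second_derivative(-h, ux, uxx)
    return (plus - minus) / (2 * h)

def coefficient_exact(ux, uxx):
    """Coefficient of d/du_xx in pr^(2) v from Theorem 32.3.5."""
    return 3 * ux * uxx

if __name__ == "__main__":
    print(coefficient_numeric(0.6, -1.1), coefficient_exact(0.6, -1.1))
    # A straight line (u_xx = 0) keeps u_xx = 0 under any rotation angle.
    print(rotated_second_derivative(0.3, 2.0, 0.0))
```

The last line illustrates the closing remark of the example: a graph with
$u_{xx} = 0$ is mapped to a graph with $\tilde u_{xx} = 0$.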
We have all the tools at our disposal to compute (in principle) the most
general symmetry group of almost any system of PDEs. The coefficients
$U^\alpha_J$ of the prolonged vector field $\mathrm{pr}^{(n)}\mathbf{v}$ will be
functions of the partial derivatives of the coefficients $X^i$ and $U^\alpha$
of $\mathbf{v}$ with respect to both $x$ and $u$. The infinitesimal criterion of
invariance as given in Theorem 32.3.1 will involve $x$, $u$, and the
derivatives of $u$ with respect to $x$, as well as $X^i$ and $U^\alpha$ and
their partial derivatives with respect to $x$ and $u$. Using the system of
PDEs, we can obtain some of the derivatives of the $u$'s in terms of the
others. Substituting these relations in the equation of infinitesimal
criterion, we get an equation involving the $u$'s and powers of their
derivatives that are to be treated as independent. We then equate the
coefficients of these powers of partial derivatives of $u$ to zero. This will
result in a large number of elementary PDEs for the coefficient functions
$X^i$ and $U^\alpha$ of $\mathbf{v}$, called the defining equations for the
symmetry group of the given system of PDEs. In most applications, these
defining equations can be solved, and the general solution will determine the
most general infinitesimal symmetry of the system. The symmetry group itself
can then be calculated by exponentiation of the vector fields, i.e., by
finding their integral curves. In the remaining part of this section, we
construct the symmetry groups of the heat and the wave equations.


32.4.1 The Heat Equation 
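The one-dimensional heat equation $u_t = u_{xx}$ corresponds to $p = 2$,
$q = 1$, and $n = 2$. So it is determined by the vanishing of
$\Delta(x,t,u^{(2)}) \equiv u_t - u_{xx}$.

The most general infinitesimal generator of symmetry appropriate for this
equation can be written as

\[
\mathbf{v} = \xi(x,t,u)\frac{\partial}{\partial x}
  + \tau(x,t,u)\frac{\partial}{\partial t}
  + \phi(x,t,u)\frac{\partial}{\partial u},
\tag{32.24}
\]

which, as the reader may check (see Problem 32.11), has a second prolongation
of the form

\[
\mathrm{pr}^{(2)}\mathbf{v} = \mathbf{v}
  + \phi^x\frac{\partial}{\partial u_x} + \phi^t\frac{\partial}{\partial u_t}
  + \phi^{xx}\frac{\partial}{\partial u_{xx}}
  + \phi^{xt}\frac{\partial}{\partial u_{xt}}
  + \phi^{tt}\frac{\partial}{\partial u_{tt}},
\]

where, for example,

\[
\begin{aligned}
\phi^t &= \phi_t - \xi_t u_x + (\phi_u - \tau_t)u_t - \xi_u u_x u_t - \tau_u u_t^2,\\
\phi^{xx} &= \phi_{xx} + (2\phi_{xu} - \xi_{xx})u_x - \tau_{xx}u_t
   + (\phi_{uu} - 2\xi_{xu})u_x^2\\
 &\quad - 2\tau_{xu}u_x u_t - \xi_{uu}u_x^3 - \tau_{uu}u_x^2 u_t
   + (\phi_u - 2\xi_x)u_{xx}\\
 &\quad - 2\tau_x u_{xt} - 3\xi_u u_x u_{xx} - \tau_u u_t u_{xx} - 2\tau_u u_x u_{xt},
\end{aligned}
\tag{32.25}
\]

and subscripts indicate partial derivatives. Theorem 32.3.1 now gives

\[
\mathrm{pr}^{(2)}\mathbf{v}(\Delta) = \phi^t - \phi^{xx} = 0
\quad\text{whenever } u_t = u_{xx} \tag{32.26}
\]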


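as the infinitesimal criterion. Substituting (32.25) in (32.26), replacing
$u_t$ with $u_{xx}$ in the resulting equation, and equating to zero the
coefficients of the monomials in derivatives of $u$, we obtain a number of
equations involving $\xi$, $\tau$, and $\phi$. These equations as well as the
monomials of which they are coefficients are given in Table 32.1.

Table 32.1 The defining equations of the heat equation and the monomials that
give rise to them

Monomial           Coefficient equation
$u_{xx}^2$         $0 = 0$                                   (a)
$u_x^2 u_{xx}$     $\tau_{uu} = 0$                           (b)
$u_x u_{xx}$       $2\xi_u + 2\tau_{xu} = 0$                 (c)
$u_x u_{xt}$       $2\tau_u = 0$                             (d)
$u_{xx}$           $2\xi_x + \tau_{xx} - \tau_t = 0$         (e)
$u_{xt}$           $2\tau_x = 0$                             (f)
$u_x^3$            $\xi_{uu} = 0$                            (g)
$u_x^2$            $2\xi_{xu} - \phi_{uu} = 0$               (h)
$u_x$              $\xi_{xx} - 2\phi_{xu} - \xi_t = 0$       (i)
$1$                $\phi_t - \phi_{xx} = 0$                  (j)

Complicated as the defining equations may look, they are fairly easy to solve.
From (d) and (f) we conclude that $\tau$ is a function of $t$ only. Then (c)
shows that $\xi$ is independent of $u$, and (e) gives $2\xi_x = \tau_t$, or
$\xi(x,t) = \tfrac12\tau_t x + \eta(t)$, for some arbitrary function $\eta$.
From (h) and the fact that $\xi$ is independent of $u$ we get
$\phi_{uu} = 0$, or

\[
\phi(x,t,u) = \alpha(x,t)\,u + \beta(x,t)
\]

for some as yet undetermined functions $\alpha$ and $\beta$. Since $\xi$ is
linear in $x$, $\xi_{xx} = 0$, and (i) yields
$\xi_t = -2\phi_{xu} = -2\alpha_x$, or

\[
\alpha_x = -\tfrac14\tau_{tt}x - \tfrac12\eta_t
\;\Rightarrow\;
\alpha(x,t) = -\tfrac18\tau_{tt}x^2 - \tfrac12\eta_t x + \rho(t).
\]

Finally, with $\phi_t = \alpha_t u + \beta_t$ and
$\phi_{xx} = \alpha_{xx}u + \beta_{xx}$ (recall that when taking partial
derivatives, $u$ is considered independent of $x$ and $t$), the last defining
equation (j) gives $\alpha_t = \alpha_{xx}$ and $\beta_t = \beta_{xx}$, i.e.,
that $\alpha$ and $\beta$ are to satisfy the heat equation. Substituting
$\alpha$ in the heat equation, we obtain

\[
-\tfrac18\tau_{ttt}x^2 - \tfrac12\eta_{tt}x + \rho_t = -\tfrac14\tau_{tt},
\]

which must hold for all $x$ and $t$. Therefore,

\[
\tau_{ttt} = 0, \qquad \eta_{tt} = 0, \qquad \rho_t = -\tfrac14\tau_{tt}.
\]

These equations have the solution

\[
\tau = c_1 t^2 + c_2 t + c_3, \qquad
\rho = -\tfrac12 c_1 t + c_4, \qquad
\eta = c_5 t + c_6.
\]

It follows that
$\alpha(x,t) = -\tfrac14 c_1 x^2 - \tfrac12 c_5 x - \tfrac12 c_1 t + c_4$ and

\[
\begin{aligned}
\xi(x,t) &= \tfrac12(2c_1 t + c_2)x + c_5 t + c_6,\\
\tau(t) &= c_1 t^2 + c_2 t + c_3,\\
\phi(x,t,u) &= \bigl(-\tfrac14 c_1 x^2 - \tfrac12 c_5 x - \tfrac12 c_1 t + c_4\bigr)u + \beta(x,t).
\end{aligned}
\]

Inserting in Eq. (32.24) yields

\[
\begin{aligned}
\mathbf{v} &= \bigl[\tfrac12(2c_1 t + c_2)x + c_5 t + c_6\bigr]\frac{\partial}{\partial x}
  + (c_1 t^2 + c_2 t + c_3)\frac{\partial}{\partial t}\\
 &\quad + \bigl[\bigl(-\tfrac14 c_1 x^2 - \tfrac12 c_5 x - \tfrac12 c_1 t + c_4\bigr)u
   + \beta(x,t)\bigr]\frac{\partial}{\partial u}\\
 &= c_1\Bigl(xt\frac{\partial}{\partial x} + t^2\frac{\partial}{\partial t}
     - \bigl(\tfrac14 x^2 + \tfrac12 t\bigr)u\frac{\partial}{\partial u}\Bigr)
   + c_2\Bigl(\tfrac12 x\frac{\partial}{\partial x} + t\frac{\partial}{\partial t}\Bigr)
   + c_3\frac{\partial}{\partial t}\\
 &\quad + c_4 u\frac{\partial}{\partial u}
   + c_5\Bigl(t\frac{\partial}{\partial x} - \tfrac12 xu\frac{\partial}{\partial u}\Bigr)
   + c_6\frac{\partial}{\partial x}
   + \beta(x,t)\frac{\partial}{\partial u}.
\end{aligned}
\]

Thus the Lie algebra of the infinitesimal symmetries of the heat equation is
spanned by the six vector fields

\[
\begin{aligned}
\mathbf{v}_1 &= \partial_x, & \mathbf{v}_2 &= \partial_t, &
\mathbf{v}_3 &= u\,\partial_u, & \mathbf{v}_4 &= x\,\partial_x + 2t\,\partial_t,\\
\mathbf{v}_5 &= 2t\,\partial_x - xu\,\partial_u, &
\mathbf{v}_6 &= 4tx\,\partial_x + 4t^2\,\partial_t - (x^2 + 2t)u\,\partial_u
\end{aligned}
\tag{32.27}
\]

and the infinite-dimensional subalgebra

\[
\mathbf{v}_\beta = \beta(x,t)\,\partial_u,
\]

where $\beta$ is an arbitrary solution of the heat equation.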

The one-parameter groups $G_i$ generated by the $\mathbf{v}_i$ can be found by
solving the appropriate DEs for the integral curves. We show a sample
calculation and leave the rest of the computation to the reader. Consider
$\mathbf{v}_5$, whose integral curve is given by the set of DEs

\[
\frac{dx}{ds} = 2t, \qquad \frac{dt}{ds} = 0, \qquad \frac{du}{ds} = -xu.
\]

The second equation shows that $t$ is not affected by the group. So,
$t = t_0$, where $t_0$ is the initial value of $t$. The first equation now
gives

\[
\frac{dx}{ds} = 2t_0 \;\Rightarrow\; x = 2t_0 s + x_0,
\]

and the last equation yields

\[
\frac{du}{ds} = -(2t_0 s + x_0)u \;\Rightarrow\;
\frac{du}{u} = -(2t_0 s + x_0)\,ds \;\Rightarrow\;
u = u_0 e^{-t_0 s^2 - x_0 s}.
\]

Changing the transformed coordinates to $(\tilde x, \tilde t, \tilde u)$ and
removing the subscript from the initial coordinates, we can write

\[
\exp(\mathbf{v}_5 s)\cdot(x,t,u) = (\tilde x, \tilde t, \tilde u)
  = \bigl(x + 2ts,\; t,\; u\,e^{-sx - s^2 t}\bigr).
\]
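
The closed-form flow of $\mathbf{v}_5$ can be compared with a direct numerical
integration of the integral-curve equations. The sketch below uses a classical
fourth-order Runge-Kutta scheme (a generic choice made here, not a method taken
from the text); the starting point, the parameter value, and the step count
are arbitrary.

```python
import math

def v5_field(state):
    """Components of v5 = 2t d/dx - x u d/du at the point (x, t, u)."""
    x, t, u = state
    return (2.0 * t, 0.0, -x * u)

def rk4_flow(state, s, steps=1000):
    """Integrate the integral-curve ODEs from parameter 0 to s with classical RK4."""
    h = s / steps
    for _ in range(steps):
        k1 = v5_field(state)
        k2 = v5_field(tuple(y + 0.5 * h * k for y, k in zip(state, k1)))
        k3 = v5_field(tuple(y + 0.5 * h * k for y, k in zip(state, k2)))
        k4 = v5_field(tuple(y + h * k for y, k in zip(state, k3)))
        state = tuple(
            y + h * (a + 2 * b + 2 * c + d) / 6.0
            for y, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state

def closed_form(state, s):
    """exp(v5 s) . (x, t, u) as obtained in the text."""
    x, t, u = state
    return (x + 2.0 * t * s, t, u * math.exp(-s * x - s * s * t))

if __name__ == "__main__":
    start = (0.5, 0.3, 2.0)
    print(rk4_flow(start, 0.8))
    print(closed_form(start, 0.8))
```

Both triples should agree to many digits, which is a quick way to catch sign
errors when working out the other one-parameter groups by hand.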


Table 32.2 gives the result of the action of $\exp(\mathbf{v}_i s)$ on
$(x, t, u)$.

Table 32.2 The transformations caused by the symmetry group of the heat
equation

Group element                              Transformed coordinates $(\tilde x, \tilde t, \tilde u)$
$G_1 = \exp(\mathbf{v}_1 s)$               $(x + s,\; t,\; u)$
$G_2 = \exp(\mathbf{v}_2 s)$               $(x,\; t + s,\; u)$
$G_3 = \exp(\mathbf{v}_3 s)$               $(x,\; t,\; e^{s}u)$
$G_4 = \exp(\mathbf{v}_4 s)$               $(e^{s}x,\; e^{2s}t,\; u)$
$G_5 = \exp(\mathbf{v}_5 s)$               $(x + 2ts,\; t,\; u\,e^{-sx - s^2 t})$
$G_6 = \exp(\mathbf{v}_6 s)$               $\bigl(\frac{x}{1 - 4st},\; \frac{t}{1 - 4st},\; u\sqrt{1 - 4st}\,\exp\bigl[\frac{-sx^2}{1 - 4st}\bigr]\bigr)$
$G_\beta = \exp(\mathbf{v}_\beta s)$       $(x,\; t,\; u + s\beta(x,t))$

The symmetry groups $G_1$ and $G_2$ reflect the invariance of the heat
equation under space and time translations. $G_3$ and $G_\beta$ demonstrate the
linearity of the heat equation: We can multiply solutions by constants and add
solutions to get new solutions. The scaling symmetry is contained in $G_4$,
which shows that if you scale time by the square of the scaling of $x$, you
obtain a new solution. $G_5$ is a Galilean boost to a moving frame. Finally,
$G_6$ is a transformation that cannot be obtained from any physical principle.
Since each group $G_i$ is a one-parameter group of symmetries, if $f$ is a
solution of the heat equation, so are the functions
$\tilde f_i \equiv G_i\cdot f$ for all $i$. These functions can be obtained
from Eq. (32.8). As an illustration, we find $\tilde f_6$. First note that for
$u = f(x,t)$, we have


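\[
\tilde x = \frac{x}{1 - 4st}, \qquad \tilde t = \frac{t}{1 - 4st}, \qquad
\tilde u = \tilde f(\tilde x, \tilde t)
  = f(x,t)\sqrt{1 - 4st}\,\exp\Bigl[\frac{-sx^2}{1 - 4st}\Bigr].
\]

Next solve the first two equations above for $x$ and $t$ in terms of
$\tilde x$ and $\tilde t$:

\[
x = \frac{\tilde x}{1 + 4s\tilde t}, \qquad t = \frac{\tilde t}{1 + 4s\tilde t}.
\]

Finally, substitute in the last equation to get

\[
\tilde f(\tilde x, \tilde t)
  = f\Bigl(\frac{\tilde x}{1 + 4s\tilde t}, \frac{\tilde t}{1 + 4s\tilde t}\Bigr)
    \frac{1}{\sqrt{1 + 4s\tilde t}}\exp\Bigl[\frac{-s\tilde x^2}{1 + 4s\tilde t}\Bigr],
\]

or, changing $\tilde x$ to $x$ and $\tilde t$ to $t$,

\[
\tilde f_6(x,t) = \frac{1}{\sqrt{1 + 4st}}\exp\Bigl[\frac{-sx^2}{1 + 4st}\Bigr]
  f\Bigl(\frac{x}{1 + 4st}, \frac{t}{1 + 4st}\Bigr).
\]

The other transformed functions can be obtained similarly. We simply list
these functions:

\[
\begin{aligned}
\tilde f_1(x,t) &= f(x - s, t), & \tilde f_2(x,t) &= f(x, t - s),\\
\tilde f_3(x,t) &= e^{s}f(x,t), & \tilde f_4(x,t) &= f(e^{-s}x, e^{-2s}t),\\
\tilde f_5(x,t) &= e^{-sx + s^2 t}f(x - 2st, t), &
\tilde f_\beta(x,t) &= f(x,t) + s\beta(x,t),\\
\tilde f_6(x,t) &= \frac{1}{\sqrt{1 + 4st}}\exp\Bigl[\frac{-sx^2}{1 + 4st}\Bigr]
  f\Bigl(\frac{x}{1 + 4st}, \frac{t}{1 + 4st}\Bigr).
\end{aligned}
\tag{32.28}
\]

We can find the fundamental solution to the heat equation very simply as
follows. Let $f(x,t)$ be the trivial constant solution $c$. Then

\[
u = \tilde f_6(x,t) = \frac{c}{\sqrt{1 + 4st}}\,e^{-sx^2/(1 + 4st)}
\]

is also a solution. Now choose $c = \sqrt{s/\pi}$ and translate $t$ to
$t - 1/4s$ (an allowed operation due to the invariance of the heat equation
under time translation $G_2$). The result is

\[
u = \frac{1}{\sqrt{4\pi t}}\,e^{-x^2/4t},
\]

which is the fundamental solution of the heat equation [see (22.45)].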
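
The transformed solutions listed in (32.28) can be tested numerically. The
sketch below takes the exponential solution $f(x,t) = e^{ax + a^2 t}$ as a
starting point (an assumption made here for convenience), builds
$\tilde f_5$ and $\tilde f_6$ from it, and evaluates the residual
$u_t - u_{xx}$ with central finite differences. The helper names, the value of
`a`, and the step `h` are illustrative choices.

```python
import math

def base_solution(x, t, a=0.5):
    """f(x, t) = exp(a x + a^2 t), a solution of u_t = u_xx."""
    return math.exp(a * x + a * a * t)

def boosted(f, s):
    """f_5 of Eq. (32.28): the Galilean boost applied to f."""
    return lambda x, t: math.exp(-s * x + s * s * t) * f(x - 2 * s * t, t)

def projective(f, s):
    """f_6 of Eq. (32.28)."""
    def g(x, t):
        d = 1 + 4 * s * t
        return math.exp(-s * x * x / d) / math.sqrt(d) * f(x / d, t / d)
    return g

def heat_residual(g, x, t, h=1e-3):
    """Finite-difference estimate of u_t - u_xx at (x, t)."""
    u_t = (g(x, t + h) - g(x, t - h)) / (2 * h)
    u_xx = (g(x + h, t) - 2 * g(x, t) + g(x - h, t)) / (h * h)
    return u_t - u_xx

if __name__ == "__main__":
    for name, g in [("f", base_solution),
                    ("f5", boosted(base_solution, 0.3)),
                    ("f6", projective(base_solution, 0.3))]:
        print(name, heat_residual(g, 0.4, 0.2))
```

All three residuals should be small (limited by the finite-difference error),
whereas a function that does not solve the heat equation, such as $e^{x}$
alone, gives a residual of order one.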


32.4.2 The Wave Equation 


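As the next example of the application of Lie groups to differential
equations, we consider the wave equation in two dimensions. This equation is
written as

\[
u_{tt} - u_{xx} - u_{yy} = 0, \quad\text{or}\quad \eta^{ij}u_{ij} = 0
\quad\text{and}\quad \Delta = \eta^{ij}u_{ij}, \tag{32.29}
\]

where $\eta = \mathrm{diag}(1,-1,-1)$, and subscripts indicate derivatives with
respect to coordinate functions $x^1 = t$, $x^2 = x$, and $x^3 = y$. With
$p = 3$ and $q = 1$, a typical generator of symmetry will be of the form

\[
\mathbf{v} = \sum_{i=1}^{3} X^{(i)}\frac{\partial}{\partial x^i}
  + U\frac{\partial}{\partial u}, \tag{32.30}
\]

where $\{X^{(i)}\}_{i=1}^{3}$ and $U$ are functions of $t$, $x$, $y$, and $u$
to be determined. The second prolongation of such a vector field is

\[
\mathrm{pr}^{(2)}\mathbf{v} = \mathbf{v}
  + \sum_{i=1}^{3} U^{(i)}(x,u^{(2)})\frac{\partial}{\partial u_i}
  + \sum_{i,j=1}^{3} U^{(ij)}(x,u^{(2)})\frac{\partial}{\partial u_{ij}},
\]

where by Theorem 32.3.5,

\[
\begin{aligned}
U^{(i)} &= D_i\Bigl(U - \sum_{k=1}^{3} X^{(k)}u_k\Bigr) + \sum_{k=1}^{3} X^{(k)}u_{ik}
  = D_i U - \sum_{k=1}^{3}\bigl(D_i X^{(k)}\bigr)u_k,\\
U^{(ij)} &= D_i D_j\Bigl(U - \sum_{k=1}^{3} X^{(k)}u_k\Bigr) + \sum_{k=1}^{3} X^{(k)}u_{ijk}\\
 &= D_i D_j U - \sum_{k=1}^{3}\bigl[\bigl(D_i D_j X^{(k)}\bigr)u_k
   + u_{ik}\bigl(D_j X^{(k)}\bigr) + u_{jk}\bigl(D_i X^{(k)}\bigr)\bigr],
\end{aligned}
\]

and we have used Eq. (32.21). Using (32.21) further, the reader may show that

\[
\begin{aligned}
U^{(ij)} &= U_{ij} + u_k\bigl(\delta_{jk}U_{iu} + \delta_{ik}U_{ju} - X^{(k)}_{ij}\bigr)\\
 &\quad + u_k u_l\bigl(\delta_{ik}\delta_{jl}U_{uu} - X^{(k)}_{iu}\delta_{jl} - X^{(k)}_{ju}\delta_{il}\bigr)\\
 &\quad + u_{kl}\bigl(\delta_{ik}\delta_{jl}U_u - X^{(k)}_j\delta_{il} - X^{(k)}_i\delta_{jl}\bigr)\\
 &\quad - u_k u_{lm}\bigl(X^{(k)}_u\delta_{il}\delta_{jm} + X^{(m)}_u\delta_{il}\delta_{jk}
      + X^{(m)}_u\delta_{ik}\delta_{jl}\bigr) - u_i u_j u_k X^{(k)}_{uu},
\end{aligned}
\tag{32.31}
\]

where a sum over repeated indices is understood.
Applying $\mathrm{pr}^{(2)}\mathbf{v}$ to $\Delta$, we obtain the infinitesimal
criterion

\[
U^{(11)} = U^{(22)} + U^{(33)} \quad\text{or}\quad \eta^{ij}U^{(ij)} = 0.
\]

Multiplying Eq. (32.31) by $\eta^{ij}$ and setting the result equal to zero
yields

\[
\begin{aligned}
0 = \eta^{ij}U^{(ij)}
 &= \eta^{ij}U_{ij} + u_k\bigl(2\eta^{ik}U_{iu} - \eta^{ij}X^{(k)}_{ij}\bigr)
   + u_k u_l\bigl(\eta^{kl}U_{uu} - 2X^{(k)}_{iu}\eta^{il}\bigr)\\
 &\quad - 2u_{kl}X^{(k)}_i\eta^{il} - 2u_k u_{lm}X^{(m)}_u\eta^{kl}
   - u_i u_j u_k X^{(k)}_{uu}\eta^{ij},
\end{aligned}
\tag{32.32}
\]

where we have used the wave equation, $\eta^{kl}u_{kl} = 0$. Equation (32.32)
must hold for all derivatives of $u$ and powers thereof (treated as
independent) modulo the wave equation. Therefore, the coefficients of such
"monomials" must vanish. For example, since all the terms involving
$u_k u_{lm}$ are independent (even after substituting $u_{xx} + u_{yy}$ for
$u_{tt}$), we have to conclude that $X^{(m)}_u = 0$ for all $m$, i.e., that the
$X^{(m)}$ are independent of $u$. Setting the coefficient of $u_k u_l$ equal to
zero and noting that $X^{(k)}_{iu} = \partial X^{(k)}_u/\partial x^i = 0$ yields

\[
U_{uu} = 0 \;\Rightarrow\; U(x,y,t,u) = \alpha(x,y,t)\,u + \beta(x,y,t). \tag{32.33}
\]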


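Let us concentrate on the functions $X^{(i)}$. These are related via the term
linear in $u_{kl}$. After inserting the wave equation in this term, we get

\[
\begin{aligned}
u_{kl}X^{(k)}_i\eta^{il}
 &= u_{12}\bigl(X^{(2)}_1 - X^{(1)}_2\bigr) + u_{13}\bigl(X^{(3)}_1 - X^{(1)}_3\bigr)
   - u_{23}\bigl(X^{(2)}_3 + X^{(3)}_2\bigr)\\
 &\quad + u_{22}\bigl(X^{(1)}_1 - X^{(2)}_2\bigr) + u_{33}\bigl(X^{(1)}_1 - X^{(3)}_3\bigr).
\end{aligned}
\]

The $u_{ij}$ in this equation are all independent; so, we can set their
coefficients equal to zero:

\[
X^{(2)}_1 = X^{(1)}_2, \qquad X^{(3)}_1 = X^{(1)}_3, \qquad
X^{(3)}_2 + X^{(2)}_3 = 0, \qquad
X^{(1)}_1 = X^{(2)}_2 = X^{(3)}_3. \tag{32.34}
\]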

The reader may verify that these relations imply that $X^{(i)}_{jkl} = 0$ for
any $i$, $j$, $k$, and $l$. For example,

\[
\begin{aligned}
X^{(2)}_{222} &= X^{(1)}_{122} = X^{(3)}_{322} = -X^{(2)}_{323}
  = -X^{(1)}_{133} = -X^{(3)}_{113}\\
 &= -X^{(2)}_{211} = -X^{(1)}_{221} = -X^{(2)}_{222}.
\end{aligned}
\tag{32.35}
\]

So, the first link of this chain is equal to its negative. Therefore, all the
third derivatives in the chain of Eq. (32.35) vanish. It follows that all
$X^{(i)}$'s are mixed polynomials of at most degree two. Writing the most
general such polynomials for the three functions $X^{(1)}$, $X^{(2)}$, and
$X^{(3)}$ and having them satisfy Eq. (32.34) yields

\[
\begin{aligned}
X^{(1)} &= a_1 + a_4 x + a_7 y + a_5 t + a_8(x^2 + y^2 + t^2) + 2a_9 xt + 2a_{10}yt,\\
X^{(2)} &= a_2 + a_5 x - a_6 y + a_4 t + a_9(x^2 - y^2 + t^2) + 2a_{10}xy + 2a_8 xt,\\
X^{(3)} &= a_3 + a_6 x + a_5 y + a_7 t + a_{10}(-x^2 + y^2 + t^2) + 2a_9 xy + 2a_8 yt.
\end{aligned}
\tag{32.36}
\]

Table 32.3 The generators of the conformal group for $\mathbb{R}^3$, part of
the symmetry group of the wave equation in two dimensions

Infinitesimal generator                                                                                   Transformation
$\mathbf{v}_1 = \partial_t$, $\mathbf{v}_2 = \partial_x$, $\mathbf{v}_3 = \partial_y$                     Translation
$\mathbf{v}_4 = x\partial_t + t\partial_x$, $\mathbf{v}_6 = -y\partial_x + x\partial_y$, $\mathbf{v}_7 = y\partial_t + t\partial_y$   Rotation/Boost
$\mathbf{v}_5 = t\partial_t + x\partial_x + y\partial_y$                                                  Dilatation
$\mathbf{v}_8 = (t^2 + x^2 + y^2)\partial_t + 2xt\,\partial_x + 2yt\,\partial_y - tu\,\partial_u$         Inversions
$\mathbf{v}_9 = 2xt\,\partial_t + (t^2 + x^2 - y^2)\partial_x + 2xy\,\partial_y - xu\,\partial_u$
$\mathbf{v}_{10} = 2yt\,\partial_t + 2xy\,\partial_x + (t^2 - x^2 + y^2)\partial_y - yu\,\partial_u$
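
A quick way to confirm that the polynomials (32.36) satisfy the first-order
relations (32.34) is to evaluate the partial derivatives numerically for
arbitrary coefficients $a_1,\dots,a_{10}$. The sketch below does this with
central differences, which are exact for quadratics up to rounding. The
argument order $(t, x, y)$ and all helper names are conventions chosen here.

```python
def make_fields(a):
    """Return X1, X2, X3 of Eq. (32.36); a maps the index 1..10 to a coefficient."""
    def X1(t, x, y):
        return (a[1] + a[4] * x + a[7] * y + a[5] * t
                + a[8] * (x * x + y * y + t * t) + 2 * a[9] * x * t + 2 * a[10] * y * t)
    def X2(t, x, y):
        return (a[2] + a[5] * x - a[6] * y + a[4] * t
                + a[9] * (x * x - y * y + t * t) + 2 * a[10] * x * y + 2 * a[8] * x * t)
    def X3(t, x, y):
        return (a[3] + a[6] * x + a[5] * y + a[7] * t
                + a[10] * (-x * x + y * y + t * t) + 2 * a[9] * x * y + 2 * a[8] * y * t)
    return X1, X2, X3

def partial(f, point, i, h=1e-5):
    """Central-difference partial derivative; i = 0, 1, 2 stands for t, x, y."""
    plus, minus = list(point), list(point)
    plus[i] += h
    minus[i] -= h
    return (f(*plus) - f(*minus)) / (2 * h)

def residuals_32_34(a, point):
    """Left side minus right side for each relation in Eq. (32.34)."""
    X1, X2, X3 = make_fields(a)
    d = lambda f, i: partial(f, point, i)
    return [
        d(X2, 0) - d(X1, 1),   # X2_t - X1_x
        d(X3, 0) - d(X1, 2),   # X3_t - X1_y
        d(X3, 1) + d(X2, 2),   # X3_x + X2_y
        d(X1, 0) - d(X2, 1),   # X1_t - X2_x
        d(X2, 1) - d(X3, 2),   # X2_x - X3_y
    ]

if __name__ == "__main__":
    coeffs = {k: 0.1 * k - 0.35 for k in range(1, 11)}
    print(residuals_32_34(coeffs, (0.2, -0.7, 1.1)))
```

All five residuals should vanish up to rounding for any choice of the
coefficients and of the evaluation point.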


Setting the coefficient of $u_k$ and $\eta^{ij}U_{ij}$ equal to zero and using
Eq. (32.33) gives

\[
\begin{aligned}
2\alpha_x &= X^{(2)}_{xx} + X^{(2)}_{yy} - X^{(2)}_{tt}, &
2\alpha_y &= X^{(3)}_{xx} + X^{(3)}_{yy} - X^{(3)}_{tt},\\
2\alpha_t &= X^{(1)}_{tt} - X^{(1)}_{xx} - X^{(1)}_{yy}, &
\beta_{tt} &- \beta_{xx} - \beta_{yy} = 0.
\end{aligned}
\]

It follows that $\beta$ is any solution of the wave equation, and

\[
\alpha(x,y,t) = a_{11} - a_8 t - a_9 x - a_{10}y.
\]

By inserting the expressions found for $X^{(i)}$ and $U$ in (32.30) and
writing the result in the form $\sum_i a_i\mathbf{v}_i$, we discover that the
generators of the symmetry group consist of the ten vector fields given in
Table 32.3 as well as the vector fields

\[
u\,\partial_u, \qquad \mathbf{v}_\beta = \beta(x,y,t)\,\partial_u
\]

for $\beta$ an arbitrary solution of the wave equation. The ten vector fields
of Table 32.3 comprise the generators of the conformal group in three
dimensions whose generalization to $m$ dimensions will be studied in
Sect. 37.2.


32.5 Application to ODEs 




The theory of Lie groups finds one of its most rewarding applications in the 
integration of ODEs. Lie’s fundamental observation was that if one could 
come up with a sufficiently large group of symmetries of a system of ODEs, 
then one could integrate the system. In this section we outline the general 
technique of solving ODEs once we know their symmetries. The following 
proposition will be useful (see [Warn 83, p. 40]): 


Proposition 32.5.1 Let $M$ be an $n$-dimensional manifold and
$\mathbf{v} \in \mathfrak{X}(M)$. Assume that $\mathbf{v}|_P \ne 0$ for some
$P \in M$. Then there exists a local chart, i.e., local set of coordinate
functions, $(w^1,\dots,w^n)$ at $P$ such that
$\mathbf{v} = \partial/\partial w^1$.


32.5.1 First-Order ODEs 


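The most general first-order ODE can be written as

\[
\frac{du}{dx} = F(x,u) \;\Rightarrow\; \Delta(x,u,u_x) \equiv u_x - F(x,u) = 0. \tag{32.37}
\]

A typical infinitesimal generator of the symmetry group of this equation is⁵
$\mathbf{v} = X\partial_x + U\partial_u$, whose prolongation is

\[
\mathrm{pr}^{(1)}\mathbf{v} = \mathbf{v} + U^{(x)}\frac{\partial}{\partial u_x},
\quad\text{where}\quad
U^{(x)} \equiv \frac{\partial U}{\partial x}
  + \Bigl(\frac{\partial U}{\partial u} - \frac{\partial X}{\partial x}\Bigr)u_x
  - \frac{\partial X}{\partial u}u_x^2, \tag{32.38}
\]

as the reader may verify. The infinitesimal criterion for the one-parameter
group of transformations $G$ to be a symmetry group of Eq. (32.37) is
$\mathrm{pr}^{(1)}\mathbf{v}(\Delta) = 0$, or

\[
\frac{\partial U}{\partial x}
  + \Bigl(\frac{\partial U}{\partial u} - \frac{\partial X}{\partial x}\Bigr)F
  - \frac{\partial X}{\partial u}F^2
  = X\frac{\partial F}{\partial x} + U\frac{\partial F}{\partial u}. \tag{32.39}
\]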

Any solution (X, U) of this equation generates a 1-parameter group of trans- 
formations. The problem is that a systematic procedure for solving (32.39) 
is more difficult than solving the original equation. However, in most cases, 
one can guess a symmetry transformation (based on physical, or other, 
grounds), and that makes Lie’s method worthwhile. 

Suppose we have found a symmetry group $G$ with infinitesimal generator
$\mathbf{v}$ that does not vanish at $P \in M \subset X \times U$. Based on
Proposition 32.5.1, we can introduce new coordinates

\[
w = \xi(x,u), \qquad y = \eta(x,u) \tag{32.40}
\]

in a neighborhood of $P$ such that $\mathbf{v} = \partial/\partial w$, whose
prolongation is also $\partial/\partial w$ [see (32.38)]. This transforms the
DE of (32.37) into⁶ $\tilde\Delta(y,w,w_y) = 0$, and the infinitesimal
criterion into

\[
\mathrm{pr}^{(1)}\mathbf{v}(\tilde\Delta) = \frac{\partial\tilde\Delta}{\partial w} = 0.
\]

It follows that $\tilde\Delta$ is independent of $w$. The transformed DE is
therefore $\tilde\Delta(y,w_y) = 0$, whose normal form, obtained by implicitly
solving for $dw/dy$, is

\[
\frac{dw}{dy} = H(y) \;\Rightarrow\; w = \int_a^y H(t)\,dt + w(a)
\]

for some function $H$ of $y$ alone and some convenient point $y = a$.
Substituting this expression of $w$ as a function of $y$ in Eq. (32.40) and
eliminating $y$ between the two equations yields $u$ as a function of $x$.

⁵ The reader is warned against the unfortunate coincidence of notation: $X$ and
$U$ represent both the components of the infinitesimal generator and the
spaces of independent and dependent variables!
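Thus our task is to find the change of variables (32.40). For this, we use
$\mathbf{v}(w) = 1$ and $\mathbf{v}(y) = 0$, and express them in terms of $x$
and $u$:

\[
\begin{aligned}
\mathbf{v}(w) &= \mathbf{v}(\xi) = X\frac{\partial\xi}{\partial x} + U\frac{\partial\xi}{\partial u} = 1,\\
\mathbf{v}(y) &= \mathbf{v}(\eta) = X\frac{\partial\eta}{\partial x} + U\frac{\partial\eta}{\partial u} = 0.
\end{aligned}
\tag{32.41}
\]

The second equation says that $\eta$ is an invariant of the group generated by
$\mathbf{v}$. We therefore use the associated characteristic ODE [see (32.3)
and (32.5)] to find $y$ (or $\eta$):

\[
\frac{dx}{X(x,u)} = \frac{du}{U(x,u)}. \tag{32.42}
\]

To find $w$ (or $\xi$), we introduce $\chi(x,u,v) = v - \xi(x,u)$ and note
that an equivalent relation containing the same information as the first
equation in (32.41) is

\[
X\frac{\partial\chi}{\partial x} + U\frac{\partial\chi}{\partial u}
  + \frac{\partial\chi}{\partial v} = 0,
\]

which has the characteristic ODE

\[
\frac{dx}{X(x,u)} = \frac{du}{U(x,u)} = \frac{dv}{1}, \tag{32.43}
\]

for which we seek a solution of the form $v - \xi(x,u) = c$ to read off
$\xi(x,u)$.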

The reader may wonder whether it is sane to go through so much trouble 
only to replace the original single ODE with two ODEs such as (32.42) and 
(32.43)! The answer is that in practice, the latter two DEs are much easier 
to solve than the original ODE. 


⁶ Here we are choosing $w$ to be the dependent variable. This choice is a
freedom that is always available to us.

Example 32.5.2 The homogeneous FODE $du/dx = F(u/x)$ is invariant under the
scaling transformation $(x,u) \mapsto (sx, su)$ whose infinitesimal generator
is $\mathbf{v} = x\partial_x + u\partial_u$. The first prolongation of this
vector is the same as the vector itself (reader, verify!).
To find the new coordinates $w$ and $y$, first use Eq. (32.42) with
$X(x,u) = x$ and $U(x,u) = u$:

\[
\frac{dx}{x} = \frac{du}{u} \;\Rightarrow\; \frac{u}{x} = c_1
\;\Rightarrow\; y = \frac{u}{x} \quad\text{(by Box 32.1.8)}.
\]

Next, we note that (32.43) yields

\[
\frac{dx}{x} = \frac{du}{u} = dv \;\Rightarrow\; \ln u = v + \ln c_2
\;\Rightarrow\; v = \ln(u/c_2).
\]

Substituting from the previous equation, we obtain

\[
v = \ln(c_1 x/c_2) = \ln x + \ln(c_1/c_2) \;\Rightarrow\; v - \ln x = c
\;\Rightarrow\; w = \ln x.
\]

The chain rule gives $du/dx = (1 + y w_y)/w_y$, so that the DE becomes

\[
\frac{1 + y w_y}{w_y} = F(y) \;\Rightarrow\; \frac{dw}{dy} = \frac{1}{F(y) - y},
\]

which can be integrated to give $w = H(y)$ or $\ln x = H(y) = H(u/x)$, which
defines $u$ as an implicit function of $x$.
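
To see the recipe of Example 32.5.2 at work, consider the hypothetical choice
$F(y) = y + 1$, i.e., $du/dx = u/x + 1$ (an illustration picked here, not one
from the text). Then $dw/dy = 1/(F(y) - y) = 1$, so $w = y + c$, that is,
$\ln x = u/x + c$, and with $u(1) = u_0$ this gives $u = x(\ln x + u_0)$. The
sketch below compares this with a direct Runge-Kutta integration of the
original ODE.

```python
import math

def rhs(x, u):
    """Right-hand side F(u/x) of the homogeneous ODE with F(y) = y + 1."""
    return u / x + 1.0

def rk4_solve(x0, u0, x1, steps=2000):
    """Integrate du/dx = rhs(x, u) from x0 to x1 with classical RK4."""
    h = (x1 - x0) / steps
    x, u = x0, u0
    for _ in range(steps):
        k1 = rhs(x, u)
        k2 = rhs(x + 0.5 * h, u + 0.5 * h * k1)
        k3 = rhs(x + 0.5 * h, u + 0.5 * h * k2)
        k4 = rhs(x + h, u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        x += h
    return u

def solution_from_symmetry(x, u0):
    """u(x) obtained from ln x = u/x + c with the initial condition u(1) = u0."""
    return x * (math.log(x) + u0)

if __name__ == "__main__":
    print(rk4_solve(1.0, 0.5, 3.0))
    print(solution_from_symmetry(3.0, 0.5))
```

The two numbers should agree closely, confirming that the change of variables
$y = u/x$, $w = \ln x$ integrates this particular equation.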


32.5.2 Higher-Order ODEs 


The same argument used in the first order ODEs can be used for higher-order 
ODEs to reduce their orders. 


Proposition 32.5.3 Let

\[
\Delta(x,u^{(n)}) = \Delta(x,u,u_1,\dots,u_n) = 0, \qquad u_k \equiv \frac{d^k u}{dx^k},
\]

be an nth order ODE. If this ODE has a one-parameter symmetry group, then
there exist variables $w = \xi(x,u)$ and $y = \eta(x,u)$ such that

\[
\Delta(x,u^{(n)}) = \tilde\Delta\Bigl(y, \frac{dw}{dy},\dots,\frac{d^n w}{dy^n}\Bigr) = 0,
\]

i.e., in terms of $w$ and $y$, the ODE becomes of $(n-1)$st order in $w_y$.

Proof The proof is exactly the same as in the first-order case. The only
difference is that one has to consider $\mathrm{pr}^{(n)}\mathbf{v}$, where
$\mathbf{v} = \partial/\partial w$. But Problem 32.7 shows that
$\mathrm{pr}^{(n)}\mathbf{v} = \mathbf{v}$, as in the first-order case.

Example 32.5.4 Consider a second-order DE $\Delta(u,u_x,u_{xx}) = 0$, which
does not depend on $x$ explicitly. The fact that
$\partial\Delta/\partial x = 0$ suggests $w = x$. So, we switch the dependent
and independent variables and write $w = x$, and $y = u$. Then, using the
chain rule, we get

\[
\frac{du}{dx} = \frac{1}{w_y}, \qquad \frac{d^2u}{dx^2} = -\frac{w_{yy}}{w_y^3}.
\]

Substituting in the original DE, we obtain

\[
\tilde\Delta(y,w_y,w_{yy}) \equiv \Delta\Bigl(y, \frac{1}{w_y}, -\frac{w_{yy}}{w_y^3}\Bigr) = 0,
\]

which is of first order in $w_y$.


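Example 32.5.5 The order of the SOLDE $u_{xx} + p(x)u_x + q(x)u = 0$ can be
reduced by noting that the DE is invariant under the scaling transformation
$(x,u) \mapsto (x, su)$, whose infinitesimal generator is
$\mathbf{v} = u\,\partial_u$. With this vector field, Eqs. (32.42) and (32.43)
give

\[
\frac{dx}{0} = \frac{du}{u}, \qquad \frac{dx}{0} = \frac{du}{u} = \frac{dv}{1}.
\]

For the first equation to make sense, we have to have

\[
dx = 0 \;\Rightarrow\; x = c_1 \;\Rightarrow\; y = x \quad\text{(by Box 32.1.8)}.
\]

The second equation in $u$ gives

\[
v = \ln u + c \;\Rightarrow\; v - \ln u = c \;\Rightarrow\; w = \ln u
\;\Rightarrow\; u = e^{w}.
\]

Using the chain rule, we obtain

\[
u_x = w_y e^{w}, \qquad u_{xx} = \bigl(w_{yy} + w_y^2\bigr)e^{w}.
\]

By inserting this in the original DE and writing $z = w_y$, we obtain

\[
\frac{dz}{dy} = -z^2 - p(y)z - q(y),
\]

which is the well-known first-order Riccati equation.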
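
The reduction of Example 32.5.5 can be checked on a concrete case. The sketch
below assumes the constant coefficients $p = 2$, $q = 1$ and the solution
$u = (1 + x)e^{-x}$ of $u_{xx} + 2u_x + u = 0$ (both chosen here for
illustration). With $w = \ln u$ and $z = w_y = u_x/u$, the function $z$ should
satisfy the Riccati equation $dz/dy = -z^2 - pz - q$ wherever $u > 0$.

```python
import math

P, Q = 2.0, 1.0  # constant coefficients p(x) and q(x) assumed for this check

def u(x):
    """A solution of u'' + 2u' + u = 0."""
    return (1 + x) * math.exp(-x)

def u_x(x):
    """Its first derivative."""
    return -x * math.exp(-x)

def z(x):
    """z = w_y = d(ln u)/dx = u'/u, valid where u > 0."""
    return u_x(x) / u(x)

def riccati_residual(x, h=1e-5):
    """dz/dx + z^2 + p z + q, with dz/dx from a central difference."""
    dz = (z(x + h) - z(x - h)) / (2 * h)
    return dz + z(x) ** 2 + P * z(x) + Q

if __name__ == "__main__":
    for x in (0.1, 0.7, 2.5):
        print(x, riccati_residual(x))
```

The residuals should be zero up to finite-difference error; here
$z = -x/(1 + x)$, so the check can also be done by hand.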


32.5.3 DEs with Multiparameter Symmetries 


We have seen that 1-parameter symmetries reduce the order of an ODE by 1.
It is natural to suspect that an $r$-parameter symmetry will reduce the order
by $r$. Although this suspicion is correct, it turns out that in general, one
cannot reconstruct the solution of the original equation from those of the
reduced $(n - r)$th-order equation. (See [Olve 86, pp. 148-158] for a thorough
discussion of this problem.) However, the special, but important, case of
second-order DEs is an exception. The deep reason behind this is the
exceptional structure of 2-dimensional Lie algebras given in Box 29.2.5. We
cannot afford to go into details of the reasoning, but simply quote the
following important theorem.


Theorem 32.5.6 Let $\Delta(x,u^{(n)}) = 0$ be an nth-order ODE invariant
under a 2-parameter group. Then there is an $(n-2)$nd-order ODE
$\tilde\Delta(y,w^{(n-2)}) = 0$ with the property that the general solution to
$\Delta$ can be found by integrating the general solution to $\tilde\Delta$. In
particular, a second-order ODE having a 2-parameter group of symmetries can be
solved by integration.


Let us analyze the case of a second-order ODE in some detail. By
Box 29.2.5, the infinitesimal generators $\mathbf{v}_1$ and $\mathbf{v}_2$
satisfy the Lie bracket relation

\[
[\mathbf{v}_1, \mathbf{v}_2] = c\,\mathbf{v}_1, \qquad c = 0 \text{ or } 1.
\]

We shall treat the abelian case ($c = 0$) and leave the nonabelian case for the
reader. To begin with, we use $s$ and $t$ for the transformed variables, and at
the end replace them with $y$ and $w$.

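By Proposition 32.5.1, we can let $\mathbf{v}_1 = \partial/\partial s$. Then
$\mathbf{v}_2$ can be expressed as the linear combination

\[
\mathbf{v}_2 = \alpha(s,t)\frac{\partial}{\partial s} + \beta(s,t)\frac{\partial}{\partial t}.
\]

The commutation relation $[\mathbf{v}_1, \mathbf{v}_2] = 0$ gives

\[
0 = \Bigl[\frac{\partial}{\partial s},\,
      \alpha\frac{\partial}{\partial s} + \beta\frac{\partial}{\partial t}\Bigr]
  = \frac{\partial\alpha}{\partial s}\frac{\partial}{\partial s}
  + \frac{\partial\beta}{\partial s}\frac{\partial}{\partial t},
\]

showing that $\alpha$ and $\beta$ are independent of $s$. We want to simplify
$\mathbf{v}_2$ as much as possible without changing $\mathbf{v}_1$. A
transformation that accomplishes this is $S = s + h(t)$ and $T = T(t)$. Then,
by Eq. (28.8) we obtain

\[
\begin{aligned}
\mathbf{v}_1 &= \mathbf{v}_1(S)\frac{\partial}{\partial S} + \mathbf{v}_1(T)\frac{\partial}{\partial T}
  = \frac{\partial S}{\partial s}\frac{\partial}{\partial S}
  + \frac{\partial T}{\partial s}\frac{\partial}{\partial T}
  = \frac{\partial}{\partial S},\\
\mathbf{v}_2 &= \mathbf{v}_2(S)\frac{\partial}{\partial S} + \mathbf{v}_2(T)\frac{\partial}{\partial T}
  = \Bigl(\alpha\frac{\partial S}{\partial s} + \beta\frac{\partial S}{\partial t}\Bigr)\frac{\partial}{\partial S}
  + \Bigl(\alpha\frac{\partial T}{\partial s} + \beta\frac{\partial T}{\partial t}\Bigr)\frac{\partial}{\partial T}\\
 &= (\alpha + \beta h')\frac{\partial}{\partial S} + \beta T'\frac{\partial}{\partial T}.
\end{aligned}
\]

If $\beta \ne 0$, we choose $T' = 1/\beta$ and $h' = -\alpha/\beta$ to obtain

\[
\mathbf{v}_1 = \frac{\partial}{\partial s}, \qquad \mathbf{v}_2 = \frac{\partial}{\partial t}, \tag{32.44}
\]

where we have substituted $s$ for $S$ and $t$ for $T$. If $\beta = 0$, we
choose $\alpha = T$, and change the notation from $S$ to $s$ and $T$ to $t$ to
obtain

\[
\mathbf{v}_1 = \frac{\partial}{\partial s}, \qquad \mathbf{v}_2 = t\frac{\partial}{\partial s}. \tag{32.45}
\]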

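The next step is to decide which coordinate is the independent variable,
prolong the vector fields, and apply them to the DE to find the infinitesimal
criterion. For $\beta \neq 0$, the choice is immaterial. So, let $w = s$ and $y = t$. Then
the prolongation of $\mathbf{v}_1$ and $\mathbf{v}_2$ will be the same as the vectors themselves,
and with $\Delta(y, w, w_y, w_{yy}) \equiv w_{yy} - F(y, w, w_y)$, the infinitesimal criteria
for invariance will be

\[ 0 = \mathrm{pr}^{(2)}\mathbf{v}_1(\Delta) = \mathbf{v}_1(\Delta) = \frac{\partial\Delta}{\partial w} = -\frac{\partial F}{\partial w}, \]
\[ 0 = \mathrm{pr}^{(2)}\mathbf{v}_2(\Delta) = \mathbf{v}_2(\Delta) = \frac{\partial\Delta}{\partial y} = -\frac{\partial F}{\partial y}. \]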


It follows that in the $(y, w)$ system, $F$ will be a function of $w_y$ alone and
the DE will be of the form

\[ w_{yy} = F(w_y) \;\Rightarrow\; \frac{dw_y}{dy} = F(w_y) \;\Rightarrow\;
 \underbrace{\int^{w_y}\frac{dz}{F(z)}}_{\equiv H(w_y)} = y + C. \]

The last equation can be solved for $w_y$ in terms of $y$ and the result integrated.
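As an illustration of this last step, the following worked case uses a hypothetical choice of $F$ (it is an assumption made here for concreteness and is not taken from the text): $F(z) = 1 + z^2$. The constants $C_1$ and $C_2$ are the two integration constants expected for a second-order ODE.

```latex
% Hypothetical example: F(z) = 1 + z^2, so the ODE is w_{yy} = 1 + w_y^2.
\begin{align*}
H(w_y) &= \int^{w_y}\frac{dz}{1+z^{2}} = \arctan w_y = y + C_1
  &&\Rightarrow\quad w_y = \tan(y + C_1),\\
w(y) &= \int \tan(y + C_1)\,dy = -\ln\bigl|\cos(y + C_1)\bigr| + C_2 .
\end{align*}
% Check: w_{yy} = \sec^{2}(y + C_1) = 1 + \tan^{2}(y + C_1) = 1 + w_y^{2}.
```

Two quadratures suffice, as the abelian symmetry algebra suggests.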
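For $\beta = 0$, choose $w = t$ and $y = s$. Then $\mathbf{v}_1$ will not prolongate, and as
the reader may verify,

\[ \mathrm{pr}^{(2)}\mathbf{v}_2 = \mathbf{v}_2 - w_y^2\frac{\partial}{\partial w_y} - 3w_y w_{yy}\frac{\partial}{\partial w_{yy}}
 = w\frac{\partial}{\partial y} - w_y^2\frac{\partial}{\partial w_y} - 3w_y w_{yy}\frac{\partial}{\partial w_{yy}}, \]

and the infinitesimal criteria for invariance will be

\[ 0 = \mathrm{pr}^{(2)}\mathbf{v}_1(\Delta) = \mathbf{v}_1(\Delta) = \frac{\partial\Delta}{\partial y} = -\frac{\partial F}{\partial y}, \]
\[ 0 = \mathrm{pr}^{(2)}\mathbf{v}_2(\Delta) = -w\underbrace{\frac{\partial F}{\partial y}}_{=0} + w_y^2\frac{\partial F}{\partial w_y} - 3w_y\underbrace{w_{yy}}_{=F}. \]

It follows that in the $(y, w)$ system, $F$ will be a function of $w$ and $w_y$ and
satisfy the DE

\[ w_y\frac{\partial F}{\partial w_y} = 3F, \]

whose solution is of the form $F(w, w_y) = w_y^3\tilde F(w)$. The original DE now
becomes

\[ w_{yy} = w_y^3\tilde F(w), \]

for which we use the chain rule $w_{yy} = w_y\,dw_y/dw$ to obtain

\[ \frac{dw_y}{dw} = w_y^2\tilde F(w) \;\Rightarrow\; -\frac{1}{w_y} = \underbrace{\int^{w}\tilde F(z)\,dz}_{\equiv H(w)}
 \;\Rightarrow\; \frac{dw}{dy} = -\frac{1}{H(w)}, \]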


which can be integrated. Had we chosen $w = s$ and $y = t$, $F$ would have
been a function of $y$ and the DE would have reduced to $w_{yy} = F(y)$,
which could be solved by two consecutive integrations. The nonabelian
2-dimensional Lie algebra can be analyzed similarly. The reader may verify
that if $\beta = 0$, the vector fields can be chosen to be

\[ \mathbf{v}_1 = \frac{\partial}{\partial s}, \qquad \mathbf{v}_2 = s\frac{\partial}{\partial s}, \tag{32.46} \]

leading to the ODE $w_{yy} = w_y\tilde F(y)$, and if $\beta \neq 0$, the vector fields can be
chosen to be

\[ \mathbf{v}_1 = \frac{\partial}{\partial s}, \qquad \mathbf{v}_2 = s\frac{\partial}{\partial s} + t\frac{\partial}{\partial t}, \tag{32.47} \]

leading to the ODE $w_{yy} = \tilde F(w_y)/y$. Both of these ODEs are integrable as
in the abelian case.
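To make the abelian $\beta = 0$ reduction concrete, here is a short worked case under an assumed, purely illustrative choice $\tilde F(w) = 1$ (this choice is not from the text). It follows the chain-rule steps described above, with $C_1$ and $C_2$ as integration constants.

```latex
% Hypothetical example: \tilde F(w) = 1, so the ODE is w_{yy} = w_y^{3}.
\begin{align*}
\frac{dw_y}{dw} &= w_y^{2}
  \;\Rightarrow\; -\frac{1}{w_y} = \int^{w} dz = w + C_1
  \;\Rightarrow\; \frac{dy}{dw} = -(w + C_1),\\
y &= -\tfrac{1}{2}w^{2} - C_1 w + C_2 .
\end{align*}
% Check: w_y = -1/(w + C_1), hence
% w_{yy} = \frac{w_y}{(w + C_1)^{2}} = -\frac{1}{(w + C_1)^{3}} = w_y^{3}.
```

The solution comes out implicitly, as $y(w)$, which is typical when $w$ and $y$ exchange roles in the reduction.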


32.6 Problems 


32.1 Suppose that $\{F_i\}_{i=1}^{p}$ are invariants of the PDE (32.3). Show that any
function $f(F_1, F_2, \dots, F_p)$ is also an invariant of the PDE.


32.2 Find the function $\tilde f = \theta\cdot f$ when $f(x) = ax + b$ and $\theta$ is the angle of
rotation of SO(2).


32.3 Use the result of Problem 32.2 to find $\tilde u_{\tilde x}$. Hint: Note that $a = u_x$.


32.4 Transform the DE of Example 32.3.2 from Cartesian to polar coordi- 
nates to obtain dr/d@ = r?. 


32.5 Using the definition of total derivative, verify Eq. (32.21). 

32.6 Show that SO(2) is a symmetry group of the first-order DE

\[ \Delta(x, u, u_x) = (u - x)u_x + x + u = 0 \]

and write the same DE in polar coordinates.


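32.7 Show that the $n$th prolongation of the generator of the $i$th translation,
$\partial_i$, is the same as the original vector.


32.8 Find the first prolongation of the generator of scaling: $x\partial_x + u\partial_u$.

32.9 Show that when the group acts only on the single dependent variable $u$,
the prolongation of $\mathbf{v} = U\partial_u$ is given by

\[ \mathrm{pr}^{(1)}\mathbf{v} = \mathbf{v} + \sum_{j=1}^{p} U_j\frac{\partial}{\partial u_j},
 \qquad\text{where}\qquad U_j = \frac{\partial U}{\partial x^j} + u_j\frac{\partial U}{\partial u}. \]

32.10 Show that the $n$th prolongation of $\mathbf{v} = X(x, u)\partial_x + U(x, u)\partial_u$ for an
ordinary DE of $n$th order is

\[ \mathrm{pr}^{(n)}\mathbf{v} = \mathbf{v} + \sum_{k=1}^{n} U^{[k]}\frac{\partial}{\partial u^{(k)}}, \]

where

\[ u^{(k)} \equiv \frac{\partial^k u}{\partial x^k} \qquad\text{and}\qquad U^{[k]} = D_x^k(U - Xu_x) + Xu^{(k+1)}. \]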


32.11 Compute the second prolongation of the infinitesimal generators of 
the symmetry group of the heat equation. 


32.12 Derive Eqs. (32.31) and (32.32). 
32.13 Using Eq. (32.34) show that X‘) = 0 for any i, j, k, and 1. 


32.14 The Korteweg-de Vries equation is $u_t + u_{xxx} + uu_x = 0$. Using
the technique employed in computing the symmetries of the heat and
wave equations, show that the infinitesimal generators of symmetries of the
Korteweg-de Vries equation are

\[ \mathbf{v}_1 = \partial_x, \quad \mathbf{v}_2 = \partial_t, \qquad \text{translation} \]
\[ \mathbf{v}_3 = t\partial_x + \partial_u, \qquad \text{Galilean boost} \]
\[ \mathbf{v}_4 = x\partial_x + 3t\partial_t - 2u\partial_u. \qquad \text{scaling} \]


32.15 Suppose $M(x, u)\,dx + N(x, u)\,du = 0$ has a 1-parameter symmetry
group with generator $\mathbf{v} = X\partial_x + U\partial_u$. Show that the function $g(x, u) =
1/(XM + UN)$ is an integrating factor.


32.16 Show that the second prolongation of $\mathbf{v} = w\partial_y$ (with $y$ treated as
independent variable) is

\[ \mathrm{pr}^{(2)}\mathbf{v} = \mathbf{v} - w_y^2\frac{\partial}{\partial w_y} - 3w_y w_{yy}\frac{\partial}{\partial w_{yy}}. \]


32.17 Go through the case of $\beta = 0$ in the solution of the second-order ODE
and, choosing $w = s$ and $y = t$, show that $F$ will be a function of $y$ alone
and the original DE will reduce to $w_{yy} = F(y)$.


32.18 Show that in the case of the nonabelian 2-dimensional Lie algebra,

(a) the vector fields can be chosen to be
\[ \mathbf{v}_1 = \frac{\partial}{\partial s}, \qquad \mathbf{v}_2 = s\frac{\partial}{\partial s} \]
if $\beta = 0$.
(b) Show that these vector fields lead to the ODE $w_{yy} = w_y\tilde F(y)$.
(c) If $\beta \neq 0$, show that the vector fields can be chosen to be
\[ \mathbf{v}_1 = \frac{\partial}{\partial s}, \qquad \mathbf{v}_2 = s\frac{\partial}{\partial s} + t\frac{\partial}{\partial t}. \]
(d) Finally, show that the latter vector fields lead to the ODE $w_{yy} = \tilde F(w_y)/y$.


33 Calculus of Variations, Symmetries,
and Conservation Laws


In this chapter we shall start with one of the oldest and most useful branches 
of mathematical physics, the calculus of variations. After giving the funda- 
mentals and some examples, we shall investigate the consequences of sym- 
metries associated with variational problems. The chapter then ends with 
Noether’s theorem, which connects such symmetries with their associated 
conservation laws. All vector spaces of relevance in this chapter will be as- 
sumed to be real. 


33.1. The Calculus of Variations 


One of the main themes of calculus is the extremal problem: Given a function
$f : \mathbb{R} \supset D \to \mathbb{R}$, find the points in the domain $D$ of $f$ at which $f$
attains a maximum or minimum. To locate such points, we find the zeros of
the derivative of $f$. For multivariable functions, $f : \mathbb{R}^p \supset \Omega \to \mathbb{R}$, the
notion of gradient generalizes that of the derivative. To find the $j$th component
of the gradient $\nabla f$, we calculate the difference $\Delta f$ between the value of $f$
at $(x^1, \dots, x^j + \epsilon, \dots, x^p)$ and its value at $(x^1, \dots, x^j, \dots, x^p)$, divide this
difference by $\epsilon$, and take the limit $\epsilon \to 0$. This is simply partial differentiation,
and the $j$th component of the gradient is just the $j$th partial derivative
of $f$.


33.1.1 Derivative for Hilbert Spaces 


To make contact with the subject of this chapter, let us reinterpret the notion
of differentiation. The most useful interpretation is geometric. In fact, our
first encounter with the derivative is geometrical: We are introduced to the
concept through lines tangent to curves. In this language, the derivative of
a function $f : \mathbb{R} \supset \Omega \to \mathbb{R}$ at $x_0$ is a line (or function) $\psi : \Omega \supset \Omega_0 \to \mathbb{R}$
passing through $(x_0, f(x_0))$ whose slope is defined to be the derivative of $f$
at $x_0$ (see Fig. 33.1):

\[ \psi(x) = f(x_0) + f'(x_0)(x - x_0). \]

Fig. 33.1 The derivative at $(x_0, f(x_0))$ as a linear function passing through the origin
with a slope $f'(x_0)$. The function $f$ is assumed to be defined on a subset $\Omega$ of the real line.
$\Omega_0$ restricts the $x$'s to be close to $x_0$ to prevent the function from misbehaving (blowing
up), and to make sure that the limit in the definition of derivative makes sense


The function $\psi(x)$ describes a line, but it is not a linear function (in the
vector-space sense of the word). The requirement of linearity is due to our
desire for generalization of differentiation to Hilbert spaces, on which linear
maps are the most natural objects. Therefore, we consider the line parallel
to $\psi(x)$ that passes through the origin. Call this $\phi(x)$. Then

\[ \phi(x) = f'(x_0)x, \tag{33.1} \]

which is indeed a linear function. We identify $\phi(x)$ as the derivative of $f$
at $x_0$. This identification may appear strange at first but, as we shall see
shortly, is the most convenient and useful. Of course, any identification requires
a one-to-one correspondence between objects identified. It is clear
that indeed there is a one-to-one correspondence between derivatives at
points and linear functions with appropriate slopes.

Equation (33.1) can be used to geometrize the definition of derivative.
First consider

\[ f'(x_0) = \lim_{x\to x_0}\frac{f(x) - f(x_0)}{x - x_0} \qquad\text{and}\qquad
 f'(x_0) = \lim_{x\to x_0}\frac{\phi(x) - \phi(x_0)}{x - x_0}. \]

Next note that, contrary to $f$, which is usually defined only for a subset of
the real line, $\phi$ is defined for all real numbers $\mathbb{R}$, and that $\phi(x - x_0) =
\phi(x) - \phi(x_0)$ due to the linearity of $\phi$. Thus, we have

\[ \lim_{x\to x_0}\frac{f(x) - f(x_0)}{x - x_0} = \lim_{x\to x_0}\frac{\phi(x) - \phi(x_0)}{x - x_0}
 = \lim_{x\to x_0}\frac{\phi(x - x_0)}{x - x_0}, \]

or

\[ \lim_{x\to x_0}\frac{|f(x) - f(x_0) - \phi(x - x_0)|}{|x - x_0|} = 0, \tag{33.2} \]

where we have introduced absolute values in anticipation of their analogue, the
norm. Equation (33.2) is readily generalized to any complete normed vector
space (Banach space), and in particular to any Hilbert space:


Definition 33.1.1 Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be Hilbert spaces with norms $\|\cdot\|_1$ and
$\|\cdot\|_2$, respectively. Let $f : \mathcal{H}_1 \supset \Omega \to \mathcal{H}_2$ be any map and $|x_0\rangle \in \Omega$. Suppose
there is a linear map $T \in \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2)$ with the property that

\[ \lim_{\|x - x_0\|_1 \to 0}\frac{\|f(|x\rangle) - f(|x_0\rangle) - T(|x\rangle - |x_0\rangle)\|_2}{\|x - x_0\|_1} = 0
 \qquad\text{for } |x\rangle \in \Omega. \]

Then, we say that $f$ is differentiable at $|x_0\rangle$, and we define the derivative
of $f$ at $|x_0\rangle$ to be $Df(x_0) \equiv T$. If $f$ is differentiable at each $|x\rangle \in \Omega$, the
map

\[ Df : \Omega \to \mathcal{L}(\mathcal{H}_1, \mathcal{H}_2) \qquad\text{given by}\qquad Df(|x\rangle) = Df(x) \]

is called the derivative of $f$.
The reader may verify that if the derivative exists, it is unique.


Example 33.1.2 Let $\mathcal{H}_1 = \mathbb{R}^n$ and $\mathcal{H}_2 = \mathbb{R}^m$ and $f : \mathbb{R}^n \supset \Omega \to \mathbb{R}^m$. Then
for $|x\rangle \in \Omega$, $Df(x)$ is a linear map, which can be represented by a matrix in
the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$. To find this matrix, we need to let $Df(x)$
act on the $j$th standard basis vector of $\mathbb{R}^n$, i.e., we need to evaluate $Df(x)|e_j\rangle$.
This suggests taking $|y\rangle = |x\rangle + h|e_j\rangle$ (with $h \to 0$) as the vector appearing
in the definition of derivative at $|x\rangle$. Then

\[ \frac{\|f(|y\rangle) - f(|x\rangle) - Df(x)(|y\rangle - |x\rangle)\|_2}{\|y - x\|_1}
 = \frac{\|f(x^1, \dots, x^j + h, \dots, x^n) - f(x^1, \dots, x^j, \dots, x^n) - hDf(x)|e_j\rangle\|_2}{|h|} \]

approaches zero as $h \to 0$, so that the $i$th component of the ratio also goes to
zero. But the $i$th component of $Df(x)|e_j\rangle$ is simply $a^i{}_j$, the $ij$th component
of the matrix of $Df(x)$. Therefore,

\[ \lim_{h\to 0}\frac{|f^i(x^1, \dots, x^j + h, \dots, x^n) - f^i(x^1, \dots, x^j, \dots, x^n) - ha^i{}_j|}{|h|} = 0, \]

which means that $a^i{}_j = \partial f^i/\partial x^j$.


The result of the example above can be stated as follows:


Box 33.1.3 For $f : \mathbb{R}^n \supset \Omega \to \mathbb{R}^m$, the matrix of $Df(x)$ in the standard
bases of $\mathbb{R}^n$ and $\mathbb{R}^m$ is the Jacobian matrix of $f$.
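The statement of Box 33.1.3 can be checked numerically. The following is a minimal sketch, assuming a sample map `f` from R^2 to R^2 chosen here only for illustration (it does not come from the text); the helper names `numerical_jacobian` and `exact_jacobian` are likewise hypothetical. Each column of the matrix is obtained by letting the difference quotient act along one standard basis vector, as in Example 33.1.2.

```python
import math


def f(x):
    """Sample map f: R^2 -> R^2 (an illustrative assumption, not from the text)."""
    x1, x2 = x
    return [x1 * x1 * x2, math.sin(x1) + x2]


def numerical_jacobian(func, x, h=1e-6):
    """Approximate a^i_j = d f^i / d x^j with central differences along e_j."""
    n = len(x)
    m = len(func(x))
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x)
        xm = list(x)
        xp[j] += h  # |x> + h|e_j>
        xm[j] -= h  # |x> - h|e_j>
        fp, fm = func(xp), func(xm)
        for i in range(m):
            jac[i][j] = (fp[i] - fm[i]) / (2 * h)
    return jac


def exact_jacobian(x):
    """Hand-computed partial derivatives of the sample map above."""
    x1, x2 = x
    return [[2 * x1 * x2, x1 * x1], [math.cos(x1), 1.0]]


if __name__ == "__main__":
    point = [0.7, -1.3]
    print(numerical_jacobian(f, point))
    print(exact_jacobian(point))
```

The two printed matrices should agree to within the finite-difference error.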


The case of $\mathcal{H}_2 = \mathbb{R}$ deserves special attention. Let $\mathcal{H}$ be a Hilbert space.
Then $Df(x) \in \mathcal{L}(\mathcal{H}, \mathbb{R}) = \mathcal{H}^*$ is denoted by $df(x)$ and renamed the differential
of $f$ at $|x\rangle$. Furthermore, through the inner product, one can identify
$df : \Omega \to \mathcal{H}^*$ with another map defined as follows:

Definition 33.1.4 Let $\mathcal{H}$ be a Hilbert space and $f : \mathcal{H} \supset \Omega \to \mathbb{R}$. The gradient
$\nabla f$ of $f$ is the map $\nabla f : \Omega \to \mathcal{H}$ defined by

\[ \langle\nabla f(x)|a\rangle = \langle df(x), a\rangle \qquad \forall\, |x\rangle \in \Omega,\ |a\rangle \in \mathcal{H}, \]

where $\langle\,,\rangle$ is the pairing $\langle\,,\rangle : \mathcal{H}^* \times \mathcal{H} \to \mathbb{R}$ of $\mathcal{H}$ and its dual.


Note that although $f$ is not an element of $\mathcal{H}^*$, $df(x)$ is, for all points
$|x\rangle \in \Omega$ at which the differential is defined.


Example 33.1.5 Consider the function $f : \mathcal{H} \to \mathbb{R}$ given by $f(|x\rangle) =
\|x\|^2$. Since

\[ \|y - x\|^2 = \|y\|^2 - \|x\|^2 - 2\langle x|y - x\rangle \]

and since the derivative is unique, the reader may check that $df(x)|a\rangle =
2\langle x|a\rangle$, or $\nabla f(|x\rangle) = 2|x\rangle$.
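The result of Example 33.1.5 can be sanity-checked in the finite-dimensional Hilbert space R^3. The sketch below is only an illustration under that assumption; the helper names `inner`, `norm_sq`, and `directional_derivative` are introduced here and are not part of the text. It compares a difference quotient of f along a direction a with the predicted value 2⟨x|a⟩.

```python
def inner(a, b):
    """Standard inner product <a|b> on R^n."""
    return sum(ai * bi for ai, bi in zip(a, b))


def norm_sq(x):
    """f(|x>) = ||x||^2."""
    return inner(x, x)


def directional_derivative(func, x, a, t=1e-5):
    """Central-difference estimate of d/dt func(x + t a) at t = 0."""
    xp = [xi + t * ai for xi, ai in zip(x, a)]
    xm = [xi - t * ai for xi, ai in zip(x, a)]
    return (func(xp) - func(xm)) / (2 * t)


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5]
    a = [0.3, 0.4, -1.0]
    # The two numbers should agree up to rounding error.
    print(directional_derivative(norm_sq, x, a), 2 * inner(x, a))
```

Because f is quadratic, the central difference has no truncation error, so any discrepancy is due to floating-point rounding alone.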


Derivatives could be defined in terms of directions as well: 


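Definition 33.1.6 Let $\mathcal{H}_1$ and $\mathcal{H}_2$ be Hilbert spaces. Let $f : \mathcal{H}_1 \supset \Omega \to
\mathcal{H}_2$ be any map and $|x\rangle \in \Omega$. We say that $f$ has a derivative in the direction
$|a\rangle \in \mathcal{H}_1$ at $|x\rangle$ if

\[ \frac{d}{dt}f(|x\rangle + t|a\rangle)\Big|_{t=0} \]

exists. We call this element of $\mathcal{H}_2$ the directional derivative of $f$ in the
direction $|a\rangle \in \mathcal{H}_1$ at $|x\rangle$.


The reader may verify that if $f$ is differentiable at $|x\rangle$ (in the context of
Definition 33.1.1), then the directional derivative of $f$ in any direction $|a\rangle$
exists at $|x\rangle$ and is given by

\[ \frac{d}{dt}f(|x\rangle + t|a\rangle)\Big|_{t=0} = Df(x)|a\rangle. \tag{33.3} \]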


33.1.2 Functional Derivative 


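We now specialize to the Hilbert space of square-integrable functions
$\mathcal{L}^2(\Omega)$ for some open subset $\Omega$ of some $\mathbb{R}^m$. We need to change our notation
somewhat. Let us agree to denote the elements of $\mathcal{L}^2(\Omega)$ by $f$, $u$, etc.
Real-valued functions on $\mathcal{L}^2(\Omega)$ will be denoted by boldface letters $\mathbf{L}$, $\mathbf{H}$, etc.
The $m$-tuples will be denoted by boldface lowercase letters. To summarize,

\[ \mathbf{x}, \mathbf{y} \in \mathbb{R}^m, \qquad f, u \in \mathcal{L}^2(\Omega) \;\Rightarrow\; f, u : \mathbb{R}^m \supset \Omega \to \mathbb{R}, \]
\[ \langle f|u\rangle = \int_\Omega f(\mathbf{x})u(\mathbf{x})\,d^m x, \qquad \mathbf{L}, \mathbf{H} : \mathcal{L}^2(\Omega) \to \mathbb{R}. \]

Furthermore, the evaluation of $\mathbf{L}$ at $u$ is denoted by $\mathbf{L}[u]$.

When dealing with the space of functions, the gradient of Definition
33.1.4 is called a functional derivative or variational derivative and denoted
by $\delta\mathbf{L}/\delta u$. So

\[ \left\langle\frac{\delta\mathbf{L}}{\delta u}\Big| f\right\rangle \equiv \int_\Omega\frac{\delta\mathbf{L}}{\delta u}(\mathbf{x})f(\mathbf{x})\,d^m x
 = \frac{d}{dt}\mathbf{L}[u + tf]\Big|_{t=0}, \tag{33.4} \]

where we have used Eq. (33.3). Note that by definition, $\delta\mathbf{L}/\delta u$ is an element
of the Hilbert space $\mathcal{L}^2(\Omega)$; so, the integral of (33.4) makes sense.
Equation (33.4) is frequently used to compute functional derivatives.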
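As a concrete use of Eq. (33.4), consider the sample functional L[u] = ∫ u(x)² dx on [0, 1], for which the functional derivative is expected to be 2u. The following is a minimal numerical sketch under stated assumptions: the functional, the midpoint-rule discretization, the test function f(x) = x(1 − x), and all function names are illustrative choices made here, not taken from the text.

```python
import math

N = 1000                                   # number of midpoint-rule cells on [0, 1]
XS = [(k + 0.5) / N for k in range(N)]     # cell midpoints
H = 1.0 / N                                # cell width


def integrate(values):
    """Midpoint-rule approximation of an integral over [0, 1]."""
    return H * sum(values)


def L(u):
    """Sample functional L[u] = integral of u(x)^2 over [0, 1]."""
    return integrate([u(x) ** 2 for x in XS])


def directional_derivative(u, f, t=1e-4):
    """Right-hand side of Eq. (33.4): d/dt L[u + t f] at t = 0 (central difference)."""
    def up(x):
        return u(x) + t * f(x)

    def um(x):
        return u(x) - t * f(x)

    return (L(up) - L(um)) / (2 * t)


def pairing_with_candidate(u, f):
    """Left-hand side of Eq. (33.4) with the candidate functional derivative 2u."""
    return integrate([2 * u(x) * f(x) for x in XS])


def u(x):
    return math.sin(math.pi * x)


def f(x):
    return x * (1 - x)   # test function vanishing at both endpoints


if __name__ == "__main__":
    print(pairing_with_candidate(u, f), directional_derivative(u, f))
```

Agreement of the two numbers for several different test functions f is what identifies 2u as the functional derivative of this particular functional.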

An immediate consequence of Eq. (33.4) is the following important re- 
sult. 


Proposition 33.1.7 Let $\mathbf{L} : \mathcal{L}^2(\Omega) \to \mathbb{R}$ for some $\Omega \subset \mathbb{R}^m$. If $\mathbf{L}$ has an extremum
at $u$, then

\[ \frac{\delta\mathbf{L}}{\delta u} = 0. \]

Proof If $\mathbf{L}$ has an extremum at $u$, then the RHS of (33.4) vanishes for any
function $f$, in particular, for any orthonormal basis vector $|e_i\rangle$. Completeness
of a basis now implies that the functional derivative must vanish (see
Proposition 7.1.9).


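Just as in the case of partial derivatives, where some simple relations such
as derivative of powers and products can be used to differentiate more complicated
expressions, there are some primitive formulas involving functional
derivatives that are useful in computing other more complicated expressions.
Consider the evaluation function

\[ E_{\mathbf{y}} : \mathcal{L}^2(\Omega) \to \mathbb{R} \qquad\text{given by}\qquad E_{\mathbf{y}}[f] = f(\mathbf{y}). \]

Using Eq. (33.4), we can easily compute the functional derivative of $E_{\mathbf{y}}$:

\[ \int_\Omega\frac{\delta E_{\mathbf{y}}[u]}{\delta u}(\mathbf{x})f(\mathbf{x})\,d^m x = \frac{d}{dt}E_{\mathbf{y}}[u + tf]\Big|_{t=0}
 = \frac{d}{dt}\{u(\mathbf{y}) + tf(\mathbf{y})\}\Big|_{t=0} = f(\mathbf{y})
 \;\Rightarrow\; \frac{\delta E_{\mathbf{y}}[u]}{\delta u}(\mathbf{x}) = \delta(\mathbf{x} - \mathbf{y}). \tag{33.5} \]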

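It is instructive to compare (33.5) with the similar formula in multivariable
calculus, where real-valued functions $f$ take a vector $\mathbf{x}$ and give a real
number. The analogue of the evaluation function is $E_i$, which takes a vector
$\mathbf{x}$ and gives the real number $x^i$, the $i$th component of $\mathbf{x}$. Using the definition
of partial derivative, one readily shows that $\partial E_i/\partial x^j = \delta_{ij}$, which is (somewhat
less precisely) written as $\partial x^i/\partial x^j = \delta_{ij}$. The same sort of imprecision
is used to rewrite Eq. (33.5) as

\[ \frac{\delta u(\mathbf{y})}{\delta u(\mathbf{x})} \equiv \frac{\delta u_{\mathbf{y}}}{\delta u_{\mathbf{x}}} = \delta(\mathbf{x} - \mathbf{y}), \tag{33.6} \]

where we have turned the arguments into indices to make the analogy with
the discrete case even stronger.

Another useful formula concerns derivatives of square-integrable functions.
Let $E_{\mathbf{y},i}$ denote the evaluation of the derivative of functions with respect
to the $i$th coordinate:

\[ E_{\mathbf{y},i} : \mathcal{L}^2(\Omega) \to \mathbb{R} \qquad\text{given by}\qquad E_{\mathbf{y},i}(f) = \partial_i f(\mathbf{y}). \]

Then a similar argument as above will show that

\[ \frac{\delta E_{\mathbf{y},i}}{\delta u}(\mathbf{x}) = -\partial_i\delta(\mathbf{x} - \mathbf{y}), \qquad
 \frac{\delta\,\partial_i u(\mathbf{y})}{\delta u(\mathbf{x})} = -\partial_i\delta(\mathbf{x} - \mathbf{y}), \]

and in general,

\[ \frac{\delta\,\partial_{i_1\dots i_k}u(\mathbf{y})}{\delta u(\mathbf{x})} = (-1)^k\partial_{i_1\dots i_k}\delta(\mathbf{x} - \mathbf{y}). \tag{33.7} \]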


Equation (33.7) holds only if the function $f$, the so-called test function,
vanishes on $\partial\Omega$, the boundary of the region of integration. If it does not,
then there will be a "surface term" that will complicate matters considerably.
Fortunately, in most applications this surface term is required to vanish. So,
let us adhere to the convention that


Box 33.1.8 All test functions $f(\mathbf{x})$ appearing in the integral of
Eq. (33.4) are assumed to vanish at the boundary of $\Omega$.


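For applications, we need to generalize the concept of functions on
Hilbert spaces. First, it is necessary to consider maps from a Hilbert space to
$\mathbb{R}^n$. For simplicity, we confine ourselves to the Hilbert space $\mathcal{L}^2(\Omega)$. Such
a map $\mathbf{H} : \mathcal{L}^2(\Omega) \to D \subset \mathbb{R}^n$, for some subset $D$ of $\mathbb{R}^n$, can be written in
components

\[ \mathbf{H} = (H_1, H_2, \dots, H_n), \qquad\text{where } H_i : \mathcal{L}^2(\Omega) \to \mathbb{R},\ i = 1, \dots, n. \]

Next, we consider an ordinary multivariable function $L : \mathbb{R}^n \supset D \to \mathbb{R}$, and
use it to construct a new function on $\mathcal{L}^2(\Omega)$, the composite of $L$ and $\mathbf{H}$:

\[ L \circ \mathbf{H} : \mathcal{L}^2(\Omega) \to \mathbb{R}, \qquad L \circ \mathbf{H}[u] = L(H_1[u], \dots, H_n[u]). \]

Then the functional derivative of $L \circ \mathbf{H}$ can be obtained using the chain
rule and noting that the derivative of $L$ is the common partial derivative. It
follows that

\[ \frac{\delta L \circ \mathbf{H}[u]}{\delta u}(\mathbf{x}) = \frac{\delta}{\delta u(\mathbf{x})}\bigl\{L(H_1[u], \dots, H_n[u])\bigr\}
 = \sum_{i=1}^{n}\partial_i L\,\frac{\delta H_i}{\delta u}(\mathbf{x}), \tag{33.8} \]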

where $\partial_i L$ is the partial derivative of $L$ with respect to its $i$th argument.

Example 33.1.9 Let $L : (a, b) \times \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ be a function of three variables,
the first one of which is defined for the real interval $(a, b)$. Let
$H_i : \mathcal{L}^2(a, b) \to \mathbb{R}$, $i = 1, 2, 3$, be defined by

\[ H_1[u] \equiv x, \qquad H_2[u] \equiv E_x[u] = u(x), \qquad H_3[u] \equiv E'_x[u] = u'(x), \]

where $E_x$ is the evaluation function and $E'_x$ evaluates the derivative. It follows
that $L \circ \mathbf{H}[u] = L(x, u(x), u'(x))$. Then, noting that $H_1[u]$ is independent
of $u$, we have

\[ \frac{\delta L \circ \mathbf{H}[u]}{\delta u}(y) = \partial_1 L\,\frac{\delta H_1[u]}{\delta u}(y) + \partial_2 L\,\frac{\delta E_x[u]}{\delta u}(y)
 + \partial_3 L\,\frac{\delta E'_x[u]}{\delta u}(y) \]
\[ = 0 + \partial_2 L\,\delta(y - x) - \partial_3 L\,\delta'(y - x) = \partial_2 L\,\delta(x - y) + \partial_3 L\,\delta'(x - y). \]

This is normally written as

\[ \frac{\delta L(x, u(x), u'(x))}{\delta u}(y) = \frac{\partial L}{\partial u}\delta(x - y) + \frac{\partial L}{\partial u'}\delta'(x - y), \tag{33.9} \]

which is the unintegrated version of the classical Euler-Lagrange equation
for a single particle, to which we shall return shortly.
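To see why Eq. (33.9) is called "unintegrated," the following sketch integrates it over $x$. It assumes that $y$ is an interior point of $(a, b)$ and that the functional is $\mathbf{L}[u] = \int_a^b L(x, u(x), u'(x))\,dx$; both are assumptions made here for illustration, anticipating the variational problems treated later in this section.

```latex
% Sketch: integrating Eq. (33.9) over x, with y in the open interval (a, b).
\begin{align*}
\frac{\delta \mathbf{L}[u]}{\delta u}(y)
 &= \int_a^b \left[\frac{\partial L}{\partial u}\,\delta(x - y)
    + \frac{\partial L}{\partial u'}\,\delta'(x - y)\right] dx \\
 &= \frac{\partial L}{\partial u}\bigg|_{x=y}
    - \frac{d}{dx}\!\left(\frac{\partial L}{\partial u'}\right)\bigg|_{x=y},
\end{align*}
% where \int g(x)\,\delta'(x - y)\,dx = -g'(y) was used (integration by parts).
% Setting this to zero gives the familiar Euler-Lagrange equation
% \partial L/\partial u - (d/dx)(\partial L/\partial u') = 0.
```

The minus sign in front of the second term is the same sign that appears in Eq. (33.7) for $k = 1$.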


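A generalization of the example above turns $L$ into a function on $\Omega \times
\mathbb{R} \times \mathbb{R}^m$ with $\Omega \subset \mathbb{R}^m$, so that

\[ L\bigl(x^1, \dots, x^m, u(\mathbf{x}), \partial_1 u(\mathbf{x}), \dots, \partial_m u(\mathbf{x})\bigr) \in \mathbb{R}, \qquad\text{with } \mathbf{x} \in \mathbb{R}^m. \]

The functions $\{H_i\}_{i=1}^{2m+1}$ are defined as

\[ H_i[u] \equiv x^i \quad\text{for } i = 1, 2, \dots, m, \]
\[ H_i[u] \equiv E_{\mathbf{x}}[u] = u(\mathbf{x}) \quad\text{for } i = m + 1, \]
\[ H_i[u] \equiv E_{\mathbf{x},i-m-1}[u] = \partial_{i-m-1}u(\mathbf{x}) \quad\text{for } i = m + 2, \dots, 2m + 1, \]

and lead to the equation

\[ \frac{\delta L \circ \mathbf{H}[u]}{\delta u}(\mathbf{y}) = \partial_{m+1}L\,\delta(\mathbf{x} - \mathbf{y})
 + \sum_{i=m+2}^{2m+1}\partial_i L\,\partial_{i-m-1}\delta(\mathbf{x} - \mathbf{y}), \tag{33.10} \]

which is the unintegrated version of the classical Euler-Lagrange equation
for a field in $m$ dimensions.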


33.1.3 Variational Problems 
The fundamental theme of the calculus of variations is to find functions
that extremize an integral and are fixed on the boundary of the integration
region. A prime example is the determination of the equation of the curve
of minimum length in the $xy$-plane passing through two points $(a_1, b_1)$ and
$(a_2, b_2)$. Such a curve, written as $y = u(x)$, would minimize the integral

\[ \mathbf{int}[u] \equiv \int_{a_1}^{a_2}\sqrt{1 + [u'(x)]^2}\,dx, \qquad u(a_1) = b_1,\ u(a_2) = b_2. \tag{33.11} \]

Note that $\mathbf{int}$ takes a function and gives a real number; i.e., if we restrict
our functions to square-integrable ones, $\mathbf{int}$ is a real-valued function on $\mathcal{L}^2(a_1, a_2)$. This is
how contact is established between the calculus of variations and what we
have studied so far in this chapter.

To be as general as possible, we allow the integral to contain derivatives
up to the $n$th order. Then, using the notation of the previous chapter, we
consider functions $L$ on $M^{(n)} \subset \Omega \times U^{(n)}$, where we have replaced $X$ with
$\Omega$, so that $M = \Omega \times U$ with $\Omega \subset \mathbb{R}^p$ and $U \subset \mathbb{R}^q$.


Definition 33.1.10 By an $n$th-order variational problem we mean finding
the extremum of the real-valued function $\mathbf{L} : \mathcal{L}^2(\Omega) \to \mathbb{R}$ given by

\[ \mathbf{L}[u] \equiv \int_\Omega L(\mathbf{x}, u^{(n)})\,d^p x, \tag{33.12} \]

where $\Omega$ is a subset of $\mathbb{R}^p$, $L$ is a real-valued function on $\Omega \times U^{(n)}$, and
$p^{(n)} = (p + n)!/(n!\,p!)$. In this context the function $L$ is called the Lagrangian
of the problem, and $\mathbf{L}$ is called a functional.¹


The solution to the variational problem is given by Proposition 33.1.7, 
moving the functional derivative inside the integral, and a straightforward 
(but tedious!) generalization of Eq. (33.10) to include derivatives of order 
higher than one. Due to the presence of the integral, the Dirac delta function 
and all its derivatives will be integrated out. Before stating the solution of 
the variational problem, let us introduce a convenient operator, using the 
total derivative operator introduced in Definition 32.3.3. 


Definition 33.1.11 For $1 \le \alpha \le q$, the $\alpha$th Euler operator is

\[ \mathbb{E}_\alpha = \sum_J (-D)_J\frac{\partial}{\partial u^\alpha_J}, \tag{33.13} \]

where for $J = (j_1, \dots, j_k)$,

\[ (-D)_J \equiv (-1)^k D_J = (-D_{j_1})(-D_{j_2})\cdots(-D_{j_k}), \]

and the sum extends over all multi-indices $J = (j_1, \dots, j_k)$, including
$J = 0$.


The negative signs are introduced because of the integration by parts involved
in the evaluation of the derivatives of the delta function. Although the
sum in Eq. (33.13) extends over all multi-indices, only a finite number of
terms in the sum will be nonzero, because any function on which the Euler
operator acts depends on a finite number of derivatives.

¹Do not confuse this functional with the linear functional of Chap. 2.


Theorem 33.1.12 If $u$ is an extremal of the variational problem (33.12),
then it must be a solution of the Euler-Lagrange equations

\[ \mathbb{E}_\alpha(L) \equiv \sum_J (-D)_J\frac{\partial L}{\partial u^\alpha_J} = 0, \qquad \alpha = 1, \dots, q. \]

For the special case of $p = q = 1$, the Euler operator becomes

\[ \mathbb{E} = \sum_{j=0}^{\infty}(-D_x)^j\frac{\partial}{\partial u_j}
 = \frac{\partial}{\partial u} - D_x\frac{\partial}{\partial u_x} + D_x^2\frac{\partial}{\partial u_{xx}} - \cdots, \]

where $D_x$ is the total derivative with respect to $x$, and $u_j$ is the $j$th derivative
of $u$ with respect to $x$; and the Euler-Lagrange equation for the variational
problem

\[ \mathbf{L}[u] = \int_a^b L(x, u^{(n)})\,dx \]

becomes

\[ \mathbb{E}(L) = \sum_{j=0}^{n}(-1)^j D_x^j\frac{\partial L}{\partial u_j} = 0. \tag{33.14} \]

Since $L$ carries derivatives up to the $n$th order and each $D_x$ carries one
derivative, we conclude that Eq. (33.14) is, in general, a $2n$th-order ODE.
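A short worked case may help illustrate the order count in Eq. (33.14). The Lagrangian below is a hypothetical choice made here for illustration (it is not taken from the text): $L = \tfrac12 u_{xx}^2$, so that $n = 2$.

```latex
% Hypothetical example with n = 2: L = (1/2) u_{xx}^2.
\begin{align*}
\frac{\partial L}{\partial u} = 0, \qquad
\frac{\partial L}{\partial u_x} = 0, \qquad
\frac{\partial L}{\partial u_{xx}} = u_{xx}
\quad\Longrightarrow\quad
\mathbb{E}(L) = D_x^{2}\,(u_{xx}) = u_{xxxx} = 0 .
\end{align*}
% The resulting ODE is of order 4 = 2n; its solutions are the cubic
% polynomials u(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3.
```

Only the $j = 2$ term of the sum survives here, and it carries the sign $(-1)^2 = +1$.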


Example 33.1.13 The variational problem of Eq. (33.11) has a Lagrangian

\[ L(x, u^{(n)}) = L(x, u^{(1)}) = \sqrt{1 + u_x^2}, \]

which is a function of the first derivative only. So, the Euler-Lagrange equation
takes the form

\[ 0 = -D_x\frac{\partial L}{\partial u_x} = -D_x\!\left(\frac{u_x}{\sqrt{1 + u_x^2}}\right)
 = -\frac{u_{xx}}{(1 + u_x^2)^{3/2}}, \]

or $u_{xx} = 0$, so that $u = f(x) = c_1 x + c_2$. The solution to the variational
problem is a straight line passing through the two points $(a_1, b_1)$ and
$(a_2, b_2)$.
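The conclusion of Example 33.1.13 can be probed numerically through Eq. (33.4): at the straight line the first variation of the arc-length functional should vanish for every test function that is zero at the endpoints, while at another curve with the same endpoints it generally should not. The sketch below is illustrative only; the endpoints (0, 1) and (1, 3), the comparison parabola, the test function sin(πx), the polygonal discretization, and all function names are assumptions made here rather than anything prescribed by the text.

```python
import math

N = 2000                                # number of subintervals on [0, 1]
XS = [k / N for k in range(N + 1)]      # grid points, endpoints included


def arc_length(values):
    """Length of the polygon through the points (x_k, values[k])."""
    h = 1.0 / N
    return sum(math.hypot(h, values[k + 1] - values[k]) for k in range(N))


def first_variation(u, f, t=1e-4):
    """Central-difference estimate of d/dt int[u + t f] at t = 0, cf. Eq. (33.4)."""
    plus = [u(x) + t * f(x) for x in XS]
    minus = [u(x) - t * f(x) for x in XS]
    return (arc_length(plus) - arc_length(minus)) / (2 * t)


def line(x):
    return 1.0 + 2.0 * x          # straight line from (0, 1) to (1, 3)


def parabola(x):
    return 1.0 + 2.0 * x * x      # another curve with the same endpoints


def bump(x):
    return math.sin(math.pi * x)  # test function vanishing at x = 0 and x = 1


if __name__ == "__main__":
    print("first variation at the line:    ", first_variation(line, bump))
    print("first variation at the parabola:", first_variation(parabola, bump))
    print("lengths:", arc_length([line(x) for x in XS]),
          arc_length([parabola(x) for x in XS]))
```

A vanishing first variation for one test function is of course only a necessary check; the Euler-Lagrange equation encodes the requirement for all admissible test functions at once.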


Historical Notes 

Leonhard Euler (1707-1783) was Switzerland’s foremost scientist and one of the three 
greatest mathematicians of modern times (Gauss and Riemann being the other two). He 
was perhaps the most prolific author of all time in any field. From 1727 to 1783 his 
writings poured out in a seemingly endless flood, constantly adding knowledge to every 
known branch of pure and applied mathematics, and also to many that were not known 
until he created them. He averaged about 800 printed pages a year throughout his long 
life, and yet he almost always had something worthwhile to say. The publication of his 
complete works was started in 1911, and the end is not in sight. This edition was planned
to include 887 titles in 72 volumes, but since that time extensive new deposits of pre-
viously unknown manuscripts have been unearthed, and it is now estimated that more 
than 100 large volumes will be required for completion of the project. Euler evidently 
wrote mathematics with the ease and fluency of a skilled speaker discoursing on subjects 
with which he is intimately familiar. His writings are models of relaxed clarity. He never 
condensed, and he reveled in the rich abundance of his ideas and the vast scope of his 
interests. The French physicist Arago, in speaking of Euler’s incomparable mathematical 
facility, remarked that “He calculated without apparent effort, as men breathe, or as eagles 
sustain themselves in the wind.” He suffered total blindness during the last 17 years of his 
life, but with the aid of his powerful memory and fertile imagination, and with assistants 
to write his books and scientific papers from dictation, he actually increased his already 
prodigious output of work. 

Euler was a native of Basel and a student of Johann Bernoulli at the University, but he 
soon outstripped his teacher. He was also a man of broad culture, well versed in the clas- 
sical languages and literatures (he knew the Aeneid by heart), many modern languages, 
physiology, medicine, botany, geography, and the entire body of physical science as it 
was known in his time. His personal life was as placid and uneventful as is possible for a 
man with 13 children. 

Though he was not himself a teacher, Euler has had a deeper influence on the teach- 
ing of mathematics than any other person. This came about chiefly through his three 
great treatises: Introductio in Analysin Infinitorum (1748); Institutiones Calculi Differ- 
entialis (1755); and Institutiones Calculi Integralis (1768-1794). There is considerable 
truth in the old saying that all elementary and advanced calculus textbooks since 1748 
are essentially copies of Euler or copies of copies of Euler. These works summed up 
and codified the discoveries of his predecessors, and are full of Euler’s own ideas. He 
extended and perfected plane and solid analytic geometry, introduced the analytic ap- 
proach to trigonometry, and was responsible for the modern treatment of the functions 
$\ln x$ and $e^x$. He created a consistent theory of logarithms of negative and imaginary numbers,
and discovered that $\ln x$ has an infinite number of values. It was through his work
that the symbols $e$, $\pi$, and $i = \sqrt{-1}$ became common currency for all mathematicians,
and it was he who linked them together in the astonishing relation $e^{i\pi} = -1$. Among his
other contributions to standard mathematical notation were $\sin x$, $\cos x$, the use of $f(x)$
for an unspecified function, and the use of $\sum$ for summation.

His work in all departments of analysis strongly influenced the further development of this 
subject through the next two centuries. He contributed many important ideas to differen- 
tial equations, including substantial parts of the theory of second-order linear equations 
and the method of solution by power series. He gave the first systematic discussion of the 
calculus of variations, which he founded on his basic differential equation for a minimiz- 
ing curve. He discovered the integral defining the gamma function and developed many 
of its applications and special properties. He also worked with Fourier series, encountered 
the Bessel functions in his study of the vibrations of a stretched circular membrane, and 
applied Laplace transforms to solve differential equations—all before Fourier, Bessel, 
and Laplace were born. 

E.T. Bell, the well-known historian of mathematics, observed that “One of the most re- 
markable features of Euler’s universal genius was its equal strength in both of the main 
currents of mathematics, the continuous and the discrete.” In the realm of the discrete, he 
was one of the originators of number theory and made many far-reaching contributions to 
this subject throughout his life. In addition, the origins of topology—one of the dominant 
forces in modern mathematics—lie in his solution of the Königsberg bridge problem and
his formula V — E + F = 2 connecting the numbers of vertices, edges, and faces of a 
simple polyhedron. 

The distinction between pure and applied mathematics did not exist in Euler’s day, and 
for him the entire physical universe was a convenient object whose diverse phenomena 
offered scope for his methods of analysis. The foundations of classical mechanics had 
been laid down by Newton, but Euler was the principal architect. In his treatise of 1736 
he was the first to explicitly introduce the concept of a mass-point, or particle, and he was 
also the first to study the acceleration of a particle moving along any curve and to use the 
notion of a vector in connection with velocity and acceleration. His continued successes 
in mathematical physics were so numerous, and his influence was so pervasive, that most 
of his discoveries are not credited to him at all and are taken for granted in the physics
community as part of the natural order of things. However, we do have Euler’s angles for
the rotation of a rigid body, and the all-important Euler-Lagrange equation of variational 
dynamics. 

Euler was the Shakespeare of mathematics—universal, richly detailed, and inexhaustible. 


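The variational problem is a problem involving only the first functional
derivative, or the first variation. We know from calculus that the first derivative
by itself cannot determine the nature of the extremum. To test whether
the point in question is a maximum or a minimum, we need all the second
derivatives (see Example 6.6.9). One uses these derivatives to expand the
functional in a Taylor series up to the second order. The sign of the second-order
contribution determines whether the functional is maximum or minimum
at the extremal point. In analogy with Example 6.6.9, we expand $\mathbf{L}[u]$
about $f$ up to the second-order derivative:

\[ \mathbf{L}[u] = \mathbf{L}[f] + \int_\Omega d^p y\,\frac{\delta\mathbf{L}}{\delta u(\mathbf{y})}\bigg|_{u=f}\bigl(u(\mathbf{y}) - f(\mathbf{y})\bigr)
 + \frac12\int_\Omega d^p y\int_\Omega d^p y'\,\frac{\delta^2\mathbf{L}}{\delta u(\mathbf{y})\,\delta u(\mathbf{y}')}\bigg|_{u=f}
 \bigl(u(\mathbf{y}) - f(\mathbf{y})\bigr)\bigl(u(\mathbf{y}') - f(\mathbf{y}')\bigr). \]

The integrals have replaced the sums of the discrete case of Taylor expansion
of the multivariable functions. Since we are interested in comparing $u$ with
the $f$ that extremizes the functional, the second term vanishes and we get

\[ \mathbf{L}[u] = \mathbf{L}[f] + \frac12\int_\Omega d^p y\int_\Omega d^p y'\,\frac{\delta^2\mathbf{L}}{\delta u(\mathbf{y})\,\delta u(\mathbf{y}')}\bigg|_{u=f}
 \bigl(u(\mathbf{y}) - f(\mathbf{y})\bigr)\bigl(u(\mathbf{y}') - f(\mathbf{y}')\bigr). \tag{33.15} \]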
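The sign test based on Eq. (33.15) can be illustrated numerically for the arc-length functional of Eq. (33.11). Writing u = f + t g with g vanishing at the endpoints, the second-order term equals t²/2 times the second derivative of the functional along g, so a positive second derivative suggests a minimum in that direction. The sketch below rests on assumptions made here for illustration: the straight line from (0, 1) to (1, 3), the perturbation g(x) = sin(πx), a polygonal discretization, and hypothetical function names.

```python
import math

N = 2000                                # number of subintervals on [0, 1]
XS = [k / N for k in range(N + 1)]      # grid points, endpoints included


def arc_length(values):
    """Length of the polygon through the points (x_k, values[k])."""
    h = 1.0 / N
    return sum(math.hypot(h, values[k + 1] - values[k]) for k in range(N))


def second_variation(f, g, t=1e-3):
    """Second central difference of t -> int[f + t g] at t = 0."""
    base = [f(x) for x in XS]
    plus = [f(x) + t * g(x) for x in XS]
    minus = [f(x) - t * g(x) for x in XS]
    return (arc_length(plus) - 2.0 * arc_length(base) + arc_length(minus)) / t ** 2


def line(x):
    return 1.0 + 2.0 * x          # extremal of the arc-length problem


def bump(x):
    return math.sin(math.pi * x)  # perturbation vanishing at both endpoints


if __name__ == "__main__":
    # A positive value indicates that the line is a minimum along this direction.
    print(second_variation(line, bump))
```

For this Lagrangian the second derivative along g can also be written as the integral of g'(x)²/(1 + f'(x)²)^{3/2}, which is manifestly nonnegative; the numerical value should be close to that integral.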


Historical Notes 

Joseph Louis Lagrange (1736-1813) was born Giuseppe Luigi Lagrangia but adopted 
the French version of his name. He was the eldest of eleven children, most of whom did 
not reach adulthood. His father destined him for the law—a profession that one of his 
brothers later pursued—and Lagrange offered no objections. But having begun the study 
of physics and geometry, he quickly became aware of his talents and henceforth devoted 
himself to the exact sciences. Attracted first by geometry, at the age of seventeen he turned 
to analysis, then a rapidly developing field. 

In 1755, in a letter to the geometer Giulio da Fagnano, Lagrange speaks of one of Euler’s 
papers published at Lausanne and Geneva in 1744. The same letter shows that as early 
as the end of 1754 Lagrange had found interesting results in this area, which was to 
become the calculus of variations (a term coined by Euler in 1766). In the same year, 
Lagrange sent Euler a summary, written in Latin, of the purely analytical method that 
he used for this type of problem. Euler replied to Lagrange that he was very interested 
in the technique. Lagrange’s merit was likewise recognized in Turin; and he was named, 
by a royal decree, professor at the Royal Artillery School with an annual salary of 250 
crowns—a sum never increased in all the years he remained in his native country. Many 
years later, in a letter to d’Alembert, Lagrange confirmed that this method of maxima and
minima was the first fruit of his studies—he was only nineteen when he devised it—and 
that he regarded it as his best work in mathematics. In 1756, in a letter to Euler that has 
been lost, Lagrange, applying the calculus of variations to mechanics, generalized Euler’s 
earlier work on the trajectory described by a material point subject to the influence of 
central forces to an arbitrary system of bodies, and derived from it a procedure for solving 
all the problems of dynamics. 


In 1757 some young Turin scientists, among them Lagrange, founded a scientific soci-
ety that was the origin of the Royal Academy of Sciences of Turin. One of the main 
goals of this society was the publication of a miscellany in French and Latin, Miscellanea 
Taurinensia ou Mélanges de Turin, to which Lagrange contributed fundamentally. These 
contributions included works on the calculus of variations, probability, vibrating strings, 
and the principle of least action. 

To enter a competition for a prize, in 1763 Lagrange sent to the Paris Academy of Sci- 
ences a memoir in which he provided a satisfactory explanation of the translational motion 
of the moon. In the meantime, the Marquis Caraccioli, ambassador from the kingdom of 
Naples to the court of Turin, was transferred by his government to London. He took along 
the young Lagrange, who until then seems never to have left the immediate vicinity of 
Turin. Lagrange was warmly received in Paris, where he had been preceded by his mem- 
oir on lunar libration. He may perhaps have been treated too well in the Paris scientific 
community, where austerity was not a leading virtue. Being of a delicate constitution, 
Lagrange fell ill and had to interrupt his trip. In the spring of 1765 Lagrange returned to 
Turin by way of Geneva. 

In the autumn of 1765 d’Alembert, who was on excellent terms with Frederick II of Prussia,
and familiar with Lagrange’s work through Mélanges de Turin, suggested to Lagrange
that he accept the vacant position in Berlin created by Euler’s departure for St. Peters- 
burg. It seems quite likely that Lagrange would gladly have remained in Turin had the 
court of Turin been willing to improve his material and scientific situation. On 26 April, 
d’Alembert transmitted to Lagrange the very precise and advantageous propositions of 
the king of Prussia. Lagrange accepted the proposals of the Prussian king and, not without
difficulties, obtained his leave through the intercession of Frederick II with the king
of Sardinia. Eleven months after his arrival in Berlin, Lagrange married his cousin Vittoria
Conti, who died in 1783 after a long illness. With the death of Frederick II in August
1786 he also lost his strongest support in Berlin. Advised of the situation, the princes of 
Italy zealously competed in attracting him to their courts. In the meantime the French 
government decided to bring Lagrange to Paris through an advantageous offer. Of all the 
candidates, Paris was victorious. 

Lagrange left Berlin on 18 May 1787 to become pensionnaire vétéran of the Paris 
Academy of Sciences, of which he had been a foreign associate member since 1772. 
Warmly welcomed in Paris, he experienced a certain lassitude and did not immediately 
resume his research. Yet he astonished those around him by his extensive knowledge of 
metaphysics, history, religion, linguistics, medicine, and botany. 

In 1792 Lagrange married the daughter of his colleague at the Academy, the astronomer 
Pierre Charles Le Monnier. This was a troubled period, about a year after the flight of 
the king and his arrest at Varennes. Nevertheless, on 3 June the royal family signed the 
marriage contract “as a sign of its agreement to the union.” Lagrange had no children
from this second marriage, which, like the first, was a happy one. 

When the academy was suppressed in 1793, many noted scientists, including Lavoisier, 
Laplace, and Coulomb were purged from its membership; but Lagrange remained as its 
chairman. For the next ten years, Lagrange survived the turmoil of the aftermath of the 
French Revolution, but by March of 1813, he became seriously ill. He died on the morning 
of 11 April 1813, and three days later his body was carried to the Panthéon. The funeral 
oration was given by Laplace in the name of the Senate. 


Example 33.1.14 Let us apply Eq. (33.15) to the extremal function of Example 33.1.13 to see if the line is truly the shortest distance between two points. The first functional derivative, obtained using Eq. (33.9), is simply $E(L)$:

\[
\frac{\delta L}{\delta u(y)} = E(L) = -\frac{u_{yy}}{(1+u_y^2)^{3/2}}.
\]
To find the second variational derivative, we use the basic relations (33.6), (33.7), and the chain rule (33.10):

\[
\frac{\delta^2 L}{\delta u(y')\,\delta u(y)}\bigg|_{u=f}
= -\frac{\delta}{\delta u(y')}\left[\frac{u_{yy}}{(1+u_y^2)^{3/2}}\right]_{u=f}
= -\left[(1+u_y^2)^{-3/2}\frac{\delta u_{yy}}{\delta u(y')} - 3u_y u_{yy}(1+u_y^2)^{-5/2}\frac{\delta u_y}{\delta u(y')}\right]_{u=f}
= -\frac{\delta''(y-y')}{(1+c_1^2)^{3/2}},
\]


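because $u_{yy} = 0$ and $u_y = c_1$ when $u = f$. Inserting this in Eq. (33.15), we obtain

\[
L[u] = L[f] - \frac{1}{2(1+c_1^2)^{3/2}} \int_{a_1}^{a_2} dy \int_{a_1}^{a_2} dy'\, \delta''(y-y')\,\bigl(u(y)-f(y)\bigr)\bigl(u(y')-f(y')\bigr)
= L[f] - \frac{1}{2(1+c_1^2)^{3/2}} \int_{a_1}^{a_2} dy\, \bigl(u(y)-f(y)\bigr)\frac{d^2}{dy^2}\bigl(u(y)-f(y)\bigr).
\]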


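The last integral can be integrated by parts, with the result

\[
\int_{a_1}^{a_2} dy\,\bigl(u(y)-f(y)\bigr)\frac{d^2}{dy^2}\bigl(u(y)-f(y)\bigr)
= \underbrace{\bigl(u(y)-f(y)\bigr)\frac{d}{dy}\bigl(u(y)-f(y)\bigr)\bigg|_{a_1}^{a_2}}_{=0 \text{ because } u(a_i)=f(a_i),\ i=1,2}
- \int_{a_1}^{a_2} dy \left[\frac{d}{dy}\bigl(u(y)-f(y)\bigr)\right]^2 .
\]

Therefore,

\[
L[u] = L[f] + \underbrace{\frac{1}{2(1+c_1^2)^{3/2}} \int_{a_1}^{a_2} dy \left[\frac{d}{dy}\bigl(u(y)-f(y)\bigr)\right]^2}_{\text{always positive}} .
\]

It follows that $L[f] < L[u]$, i.e., that $f$ indeed gives the shortest distance.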
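The conclusion of Example 33.1.14 can be checked numerically. The following Python sketch is an illustration only and is not part of the original text: the endpoints `a1`, `a2`, the constants `c1`, `c2`, and the sinusoidal perturbation are assumptions chosen so that the perturbed curve agrees with the line at both endpoints. It compares the length of the line with that of a nearby curve, and with the second-order term $\frac{1}{2(1+c_1^2)^{3/2}}\int [d(u-f)/dy]^2\,dy$ obtained above.

```python
import math

def arc_length(u, a1, a2, n=20000):
    """Approximate L[u] = integral of sqrt(1 + u_y^2) dy by summing chord lengths."""
    h = (a2 - a1) / n
    total = 0.0
    for k in range(n):
        y0 = a1 + k * h
        y1 = y0 + h
        total += math.hypot(h, u(y1) - u(y0))
    return total

# Illustrative (assumed) data: the line f(y) = c1*y + c2 on [a1, a2].
a1, a2, c1, c2 = 0.0, 1.0, 2.0, -1.0

def f(y):
    return c1 * y + c2

def perturbed(eps):
    """A curve with the same endpoints as f: u = f + eps*sin(pi*(y-a1)/(a2-a1))."""
    return lambda y: f(y) + eps * math.sin(math.pi * (y - a1) / (a2 - a1))

eps = 0.01
L_f = arc_length(f, a1, a2)
L_u = arc_length(perturbed(eps), a1, a2)

# Second-order prediction: the integral of (d(u-f)/dy)^2 over [0, 1]
# equals (eps*pi)^2 / 2 for this particular perturbation.
predicted = (eps * math.pi) ** 2 / 2 / (2 * (1 + c1 ** 2) ** 1.5)

print(L_f, L_u, L_u - L_f, predicted)
```

For a small amplitude `eps`, the excess length `L_u - L_f` should be positive and close to `predicted`; the agreement degrades as `eps` grows, because the expansion is truncated at second order.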


Example 33.1.15 In the special theory of relativity, the element of the invariant “length”, or proper time, is given by $\sqrt{dt^2 - dx^2}$. Thus, the total proper time between two events $(t_1, a_1)$ and $(t_2, a_2)$ is given by

\[
L[x] = \int_{t_1}^{t_2} \sqrt{1 - x_t^2}\,dt, \qquad x_t \equiv \frac{dx}{dt}.
\]

The extremum of this variational problem is exactly the same as in the previous example, the only difference being a sign. In fact, the reader may verify that

\[
\frac{\delta L[x]}{\delta x(s)} = E(L) = \frac{x_{ss}}{(1 - x_s^2)^{3/2}},
\]

and therefore, $x = f(t) = c_1 t + c_2$ extremizes the functional. The second variational derivative can be obtained as before. It is left for the reader to show that in the case at hand, $L[f] > L[x]$, i.e., that $f$ gives the longest proper time. Since the function $f(t) = c_1 t + c_2$ corresponds to an inertial (unaccelerated) observer, we conclude that


Box 33.1.16 Accelerated observers measure a shorter proper time 
between any two events than inertial observers. 


This is the content of the famous twin paradox, in which the twin who 
goes to a distant galaxy and comes back (therefore being accelerated) will 
return younger than her (unaccelerated) twin. 
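A short numerical illustration of Box 33.1.16 follows. It is a sketch under stated assumptions, not a computation from the text: units are chosen so that the speed of light is 1, the stay-at-home twin sits at $x = 0$ for a time `T`, and the traveling twin has the hypothetical velocity profile `amp * sin(2*pi*t/T)`, which brings her back to $x = 0$ at $t = T$.

```python
import math

def proper_time(x_dot, t1, t2, n=10000):
    """Midpoint-rule approximation of the integral of sqrt(1 - x_t^2) dt."""
    h = (t2 - t1) / n
    total = 0.0
    for k in range(n):
        t = t1 + (k + 0.5) * h
        total += math.sqrt(1.0 - x_dot(t) ** 2)
    return total * h

T = 10.0      # coordinate time between departure and return (assumed)
amp = 0.8     # peak speed of the traveler in units of c (assumed, must be < 1)

# Inertial twin: x(t) = 0, so x_t = 0.
tau_inertial = proper_time(lambda t: 0.0, 0.0, T)

# Accelerated twin: the velocity integrates to zero over [0, T],
# so both world lines join the same two events.
tau_traveler = proper_time(lambda t: amp * math.sin(2 * math.pi * t / T), 0.0, T)

print(tau_inertial, tau_traveler)
```

The inertial proper time equals `T`, and any profile with nonzero velocity gives a strictly smaller value, in line with the statement that $f$ gives the longest proper time.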


33.1.4 Divergence and Null Lagrangians 


The variational problem integrates a Lagrangian over a region $\Omega$ of $\mathbb{R}^p$. If the Lagrangian happens to be the divergence of a function that vanishes at the boundary of $\Omega$, the variational problem becomes trivial, because all functions will extremize the functional. We now study such Lagrangians in more detail.


Definition 33.1.17 Let $\{F_j : M^{(n)} \to \mathbb{R}\}_{j=1}^p$ be functions on $M^{(n)}$, and $F = (F_1, \dots, F_p)$. The total divergence of $F$ is defined to be²

\[
D\cdot F \equiv \sum_{j=1}^p D_j F_j,
\]

where $D_j$ is the total derivative with respect to $x^j$.


Now suppose that the Lagrangian $L(x, u^{(n)})$ can be written as the divergence of some $p$-tuple $F$. Then by the divergence theorem,

\[
L[u] = \int_\Omega L(x, u^{(n)})\,d^p x = \int_\Omega D\cdot F\,d^p x = \int_{\partial\Omega} F\cdot da
\]

for any $u = f(x)$ and any domain $\Omega$. It follows that $L[f]$ depends on the behavior of $f$ only at the boundary. Since in a typical problem no variation takes place at the boundary, all functions that satisfy the boundary conditions will be solutions of the variational problem, i.e., they satisfy the Euler-Lagrange equation. Lagrangians that satisfy the Euler-Lagrange equation for all $u$ and $x$ are called null Lagrangians. It turns out that total divergences are the only null Lagrangians (for a proof, see [Olve 86, pp. 252-253]).


²The reader need not be concerned about lack of consistency in the location of indices (upper vs. lower), because we are dealing with indexed objects, such as $F_j$, which are not tensors!
Theorem 33.1.18 A function $L(x, u^{(n)})$ satisfies $E(L) \equiv 0$ for all $x$ and $u$ if and only if $L = D\cdot F$ for some $p$-tuple of functions $F = (F_1, \dots, F_p)$ of $x$, $u$, and the derivatives of $u$.
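As a simple illustration of a null Lagrangian (an example chosen here, not taken from the text), let $p = 1 = q$ and $L = u u_x = D_x(\tfrac12 u^2)$. Then $E(L) = \partial L/\partial u - D_x(\partial L/\partial u_x) = u_x - D_x u = 0$ identically, and $L[u]$ should depend only on the boundary values of $u$. The Python sketch below checks this for two hypothetical functions `u1` and `u2` that share the same endpoint values.

```python
import math

def functional(u, u_x, a, b, n=20000):
    """Midpoint-rule approximation of L[u] = integral of u * u_x dx over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        x = a + (k + 0.5) * h
        total += u(x) * u_x(x)
    return total * h

a, b = 0.0, 1.0

# Two illustrative functions with u(0) = 0 and u(1) = 1.
def u1(x):
    return x

def u1_x(x):
    return 1.0

def u2(x):
    return x + 3.0 * math.sin(math.pi * x)

def u2_x(x):
    return 1.0 + 3.0 * math.pi * math.cos(math.pi * x)

# Since L = D_x(u^2 / 2), the functional reduces to a boundary term.
boundary_value = 0.5 * (u1(b) ** 2 - u1(a) ** 2)
L1 = functional(u1, u1_x, a, b)
L2 = functional(u2, u2_x, a, b)

print(boundary_value, L1, L2)
```

Both integrals agree with the boundary term $\tfrac12[u(b)^2 - u(a)^2]$ up to discretization error, so every function with these boundary values "extremizes" the functional.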


In preparation for the investigation of symmetries of the variational prob- 
lems, we look into the effect of a change of variables on the variational 
problem and the Euler operator. This is important, because the variational 
problem should be independent of the variables chosen. Let 


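\[
\tilde{x} = \Psi(x, u), \qquad \tilde{u} = \Phi(x, u) \tag{33.16}
\]

be any change of variables. Then by prolongation, we also have $\tilde{u}^{(n)} = \Phi^{(n)}(x, u^{(n)})$ for the derivatives. Substituting $u = f(x)$ and all its prolongations in terms of the new variables, the functional

\[
L[f] = \int_\Omega L(x, \mathrm{pr}^{(n)} f(x))\,d^p x
\]

will be transformed into

\[
\tilde{L}[\tilde{f}] = \int_{\tilde\Omega} \tilde{L}(\tilde{x}, \mathrm{pr}^{(n)} \tilde{f}(\tilde{x}))\,d^p \tilde{x},
\]

where the transformed domain, defined by

\[
\tilde\Omega = \{\tilde{x} = \Psi(x, f(x)) \mid x \in \Omega\},
\]

will depend not only on the original domain $\Omega$, but also on the function $f$. The new Lagrangian is then related to the old one by the change of variables formula for multiple integrals:

\[
L(x, \mathrm{pr}^{(n)} f(x)) = \tilde{L}(\tilde{x}, \mathrm{pr}^{(n)} \tilde{f}(\tilde{x}))\,\det J(x, \mathrm{pr}^{(1)} f(x)), \tag{33.17}
\]

where $J$ is the Jacobian matrix of the change of variables induced by the function $f$.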

Starting with Eqs. (33.16) and (33.17), one can obtain the transformation 
formula for the Euler operator stated below. The details can be found in 
[Olve 86, pp. 254-255]. 


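Theorem 33.1.19 Let $L(x, u^{(n)})$ and $\tilde{L}(\tilde{x}, \tilde{u}^{(n)})$ be two Lagrangians related by the change of variable formulas (33.16) and (33.17). Then

\[
E_{u^\alpha}(L) = \sum_{\beta=1}^q F_{\alpha\beta}(x, u^{(1)})\,E_{\tilde{u}^\beta}(\tilde{L}), \qquad \alpha = 1, \dots, q,
\]

where $E_{\tilde{u}^\beta}$ is the Euler operator associated with the new variables, and

\[
F_{\alpha\beta} = \det
\begin{pmatrix}
D_1\Psi^1 & \cdots & D_p\Psi^1 & \partial\Psi^1/\partial u^\alpha \\
\vdots & & \vdots & \vdots \\
D_1\Psi^p & \cdots & D_p\Psi^p & \partial\Psi^p/\partial u^\alpha \\
D_1\Phi^\beta & \cdots & D_p\Phi^\beta & \partial\Phi^\beta/\partial u^\alpha
\end{pmatrix}.
\]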
33.2 Symmetry Groups of Variational Problems


In the theory of fields, as well as in mechanics, condensed matter theory, and 
statistical mechanics, the starting point is usually a Lagrangian. The varia- 
tional problem of this Lagrangian gives the classical equations of motion, 
and its symmetries lead to the important conservation laws. 


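Definition 33.2.1 A local group of transformations $G$ acting on $M \subset \Omega_0 \times U$ is a variational symmetry group of the functional

\[
L[u] = \int_{\Omega_0} L(x, u^{(n)})\,d^p x \tag{33.18}
\]

if whenever (the closure of) $\Omega$ lies in $\Omega_0$, $f$ is a function over $\Omega$ whose graph is in $M$, and $g \in G$ is such that $\tilde{f} = g\cdot f$ is a single-valued function defined over $\tilde\Omega$, then

\[
\int_{\tilde\Omega} L(\tilde{x}, \mathrm{pr}^{(n)} \tilde{f}(\tilde{x}))\,d^p \tilde{x} = \int_\Omega L(x, \mathrm{pr}^{(n)} f(x))\,d^p x. \tag{33.19}
\]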
QQ Q 


In the physics community, the symmetry group of the variational problem is (somewhat erroneously) called the symmetry of the Lagrangian. Note that if we had used $\tilde{L}$ in the LHS of Eq. (33.19), we would have obtained an identity valid for all Lagrangians because of Eq. (33.17) and the formula for the change in the volume element of integration. Only symmetric Lagrangians will satisfy Eq. (33.19).

As we have experienced so far, the action of a group can be very complicated and very nonlinear. On the other hand, the infinitesimal action simplifies the problem considerably. Fortunately, we have the following infinitesimal criterion (see [Olve 86, pp. 257-258] for a proof).


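Theorem 33.2.2 A local group of transformations $G$ acting on $M \subset \Omega_0 \times U$ is a variational symmetry group of the functional (33.18) if and only if

\[
\mathrm{pr}^{(n)} v(L) + L\,D\cdot X = 0 \tag{33.20}
\]

for all $(x, u^{(n)}) \in M^{(n)}$ and every infinitesimal generator

\[
v = \sum_{i=1}^p X^i(x, u)\frac{\partial}{\partial x^i} + \sum_{\alpha=1}^q U^\alpha(x, u)\frac{\partial}{\partial u^\alpha}
\]

of $G$, where $X = (X^1, \dots, X^p)$.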
Example 33.2.3 Consider the case of $p = 1 = q$, and assume that the Lagrangian is independent of $x$ but depends on $u \in \mathcal{L}^2(a, b)$ and its first derivative. Then the variational problem takes the form

\[
L[u] = \int_a^b L(u^{(1)})\,dx \equiv \int_a^b L(u, u_x)\,dx.
\]

Since derivatives are independent of translations, we expect translations to be part of the symmetry group of this variational problem. Let us verify this. The infinitesimal generator of translation is $\partial_x$, which is its own prolongation. Therefore, with $X = 1$ and $U = 0$, it follows that

\[
\mathrm{pr}^{(1)} v(L) + L\,D\cdot X = \partial_x L + L\,D_x X = 0 + 0 = 0.
\]


Example 33.2.4 As a less trivial case, consider the proper time of Example 33.1.15. Lorentz transformations generated by³ $v = u\,\partial_x + x\,\partial_u$ are symmetries of that variational problem. We can verify this by noting that the first prolongation of $v$ is, as the reader is urged to verify,

\[
\mathrm{pr}^{(1)} v = v + (1 - u_x^2)\frac{\partial}{\partial u_x}.
\]

Therefore,

\[
\mathrm{pr}^{(1)} v(L) = 0 + 0 + (1 - u_x^2)\,\frac{1}{2}\,\frac{(-2u_x)}{\sqrt{1 - u_x^2}} = -u_x\sqrt{1 - u_x^2}.
\]

On the other hand, since $X = u$ and $U = x$,

\[
L\,D_x(X) = \sqrt{1 - u_x^2}\,D_x(u) = \sqrt{1 - u_x^2}\,u_x,
\]

so that Eq. (33.20) is satisfied.
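The infinitesimal statement of Example 33.2.4 has a finite counterpart that is easy to test. The flow of $v = u\,\partial_x + x\,\partial_u$ (that is, $dx/ds = u$, $du/ds = x$) is the hyperbolic rotation $\tilde{x} = x\cosh s + u\sinh s$, $\tilde{u} = x\sinh s + u\cosh s$. The Python sketch below, whose sample curve and parameter `s` are illustrative assumptions, approximates the functional $\int\sqrt{1 - u_x^2}\,dx$ by a polygonal sum before and after such a transformation.

```python
import math

def interval_sum(points):
    """Polygonal approximation of the integral of sqrt(dx^2 - du^2) along a curve."""
    total = 0.0
    for (x0, u0), (x1, u1) in zip(points, points[1:]):
        total += math.sqrt((x1 - x0) ** 2 - (u1 - u0) ** 2)
    return total

def boost(points, s):
    """Finite transformation generated by v = u d/dx + x d/du with parameter s."""
    ch, sh = math.cosh(s), math.sinh(s)
    return [(x * ch + u * sh, x * sh + u * ch) for x, u in points]

# Illustrative curve u(x) = 0.1 sin(2 pi x) on [0, 1]; its slope stays below 1.
n = 2000
curve = [(k / n, 0.1 * math.sin(2 * math.pi * k / n)) for k in range(n + 1)]

tau_before = interval_sum(curve)
tau_after = interval_sum(boost(curve, 0.7))

print(tau_before, tau_after)
```

The two sums agree to rounding error, because each segment satisfies $\Delta\tilde{x}^2 - \Delta\tilde{u}^2 = \Delta x^2 - \Delta u^2$; this is the integrated form of Eq. (33.20) for this generator.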


In the last chapter, we studied the symmetries of the DEs in some detail. 
This chapter introduces us to a particular DE that arises from a variational 
problem, namely, the Euler-Lagrange equation. The natural question to ask 
now is: How does the variational symmetry manifest itself in the Euler-Lagrange equation? Barring some technical difficulties, we note that for any change of variables, if $u = f(x)$ is an extremal of the variational problem $L[u]$, then $\tilde{u} = \tilde{f}(\tilde{x})$ is an extremal of the variational problem $\tilde{L}[\tilde{u}]$. In particular, if the change is achieved by the action of the variational symmetry group, $(\tilde{x}, \tilde{u}) = g\cdot(x, u)$ for some $g \in G$, then $\tilde{L}[\tilde{u}] = L[\tilde{u}]$, and $g\cdot f$ is also an extremal of $L$. We thus have


Theorem 33.2.5 If G is the variational symmetry group of a functional, 
then G is also the symmetry group of the associated Euler-Lagrange equa- 
tions. 


The converse is not true! There are symmetry groups of the Euler-Lagrange equations that are not the symmetry group of the variational problem. Problem 33.8 illustrates this for $p = 3$, $q = 1$, and the functional

\[
L[u] = \frac{1}{2}\iiint \left(u_t^2 - u_x^2 - u_y^2\right) dx\,dy\,dt, \tag{33.21}
\]

whose Euler-Lagrange equation is the wave equation. The reader is asked to show that while the rotations and Lorentz boosts of Table 32.3 are variational symmetries, the dilatations and inversions (special conformal transformations) are not.

³In order to avoid confusion in applying formula (33.20), we use $x$ (instead of $t$) as the independent variable and $u$ (instead of $x$) as the dependent variable.

We now treat the case of $p = 1 = q$, whose Euler-Lagrange equation is an ODE. Recall that the knowledge of a symmetry group of an ODE led to a reduction in the order of that ODE. Let us see what happens in the present case. Suppose $v = X\,\partial_x + U\,\partial_u$ is the infinitesimal generator of a 1-parameter group of variational symmetries of $L$. By an appropriate coordinate transformation from $(x, u)$ to $(y, w)$, as in Sect. 32.5, $v$ will reduce to $\partial/\partial w$, whose prolongation is also $\partial/\partial w$. In terms of the new coordinates, Eq. (33.20) will reduce to $\partial\tilde{L}/\partial w = 0$; i.e., the new Lagrangian is independent of $w$, and the Euler-Lagrange equation (33.14) becomes

\[
0 = E(\tilde{L}) = \sum_{j=1}^n (-1)^j D_y^j \frac{\partial\tilde{L}}{\partial w_j} = -D_y\left[\sum_{j=0}^{n-1} (-1)^j D_y^j \frac{\partial\tilde{L}}{\partial w_{j+1}}\right]. \tag{33.22}
\]

Therefore, the expression in the brackets is some constant $\lambda$ (because $D_y$ is a total derivative). Furthermore, if we introduce $v = w_y$, the expression in the brackets becomes the Euler-Lagrange expression of the variational problem

\[
\hat{L}[v] = \int \hat{L}(y, v^{(n-1)})\,dy, \qquad \text{where } \hat{L}(y, v^{(n-1)}) = \tilde{L}(y, w_y, \dots, w_n),
\]

and every solution $w = f(y)$ of the original $(2n)$th-order Euler-Lagrange equation corresponds to a solution of the $(2n - 2)$nd-order equation

\[
\hat{E}(\hat{L}) = \sum_{j=0}^{n-1} (-1)^j D_y^j \frac{\partial\hat{L}}{\partial v_j} = \lambda. \tag{33.23}
\]

Moreover, this equation can be written as the Euler-Lagrange equation for

\[
\hat{L}_\lambda[v] = \int \left[\hat{L}(y, v^{(n-1)}) - \lambda v\right] dy,
\]

and $\lambda$ can be thought of as a Lagrange multiplier, so that in analogy with the multivariable extremal problem,⁴ the extremization of $L[u]$ becomes equivalent to that of $\hat{L}[v]$ subject to the constraint $\int v\,dy = 0$. We summarize the foregoing discussion in the following theorem.


Theorem 33.2.6 Let $p = 1 = q$, and $L[u]$ an $n$th-order variational problem with a 1-parameter group of variational symmetries $G$. Then there exists a one-parameter family of variational problems $\hat{L}_\lambda[v]$ of order $n - 1$ such that every solution of the Euler-Lagrange equation for $L[u]$ can be found by integrating the solutions to the Euler-Lagrange equation for $\hat{L}_\lambda[v]$.


⁴See [Math 70, pp. 331-341] for a discussion of Lagrange multipliers and their use in variational techniques, especially those used in approximating solutions of the Schrödinger equation.


Thus, we have the following important result: 


Box 33.2.7 A 1-parameter variational symmetry of a functional re- 
duces the order of the corresponding Euler-Lagrange equation by 
two. 


This conclusion is to be contrasted with the symmetry of ODEs, where 
each 1-parameter group of symmetries reduces the order of the ODE by 1. 
It follows from Box 33.2.7 that the ODEs of order 2n derived from a varia- 
tional problem—the Euler-Lagrange equation—are special. 


Example 33.2.8 A first-order variational problem with a 1-parameter group of symmetries can be integrated out. By transforming to a new coordinate system, we can always assume that the Lagrangian is independent of the dependent variable (see Proposition 32.5.1). The Euler-Lagrange equation in this case becomes

\[
0 = E(L) = \underbrace{\frac{\partial L}{\partial u}}_{=0} - D_x\frac{\partial L}{\partial u_x} \quad\Longrightarrow\quad \frac{\partial L}{\partial u_x}(x, u_x) = \lambda.
\]

Solving this implicit relation, we get $u_x = F(x, \lambda)$, which can be integrated to give $u$ as a function of $x$ (and $\lambda$).
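To make the last step concrete, the Python sketch below solves the implicit relation $\partial L/\partial u_x = \lambda$ numerically. The choice of the arc-length Lagrangian $L = \sqrt{1 + u_x^2}$, the value of `lam`, and the use of bisection are assumptions made for illustration; for this $L$ the relation can also be solved in closed form, $u_x = \lambda/\sqrt{1 - \lambda^2}$, which gives a check on the result.

```python
import math

def dL_du_x(p):
    """dL/du_x for the arc-length Lagrangian L = sqrt(1 + u_x^2), with p = u_x."""
    return p / math.sqrt(1.0 + p * p)

def solve_slope(lam, lo=-1e6, hi=1e6, iters=200):
    """Solve dL/du_x = lam for u_x by bisection (the map p -> dL/du_x is increasing)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dL_du_x(mid) < lam:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = 0.6                      # value of the first integral (assumed)
slope = solve_slope(lam)       # this is F(x, lambda); here it does not depend on x

def u(x, c2=0.0):
    """Integrating u_x = F(x, lambda) gives a straight line, as in Example 33.1.14."""
    return slope * x + c2

print(slope, lam / math.sqrt(1.0 - lam ** 2))
```

Because this particular Lagrangian has no explicit $x$ dependence, $F$ is a constant and the extremals are straight lines; for a Lagrangian depending on $x$, the same root-finding step would be repeated at each $x$ before integrating.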


The procedure can be generalized to r-parameter symmetry groups, but 
the order cannot be expected to be reduced by 2 unless the group is abelian. 
We shall not pursue this matter here, but ask the reader to refer to Prob- 
lem 33.9. 


33.3 Conservation Laws and Noether’s Theorem 


A conserved physical quantity is generally defined as a quantity whose flux 
through any arbitrary closed surface is equal to (the negative of) the rate of 
depletion of the quantity in the volume enclosed. This statement, through the 
use of the divergence theorem, translates into a relation connecting the time 
rate of change of the density and the divergence of the current corresponding 
to the physical quantity. Treating time and space coordinates as independent 
variables and extending to p independent variables, we have the following: 


Definition 33.3.1 A conservation law for a system of differential equations $\Delta(x, u^{(n)}) = 0$ is a divergence expression $D\cdot J = 0$ valid for all solutions $u = f(x)$ of the system. Here,

\[
J = \bigl(J_1(x, u^{(n)}), J_2(x, u^{(n)}), \dots, J_p(x, u^{(n)})\bigr)
\]

is called current density.

For $p = 1 = q$, i.e., for a system of ODEs, a conservation law takes the form $D_x J(x, u^{(n)}) = 0$ for all solutions $u = f(x)$ of the system. This requires $J(x, u^{(n)})$ to be a constant, i.e., that $J(x, u^{(n)})$ be a constant of the motion, or, as it is sometimes called, the first integral of the system.

In order to understand conservation laws, we need to get a handle on 
those conservation laws that are trivially satisfied. 


Definition 33.3.2 If the current density $J$ itself vanishes for all solutions $u = f(x)$ of the system $\Delta(x, u^{(n)}) = 0$, then $D\cdot J = 0$ is called a trivial conservation law of the first kind.

To eliminate this kind of triviality, one solves the system and its prolongations $\Delta^{(k)}(x, u^{(n)}) = 0$ for some of the variables $u_J^\alpha$ in terms of the remaining variables and substitutes the latter whenever they occur. For example, one can differentiate the evolution equation $u_t = F(x, u^{(n)})$, in which $u^{(n)}$ has derivatives with respect to $x$ only, with respect to $t$ and $x$ a sufficient number of times (this is what is meant by “prolongation” of the system of equations) and solve for all derivatives of $u$ involving time. Then, in the conservation law, substitute for any such derivatives to obtain a conservation law involving only $x$ derivatives of $u$.


Example 33.3.3 The current density $J_1 = \left(\tfrac12 u_t^2 + \tfrac12 u_x^2,\; -u_t u_x\right)$ is easily seen to be conserved for the system of first-order DEs

\[
u_t = v_x, \qquad u_x = v_t.
\]

By eliminating all the time derivatives in $J_1$, we obtain $J_2 = \left(\tfrac12 u_x^2 + \tfrac12 v_x^2,\; -u_x v_x\right)$, which is also conserved. However, the difference between these two currents,

\[
J = J_1 - J_2 = \left(\tfrac12 u_t^2 - \tfrac12 v_x^2,\; u_x v_x - u_t u_x\right),
\]

satisfies a trivial conservation law of the first kind, because the components of $J$ vanish on the solutions of the system.


Definition 33.3.4 If the current density $J$ satisfies $D\cdot J = 0$ for all functions $u = f(x)$, even if they are not solutions of the system of DEs, the divergence identity is called a trivial conservation law of the second kind. In this case $J$ is called a null divergence.

If we treat $J_i$ as the components of a $(p-1)$-form $\omega$, so that the exterior derivative $d\omega$ is the divergence of $J$ (times a volume element), then the triviality of the conservation law for $J$ is equivalent to the fact that $\omega$ is closed. By the converse of the Poincaré lemma, there must be a $(p-2)$-form $\eta$ such that $\omega = d\eta$. In the context of this chapter, we have the following theorem.


Theorem 33.3.5 Suppose $J = \bigl(J_1(x, u^{(n)}), \dots, J_p(x, u^{(n)})\bigr)$ is a $p$-tuple of functions on $X \times U^{(n)}$. Then $J$ is a null divergence if and only if there exist smooth functions $A_{kj}(x, u^{(n)})$, $j, k = 1, \dots, p$, antisymmetric in their indices, such that

\[
J_k = \sum_{j=1}^p D_j A_{kj}, \qquad k = 1, \dots, p. \tag{33.24}
\]


Definition 33.3.6 We say that $D\cdot J = 0$ is a trivial conservation law if there exist antisymmetric smooth functions $A_{kj}(x, u^{(n)})$ satisfying Eq. (33.24) for all solutions of the system of DEs $\Delta(x, u^{(n)}) = 0$. Two conservation laws are equivalent if they differ by a trivial conservation law.

We shall not distinguish between conservation laws that are equivalent. It turns out that to within this equivalence, some systems of DEs $\Delta_\nu$ have current densities $J$ such that

\[
D\cdot J = \sum_{\nu=1}^{l} Q_\nu \Delta_\nu \quad \text{for some } l\text{-tuple } Q = (Q_1, \dots, Q_l), \tag{33.25}
\]

where $\{Q_\nu\}$ are smooth functions of $x$, $u$, and all derivatives of $u$.


Definition 33.3.7 Equation (33.25) is called the characteristic form of the conservation law for the current density $J$, and the $l$-tuple $Q$, the characteristic of the conservation law.


We are now in a position to prove the celebrated Noether’s theorem. 
However, we first need a lemma. 


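Lemma 33.3.8 Let $v = \sum_{i=1}^p X^i\,\partial/\partial x^i + \sum_{\alpha=1}^q U^\alpha\,\partial/\partial u^\alpha$ where $X^i$ and $U^\alpha$ are functions of $x$ and $u$. Let

\[
Q^\alpha(x, u^{(1)}) \equiv U^\alpha(x, u) - \sum_{i=1}^p X^i(x, u)\,u_i^\alpha, \qquad \alpha = 1, \dots, q.
\]

Then

\[
\mathrm{pr}^{(n)} v = \mathrm{pr}^{(n)} v_Q + \sum_{i=1}^p X^i D_i, \tag{33.26}
\]

where

\[
v_Q \equiv \sum_{\alpha=1}^q Q^\alpha(x, u^{(1)})\frac{\partial}{\partial u^\alpha}, \qquad
\mathrm{pr}^{(n)} v_Q \equiv \sum_{\alpha=1}^q \sum_J D_J Q^\alpha \frac{\partial}{\partial u_J^\alpha}.
\]

The sum over $J$ extends over all multi-indices with $0 \le |J| \le n$, with the $|J| = 0$ term being simply $v_Q$.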
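Proof Substitute $Q^\alpha$ in the definition of $U_J^\alpha$ as given in Theorem 32.3.5 to obtain

\[
U_J^\alpha = D_J Q^\alpha + \sum_{i=1}^p X^i u_{J,i}^\alpha,
\]

where $U_0^\alpha \equiv Q^\alpha + \sum_{i=1}^p X^i u_i^\alpha = U^\alpha$. It follows that (with $J = 0$ included in the sum)

\[
\mathrm{pr}^{(n)} v = \sum_{i=1}^p X^i\frac{\partial}{\partial x^i} + \sum_{\alpha, J} U_J^\alpha\frac{\partial}{\partial u_J^\alpha}
= \sum_{\alpha, J} D_J Q^\alpha\frac{\partial}{\partial u_J^\alpha} + \sum_{i=1}^p X^i \underbrace{\left(\frac{\partial}{\partial x^i} + \sum_{\alpha, J} u_{J,i}^\alpha\frac{\partial}{\partial u_J^\alpha}\right)}_{=D_i \text{ by Proposition 32.3.4}}
= \mathrm{pr}^{(n)} v_Q + \sum_{i=1}^p X^i D_i,
\]

and the lemma is proved.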


Theorem 33.3.9 (Noether’s theorem) Let

\[
v = \sum_{i=1}^p X^i\frac{\partial}{\partial x^i} + \sum_{\alpha=1}^q U^\alpha\frac{\partial}{\partial u^\alpha}
\]

be the infinitesimal generator of a local 1-parameter group of symmetries $G$ of the variational problem $L[u] = \int L(x, u^{(n)})\,d^p x$. Let

\[
Q^\alpha(x, u^{(1)}) \equiv U^\alpha(x, u) - \sum_{i=1}^p X^i(x, u)\,u_i^\alpha, \qquad u_i^\alpha \equiv \frac{\partial u^\alpha}{\partial x^i}.
\]

Then there exists a $p$-tuple $J = (J_1, \dots, J_p)$ such that

\[
D\cdot J = \sum_{\alpha=1}^q Q^\alpha E_\alpha(L) \tag{33.27}
\]

is a conservation law in characteristic form for the Euler-Lagrange equation $E_\alpha(L) = 0$.


Proof We use Lemma 33.3.8 in the infinitesimal criterion of the variational symmetry (33.20) to obtain

\[
0 = \mathrm{pr}^{(n)} v(L) + L\,D\cdot X
= \mathrm{pr}^{(n)} v_Q(L) + \sum_{i=1}^p X^i D_i L + L\sum_{i=1}^p D_i X^i
= \mathrm{pr}^{(n)} v_Q(L) + \sum_{i=1}^p D_i(L X^i)
= \mathrm{pr}^{(n)} v_Q(L) + D\cdot(LX). \tag{33.28}
\]

Using the definition of $\mathrm{pr}^{(n)} v_Q$ and the identity

\[
(D_j S)\,T = D_j(ST) - S\,D_j T,
\]

we can commute $D_J = D_{j_1}\cdots D_{j_k}$ past $Q^\alpha$ one factor at a time, each time introducing a divergence. Therefore,

\[
\mathrm{pr}^{(n)} v_Q(L) = \sum_{\alpha, J} D_J Q^\alpha\frac{\partial L}{\partial u_J^\alpha}
= \sum_{\alpha, J} Q^\alpha (-D)_J\frac{\partial L}{\partial u_J^\alpha} + D\cdot A
= \sum_{\alpha=1}^q Q^\alpha E_\alpha(L) + D\cdot A,
\]

where $A = (A_1, \dots, A_p)$ is some $p$-tuple of functions depending on $L$, the $Q^\alpha$’s, and their derivatives, whose precise form is not needed here. Combining this with Eq. (33.28), we obtain

\[
0 = \sum_{\alpha=1}^q Q^\alpha E_\alpha(L) + D\cdot(A + LX).
\]

Selecting $J = -(A + LX)$ proves the theorem.
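The construction in the proof can be followed explicitly in the simplest case. As an illustration (the Lagrangian and the numbers below are assumptions, not taken from the text), let $p = 1 = q$ and $L = \tfrac12 u_x^2 - V(u)$, which is invariant under the translation $v = \partial_x$, so that $X = 1$, $U = 0$, and $Q = -u_x$. For a first-order Lagrangian, $\mathrm{pr}^{(1)} v_Q(L) = Q\,\partial L/\partial u + (D_x Q)\,\partial L/\partial u_x = Q\,E(L) + D_x(Q\,\partial L/\partial u_x)$, hence $A = Q\,\partial L/\partial u_x = -u_x^2$ and $J = -(A + LX) = \tfrac12 u_x^2 + V(u)$. The Python sketch integrates the Euler-Lagrange equation $u_{xx} = -V'(u)$ with a hand-written Runge-Kutta step and monitors $J$.

```python
def V(u):
    """Illustrative potential (assumed): V(u) = u^4 / 4."""
    return 0.25 * u ** 4

def dV(u):
    return u ** 3

def rk4_step(u, p, h):
    """One classical RK4 step for the system u_x = p, p_x = -V'(u)."""
    k1u, k1p = p, -dV(u)
    k2u, k2p = p + 0.5 * h * k1p, -dV(u + 0.5 * h * k1u)
    k3u, k3p = p + 0.5 * h * k2p, -dV(u + 0.5 * h * k2u)
    k4u, k4p = p + h * k3p, -dV(u + h * k3u)
    u_new = u + h * (k1u + 2 * k2u + 2 * k3u + k4u) / 6
    p_new = p + h * (k1p + 2 * k2p + 2 * k3p + k4p) / 6
    return u_new, p_new

def current(u, p):
    """J = -(A + L X) = u_x^2 / 2 + V(u) for the translation symmetry."""
    return 0.5 * p * p + V(u)

u, p, h = 1.0, 0.0, 1e-3       # initial data and step size (assumed)
J_start = current(u, p)
for _ in range(10000):
    u, p = rk4_step(u, p, h)
J_end = current(u, p)

print(J_start, J_end)
```

Along the numerical solution $J$ stays constant up to integration error, as $D_x J = Q\,E(L) = 0$ requires; in mechanics, with $x$ playing the role of time, this $J$ is the energy.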


33.4 Application to Classical Field Theory 


It is clear from the proof of Noether’s theorem that if we are interested in 
the conserved current, we need to find A. In general, the expression for A 
is very complicated. However, if the variational problem is of first order 
(which in most cases of physical interest it is), then we can easily find the 
explicit form of A, and, consequently the conserved current J. We leave it 
for the reader to prove the following: 


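Corollary 33.4.1 Let $v = \sum_{i=1}^p X^i\,\partial/\partial x^i + \sum_{\alpha=1}^q U^\alpha\,\partial/\partial u^\alpha$ be the infinitesimal generator of a local 1-parameter group of symmetries $G$ of the first-order variational problem $L[u] = \int L(x, u^{(1)})\,d^p x$. Then⁵

\[
J_i = \sum_{\alpha=1}^q \sum_{j=1}^p X^j u_j^\alpha\frac{\partial L}{\partial u_i^\alpha} - \sum_{\alpha=1}^q U^\alpha\frac{\partial L}{\partial u_i^\alpha} - X^i L, \qquad i = 1, \dots, p,
\]

form the components of a conserved current for the Euler-Lagrange equation $E_\alpha(L) = 0$.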


⁵We have multiplied $J_i$ by a negative sign to conform to physicists’ convention.

Historical Notes

Amalie Emmy Noether (1882-1935), generally considered the greatest of all female
mathematicians up to her time, was the eldest child of Max Noether, research
mathematician and professor at the University of Erlangen, and Ida Amalia Kaufmann. Two of
Emmy’s three brothers were also scientists. Alfred, her junior by a year, earned a
doctorate in chemistry at Erlangen. Fritz, two and a half years younger, became a distinguished
physicist; and his son, Gottfried, became a mathematician.

At first Emmy Noether had planned to be a teacher of English and French. From 1900 
to 1902 she studied mathematics and foreign languages at Erlangen. Then in 1903 she 
started her specialization in mathematics at the University of Göttingen. At both universities
she was a nonmatriculated auditor at lectures, since at the turn of the century women
could not be admitted as regular students. In 1904 she was permitted to matriculate at 
the University of Erlangen, which granted her the Ph.D., summa cum laude, in 1907. Her 
sponsor, the algebraist Gordan, strongly influenced her doctoral dissertation on algebraic 
invariants. Her divergence from Gordan’s viewpoint and her progress in the direction of 
the “new” algebra first began when she was exposed to the ideas of Ernst Fischer, who 
came to Erlangen in 1911. 

In 1915 Hilbert invited Emmy Noether to Göttingen. There she lectured at courses that
were given under his name and applied her profound invariant-theoretic knowledge to 
the resolution of problems that he and Felix Klein were considering. Inspired by Hilbert 
and Klein’s investigation into Einstein’s general theory of relativity, Noether wrote her 
remarkable 1918 paper in which both the concept of variational symmetry and its con- 
nection with conservation laws were set down in complete generality. 

Hilbert repeatedly tried to obtain her an appointment as Privatdozent, but the strong
prejudice against women prevented her habilitation until 1919. In 1922 she was named a
nichtbeamteter ausserordentlicher Professor (“unofficial associate professor”), a purely
honorary position. Subsequently, a modest salary was provided through a Lehrauftrag
(“teaching appointment”) in algebra. Thus she taught at Göttingen (1922-1933),
interrupted only by visiting professorships at Moscow (1928-1929) and at Frankfurt (summer
of 1930).

In April 1933 she and other Jewish professors at Göttingen were summarily dismissed.
In 1934 Nazi political pressures caused her brother Fritz to resign from his position at
Breslau and to take up duties at the research institute in Tomsk, Siberia. Through the
efforts of Hermann Weyl, Emmy Noether was offered a visiting professorship at Bryn
Mawr College; she departed for the United States in October 1933. Thereafter she lectured
and did research at Bryn Mawr and at the Institute for Advanced Study, Princeton, but
those activities were cut short by her sudden death from complications following surgery.

Emmy Noether’s most important contributions to mathematics were in the area of abstract
algebra. One of the traditional postulates of algebra, namely the commutative law of
multiplication, was relinquished in the earliest examples of generalized algebraic structures,
e.g., in Hamilton’s quaternion algebra and also in many of the 1844 Grassmann algebras.
From 1927 to 1929 Emmy Noether contributed notably to the theory of representations,
the object of which is to provide realizations of noncommutative algebras by means of
matrices, or linear transformations. From 1932 to 1934 she was able to probe profoundly
into the structure of noncommutative algebras by means of her concept of the
verschränktes (“cross”) product.

Emmy Noether wrote some forty-five research papers and was an inspiration to many 
future mathematicians. The so-called Noether school included such algebraists as Hasse 
and W. Schmeidler, with whom she exchanged ideas and whom she converted to her 
own special point of view. She was particularly influential in the work of B. L. van der 
Waerden, who continued to promote her ideas after her death and to indicate the many 
concepts for which he was indebted to her. 


Corollary 33.4.1 can be applied to most DEs in physics derivable from a Lagrangian. We are interested in partial DEs studied in classical field theories. The case of ODEs, studied in point mechanics, is relegated to Problem 33.11.

First consider spacetime translation $v^j = \eta^{ij}\partial_i$, where we have introduced the Lorentz metric $\eta^{ij}$ to include non-Euclidean cases. In order for $v^j$ to be an infinitesimal variational symmetry, it has to satisfy Eq. (33.20), which, in the case at hand, reduces to $v^j(L) = 0$, or $\partial_i L = 0$.


Box 33.4.2 In order for a variational problem to be invariant under 
spacetime translations, its Lagrangian must not depend explicitly on 
the coordinates. 


If spacetime translation happens to be a symmetry, then $X^i \to \eta^{ij}$, and the (double-indexed) conserved current, derived from Corollary 33.4.1, takes the form

\[
J^{ij} = \sum_{\alpha=1}^q \sum_{k=1}^p \eta^{kj} u_k^\alpha\frac{\partial L}{\partial u_i^\alpha} - \eta^{ij} L.
\]

Using Greek indices to describe space-time coordinates, and Latin indices to label the components of $\mathbb{R}^q$, we write

\[
T^{\mu\nu} = \sum_{j=1}^q \frac{\partial\phi^j}{\partial x_\nu}\frac{\partial L}{\partial\phi^j_\mu} - \eta^{\mu\nu} L, \tag{33.29}
\]

where we changed the dependent variable $u$ to $\phi$ to adhere to the notation used in the physics literature. Recall that $\phi^j_\nu \equiv \partial\phi^j/\partial x^\nu$. $T^{\mu\nu}$ is called the energy momentum current density.

The quantity $T^{\mu\nu}$, having a vanishing divergence, is really a density, just as the continuity equation (vanishing of the divergence) for the electric charge involves the electric charge and current densities. In the electric case, we find the charge by integrating the charge density, the zeroth component of the electric 4-current density. Similarly, we find the “charge” associated with $T^{\mu\nu}$ by integrating its zeroth component. This yields the energy momentum 4-vector:

\[
P^\nu = \int_V T^{0\nu}\,d^3x.
\]

We note that

\[
\frac{dP^\nu}{dt} = \frac{d}{dt}\int_V T^{0\nu}\,d^3x = \int_V \frac{\partial T^{0\nu}}{\partial t}\,d^3x
= -\int_V \sum_{i=1}^3 \frac{\partial T^{i\nu}}{\partial x^i}\,d^3x = -\int_S \sum_{i=1}^3 T^{i\nu}\,da_i,
\]

where we have used the vanishing of the divergence of $T^{\mu\nu}$ and the three-dimensional divergence theorem. By taking $S$ to be infinite, and assuming that $T^{i\nu} \to 0$ at infinity (faster than the element of area $da_i$ diverges), we obtain $dP^\nu/dt = 0$, the conservation of the 4-momentum.


Example 33.4.3 A relativistic scalar field of mass $m$ is a 1-component field satisfying the Klein-Gordon equation, which is, as the reader may check, the Euler-Lagrange equation of

\[
L[\phi] = \int L(\phi, \phi_\mu)\,d^4x \equiv \int \tfrac12\left[\eta^{\mu\nu}\,\partial_\mu\phi\,\partial_\nu\phi - m^2\phi^2\right] d^4x.
\]

The energy momentum current for the scalar field is found to be

\[
T^{\mu\nu} = \frac{\partial\phi}{\partial x_\nu}\frac{\partial L}{\partial\phi_\mu} - \eta^{\mu\nu} L = \partial^\mu\phi\,\partial^\nu\phi - \eta^{\mu\nu} L, \qquad \partial^\alpha \equiv \frac{\partial}{\partial x_\alpha}.
\]

Note that $T^{\mu\nu}$ is symmetric under interchange of its indices. This is a desired feature of the energy momentum current that holds for the scalar field but is not satisfied in general, as Eq. (33.29) indicates. The reader is urged to show directly that $\partial_\mu T^{\mu\nu} = 0 = \partial_\nu T^{\mu\nu}$, i.e., that energy momentum is conserved.
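The conservation law of Example 33.4.3 can be spot-checked numerically. The Python sketch below is an illustration under stated assumptions: one space dimension, metric $\eta = \mathrm{diag}(1, -1)$, and the plane wave $\phi = \cos(\omega t - kx)$ with $\omega^2 = k^2 + m^2$, which solves the Klein-Gordon equation. With these conventions $T^{00} = \tfrac12(\phi_t^2 + \phi_x^2 + m^2\phi^2)$ and $T^{10} = -\phi_x\phi_t$, and the code estimates $\partial_t T^{00} + \partial_x T^{10}$ by central differences.

```python
import math

m, k = 1.0, 2.0                      # mass and wave number (assumed values)
omega = math.sqrt(k * k + m * m)     # dispersion relation of the Klein-Gordon equation

def phi(t, x):
    return math.cos(omega * t - k * x)

def phi_t(t, x):
    return -omega * math.sin(omega * t - k * x)

def phi_x(t, x):
    return k * math.sin(omega * t - k * x)

def T00(t, x):
    """Energy density: phi_t^2 - L with L = (phi_t^2 - phi_x^2 - m^2 phi^2) / 2."""
    return 0.5 * (phi_t(t, x) ** 2 + phi_x(t, x) ** 2 + m * m * phi(t, x) ** 2)

def T10(t, x):
    """Energy flux: (d^1 phi)(d^0 phi), where raising the spatial index flips the sign."""
    return -phi_x(t, x) * phi_t(t, x)

def divergence(t, x, h=1e-5):
    """Central-difference estimate of d_t T^{00} + d_x T^{10}."""
    d_t = (T00(t + h, x) - T00(t - h, x)) / (2 * h)
    d_x = (T10(t, x + h) - T10(t, x - h)) / (2 * h)
    return d_t + d_x

residuals = [abs(divergence(0.3 * i, 0.7 * i - 1.0)) for i in range(10)]
print(max(residuals))
```

The residuals are at the level of the finite-difference error. If `omega` is changed so that the dispersion relation is violated, $\phi$ no longer solves the field equation and the divergence no longer vanishes, which shows that the conservation law holds only on solutions.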


To go beyond translation, we consider classical (nonquantized) fields⁶ $\{\phi^j\}_{j=1}^{n_\alpha}$, which, as is the case in most physical situations, transform among themselves as the rows of the $\alpha$th irreducible representation of a Lie group $G$ that acts on the independent variables. Under these circumstances, the generators of the symmetry are given by Eq. (30.11):


Di) =F, EG" (0) 5p + 8X" (x: a (33.30) 


where v labels the independent variables. Corollary 33.4.1 now gives the 
conserved current as 


aT ® 


i= {Xx gE) gk Oa 


XM (x; & ie }s, — o*(x) 
where summation over repeated indices is understood with $1 \le k \le n_\alpha$ and $1 \le \nu \le p$. We can rewrite this equation in the form


y= | x obooa — X¥ (x; pli oo gh ~=(&), (33.31) 


mn 


gk 


where $J^\mu$ and $T^{(\alpha)}(\xi)$ are $n_\alpha \times n_\alpha$ matrices whose elements are $J^\mu_{jk}$ and $T^{(\alpha)}_{jk}(\xi)$, respectively, and $1$ is the unit matrix of the same dimension.

We note that the conserved current has a coordinate part (the term that includes $X^\nu$ and multiplies the unit matrix), and an “intrinsic” part (the term with no $X^\nu$) represented by the term involving $T^{(\alpha)}(\xi)$. If the field has only one component (a scalar field), then $T^{(\alpha)}(\xi) = 0$, and only the coordinate part contributes to the current.

The current $J^\mu$ acquires an extra index when a component of $\xi$ is chosen.
As a concrete example, consider the case where $G$ is the rotation group in
$\mathbb{R}^p$. Then a typical component of $\xi$ will be $\xi^{\rho\sigma}$, corresponding to a rotation
in the $\rho\sigma$-plane, and the current will be written as $J^{\mu;\rho\sigma}$. These extra indices
are also reflected in $X$, as that too is a function of $\xi$:


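$$X^\nu(x;\xi^{\rho\sigma})\frac{\partial}{\partial x^\nu} = x^\rho\partial^\sigma - x^\sigma\partial^\rho
\;\Rightarrow\; X^\mu(x;\xi^{\rho\sigma}) = x^\rho\delta^{\mu\sigma} - x^\sigma\delta^{\mu\rho}.$$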


$^6$The reader notes that the superscript $\alpha$, which labeled components of the dependent
variable $u$, is now the label of the irreducible representation. The components of the
dependent variable (now denoted by $\phi$) are labeled by $j$.




The volume integral of $J^{0;\rho\sigma}$ will give the components of angular momentum.
When integrated, the term multiplying $\mathbf{1}$ becomes the orbital angular
momentum, and the remaining term gives the intrinsic spin. The conservation
of $J^{\mu;\rho\sigma}$ is the statement of the conservation of total angular
momentum. The label $\alpha$ denotes various representations of the rotation group.
If $p = 3$, then $\alpha$ is simply the value of the spin. For example, the spin-$\frac12$
representation corresponds to $\alpha = \frac12$ and


$$T^{(1/2)}(\xi) = \tfrac12(\sigma^1,\sigma^2,\sigma^3), \quad\text{or}\quad T^{(1/2)}(\xi^a) = \tfrac12\sigma^a, \quad a = 1,2,3,$$


with $a$ labeling the three different “directions” of rotation.$^7$ If the field is a
scalar, $T^{(\alpha)}(\xi) = 0$, and the field has only an orbital angular momentum.


33.5 Problems 


33.1 Show that the derivative of a linear map from one Hilbert space to 
another is the map itself. 


33.2 Show that a complex function $f : \mathbb{C} \supset \Omega \to \mathbb{C}$ considered as a map
$f : \mathbb{R}^2 \supset \Omega \to \mathbb{R}^2$ is differentiable iff it satisfies the Cauchy-Riemann
conditions. Hint: Consider the Jacobian matrix of $f$, and note that a linear
complex map $T : \mathbb{C} \to \mathbb{C}$ is necessarily of the form $T(z) = \lambda z$ for some constant
$\lambda \in \mathbb{C}$.


33.3 Show that

$$\frac{\delta E_{y,i}[u]}{\delta u}(x) = -\partial_i\delta(x - y).$$


33.4 Show that the first functional derivative of $L[u] = \int_a^b \sqrt{1+u_x^2}\,dx$,
obtained using Eq. (33.9), is $E(L)$.


33.5 Show that for the proper time of special relativity 


$$\frac{\delta L[x]}{\delta x(s)} = \frac{x_{ss}}{(1 - x_s^2)^{3/2}}.$$


Use this to show that the contribution of the second variational derivative to 
the Taylor expansion of the functional is always negative. 


33.6 Show that the first prolongation of the Lorentz generator $\mathbf{v} = u\partial_x +
x\partial_u$ is

$$\mathrm{pr}^{(1)}\mathbf{v} = \mathbf{v} + (1 - u_x^2)\frac{\partial}{\partial u_x}.$$


7Only in three dimensions can one label rotations with a single index. This is because each 
coordinate plane has a unique direction (by the use of the right-hand rule) perpendicular 
to it that can be identified as the direction of rotation. 

33.7 Verify that rotation in the $xu$-plane is a symmetry of the arc-length
variational problem (see Example 33.1.13). 


33.8 Show that $v_4$, $v_6$, and $v_7$ of Table 32.3 are variational symmetries of
Eq. (33.21), but $v_5$, $v_8$, $v_9$, and $v_{10}$ are not. Find the constant $c$ (if it exists)
such that $v_5 + cu\partial_u$ is a variational symmetry. Show that no linear
combination of inversions produces a symmetry.


33.9 The two-dimensional Kepler problem (for a unit point mass) starts 
with the functional 


$$L = \int \bigl[\tfrac12(\dot{x}^2 + \dot{y}^2) - V(r)\bigr]\,dt, \qquad r = \sqrt{x^2 + y^2}.$$


(a) Show that $L$ is invariant under $t$ translation and rotation in the $xy$-plane.

(b) Find the generators of $t$ translation and rotation in polar coordinates
and conclude that $r$ is the best choice for the independent variable.

(c) Rewrite $L$ in polar coordinates and show that it is independent of $t$
and $\theta$.

(d) Write the Euler-Lagrange equations and integrate them to get $\theta$ as an
integral over $r$.


33.10 Prove Corollary 33.4.1. 


33.11 Consider a system of N particles whose total kinetic energy K and 
potential energy U are given by 


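$$K(\dot{\mathbf{x}}) = \frac12\sum_{\alpha=1}^{N} m_\alpha|\dot{\mathbf{x}}^\alpha|^2, \qquad
U(t,\mathbf{x}) = \sum_{\alpha\neq\beta} k_{\alpha\beta}|\mathbf{x}^\alpha - \mathbf{x}^\beta|^{-1},$$

where $\mathbf{x}^\alpha = (x^\alpha, y^\alpha, z^\alpha)$ is the position of the $\alpha$th particle. The variational
problem is of the form

$$L[\mathbf{x}] = \int_{-\infty}^{\infty} L(t,\mathbf{x},\dot{\mathbf{x}})\,dt = \int_{-\infty}^{\infty}\bigl[K(\dot{\mathbf{x}}) - U(t,\mathbf{x})\bigr]\,dt.$$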

(a) Show that the Euler-Lagrange equations are identical to Newton’s sec- 
ond law of motion. 
(b) Write the infinitesimal criterion for the vector field 


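$$\mathbf{v} = \tau\frac{\partial}{\partial t} + \sum_{\alpha}\Bigl(\xi^\alpha\frac{\partial}{\partial x^\alpha} + \eta^\alpha\frac{\partial}{\partial y^\alpha} + \zeta^\alpha\frac{\partial}{\partial z^\alpha}\Bigr)$$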


to be the generator of a 1-parameter group of variational symmetries 
of L. 
(c) Show that the conserved “current” derived from Corollary 33.4.1 is

$$T = \sum_{\alpha=1}^{N} m_\alpha\bigl(\xi^\alpha\dot{x}^\alpha + \eta^\alpha\dot{y}^\alpha + \zeta^\alpha\dot{z}^\alpha\bigr) - \tau E,$$

where $E = K + U$ is the total energy of the system.
(d) Find the conditions on U such that (i) time translation, (ii) space trans- 
lations, and (iii) rotations become symmetries of L. In each case, com- 
pute the corresponding conserved quantity. 


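33.12 Show that the Euler-Lagrange equation of

$$L[\phi] = \int L(\phi,\phi_\mu)\,d^4x = \int \tfrac12\bigl[\eta^{\mu\nu}\phi_\mu\phi_\nu - m^2\phi^2\bigr]\,d^4x$$

is the Klein-Gordon equation. Verify that $T^{\mu\nu} = \partial^\mu\phi\,\partial^\nu\phi - \eta^{\mu\nu}L$ are the
currents associated with the invariance under translations. Show directly that
$T^{\mu\nu}$ is conserved.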


Part X 
Fiber Bundles 


Fiber Bundles and Connections 34 


The elegance of the geometrical expression of physical ideas has attracted 
much attention ever since Einstein proposed his geometrical theory of grav- 
ity in 1916. Such an expression was, however, confined to the general theory 
of relativity until the 1970s when the language of geometry was found to be 
most suitable, not only for gravity, but also for the other three fundamen- 
tal forces of nature. Geometry, in the form of gauge field theories of elec- 
troweak and strong interactions, has been successful not only in creating a 
model—the so-called standard model—that explains all experimental re- 
sults to remarkable accuracy, but also in providing a common language for 
describing all fundamental forces of nature, and with that a hope for unifying 
these forces into a single all-embracing force. This hope is encouraged by 
the successful unification of electromagnetism with the weak nuclear force 
through the medium of geometry and gauge field theory. 

The word “geometry” is normally used in the mathematics literature for 
a manifold on which a “machine” is defined with the property that it gives 
a number when two vectors are fed into it. Symplectic geometry’s machine 
was a nondegenerate 2-form. Riemannian (or pseudo-Riemannian or semi- 
Riemannian) geometry has a symmetric bilinear form (metric, inner prod- 
uct). Both of these geometries are important: Symplectic geometry is the 
natural setting for Hamiltonian dynamics, and (pseudo- or semi-) Rieman- 
nian geometry is the basis of the general theory of relativity. 

The most elegant way of studying geometry, which very naturally
encompasses the (pseudo-)Riemannian geometry of general relativity and the
gauge theory of the fundamental interactions of physics, is through the language
of fiber bundles, which we set out to develop in this chapter.


34.1 Principal Fiber Bundles 


In Sect. 28.4, we defined the tangent bundle $T(M)$ as the union of the tangent
spaces at all points of a manifold $M$. It can be shown that $T(M)$ is a
manifold, and that there is a differentiable surjective map $\pi : T(M) \to M$,
sending the tangent space $T_x(M)$ at $x \in M$ to $x$.$^1$ The inverse of this map at
$x$, $\pi^{-1}(x)$, is the collection of all vectors at $x$. The notion of tangent bundle
and the corresponding map $\pi$ can be generalized to the extremely fruitful
notion of principal fiber bundle.


$^1$For ease of notation, we have changed $\mathcal{T}_P(M)$ of Definition 28.4.1 to $T_x(M)$.


Definition 34.1.1 A principal fiber bundle (PFB) over a manifold $M$ with
Lie group $G$ is a manifold $P$, called the total space or the bundle space,
and an action of $G$ on $P$ satisfying the following conditions:


(1) $G$ acts freely on $P$ on the right: $R_g(p) \equiv p \cdot g \equiv pg \in P$.

(2) $M$ is the space $P/G$ of the orbits of $G$ in $P$ and the canonical map
$\pi : P \to P/G$ is differentiable.

(3) $P$ is locally trivial, i.e., for every point $x \in M$, there is a neighborhood
$U$ containing $x$ and a diffeomorphism $T_u : \pi^{-1}(U) \to U \times G$
of the form $T_u(p) = (\pi(p), s_u(p))$ where $s_u : \pi^{-1}(U) \to G$ has the
property $s_u(pg) = s_u(p)g$ for all $g \in G$ and $p \in \pi^{-1}(U)$. The map $T_u$
is called a local trivialization (LT).


A principal fiber bundle will be denoted by $P(M, G, \pi)$, or $P(M, G)$,
or even just $P$. $M$ is called the base space, $G$ the structure group, and $\pi$
the projection of the PFB. For each $x \in M$, $\pi^{-1}(x)$ is a submanifold of $P$,
called the fiber over $x$. If $x = \pi(p)$, then $\pi^{-1}(x)$ is just the orbit of $G$ at $p$.
By Theorem 29.1.7, every fiber is diffeomorphic to $G$. There is no natural
group structure on $\pi^{-1}(x)$. So, although fibers can be thought of as copies
of $G$, they are so only as manifolds.


Remark 34.1.1 Just as a fiber sprouts from a single point of the earth
(a spherical 2-manifold), so does a fiber $\pi^{-1}(x)$ sprout out of a single
point $x$ of the manifold $M$. And just as you can collect a bunch of fibers
and make a bundle out of them, so can you collect a bunch of $\pi^{-1}(x)$'s
and make $P = \bigcup_x \pi^{-1}(x)$. Furthermore, fibers sprout vertically from the
ground. Similarly, in a sense to be elaborated in our discussion of connections,
the $\pi^{-1}(x)$ are “vertical” manifolds, while $M$ is “horizontal.”


Example 34.1.2 Let $M$ be any manifold and $G$ any Lie group. Let $P =
M \times G$ and let $G$ act on $P$ on the right by the rule: $(x, g)g' = (x, gg')$. We
note that the action is free because

$$(x, g)g' = (x, g) \;\Rightarrow\; (x, gg') = (x, g) \;\Rightarrow\; gg' = g \;\Rightarrow\; g' = e.$$

Two points $(x, g)$ and $(x', g')$ belong to the same orbit iff there is $h \in G$ such
that $(x, g)h = (x', g')$. This happens iff $x' = x$ and $gh = g'$. It follows that
for any $g$, $(x, g)$ belongs to the orbit at $(x, e)$. Therefore, $[(x, g)] = [(x, e)]$.
This gives a natural identification of $P/G$ with $M$. For trivialization, let the
neighborhood $U$ of any point be $M$ and let $s_u(x, g) = g$. This choice makes
$P$ globally trivial, thus the name trivial for such a bundle.




Definition 34.1.3 A homomorphism of a principal fiber bundle $P'(M', G')$
into another $P(M, G)$ is a pair $(f, f_G)$ of maps $f : P' \to P$ and $f_G :
G' \to G$ with $f_G$ a group homomorphism such that $f(p'g') = f(p')f_G(g')$
for all $p' \in P'$ and $g' \in G'$. Every bundle homomorphism induces a map
$f_M : M' \to M$. If $f$ is bijective and $f_G$ a group isomorphism, then $f_M$ is a
diffeomorphism and $(f, f_G)$ is called an isomorphism of $P'(M', G')$ onto
$P(M, G)$. An isomorphism of $P(M, G)$ onto itself in which $f_G = \mathrm{id}_G$ is
called an automorphism of $P$.


Requirement (3) in Definition 34.1.1 situates $x \in M$ in the (sub)bundle
$\pi^{-1}(U)$ which, through the diffeomorphism $T_u$, can be identified as the
trivial bundle $U \times G$. The natural right action of $G$ on the trivial bundle
$U \times G$ should therefore be identified with its action on $\pi^{-1}(U)$. On the one
hand,

$$T_u(p) = (\pi(p), s_u(p)), \qquad T_u(pg) = (\pi(pg), s_u(pg)) = (\pi(p), s_u(pg)),$$

where the last equality follows because $p$ and $pg$ both belong to the orbit
at $p$. On the other hand,

$$(\pi(p), s_u(p))g \equiv (\pi(p), s_u(p)g) \quad\text{for the trivial bundle } U \times G.$$

So if the action of $G$ on $U \times G$ is to be identified with its action on $\pi^{-1}(U)$,
we must have $s_u(pg) = s_u(p)g$. That is why this equality was demanded in
Definition 34.1.1. We summarize this by saying that $T_u$ respects the action
of $G$.

Now let $T_u : \pi^{-1}(U) \to U \times G$ and $T_v : \pi^{-1}(V) \to V \times G$ be two LTs.
If $x \in U \cap V$ and $\pi(p) = x$, then $T_u(p) = (\pi(p), s_u(p))$, and $T_v(p) =
(\pi(p), s_v(p))$. Since $s_u(p), s_v(p) \in G$, there must exist $g \in G$ such that
$g s_v(p) = s_u(p)$. In fact, $g = s_u(p)(s_v(p))^{-1}$. What is interesting about $g$ is
that it can be defined on $M$.



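Definition 34.1.4 Let $T_u : \pi^{-1}(U) \to U \times G$ and $T_v : \pi^{-1}(V) \to
V \times G$ be two LTs of a PFB $P(M, G, \pi)$. The transition function
from $T_u$ to $T_v$ is the map $g_{uv} : U \cap V \to G$, given by $g_{uv}(x) =
s_u(p)(s_v(p))^{-1}$, where $p \in \pi^{-1}(x)$.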


For this definition to make sense, $g_{uv}(x)$ must be independent of $p \in
\pi^{-1}(x)$. Indeed, we have


Proposition 34.1.5 The transition function $g_{uv}$ from the local trivialization
$T_u$ to the local trivialization $T_v$ is independent of the choice of $p \in \pi^{-1}(x)$.
Furthermore,


(1) $g_{uu}(x) = e \quad \forall x \in U$;

(2) $g_{vu}(x) = g_{uv}(x)^{-1} \quad \forall x \in U \cap V$;

(3) $g_{uv}(x) = g_{uw}(x)g_{wv}(x) \quad \forall x \in U \cap V \cap W$.

Proof Let $p' \in \pi^{-1}(x)$ be a different point from $p$. Since $p'$ is in the same
orbit as $p$, we must have $p' = pg$ for some $g \in G$. Then

$$s_u(p')(s_v(p'))^{-1} = s_u(pg)(s_v(pg))^{-1} = s_u(p)g(s_v(p)g)^{-1}
= s_u(p)gg^{-1}(s_v(p))^{-1} = s_u(p)(s_v(p))^{-1}.$$

Thus $g_{uv}$ is well defined. The other parts of the proposition are trivial.


Consider a manifold $M$. Let $\{U_\alpha\}$ be an open cover of $M$, i.e., open sets
such that $M = \bigcup_\alpha U_\alpha$. Let $G$ be a Lie group. Construct the set of trivial
PFBs $P_\alpha = U_\alpha \times G$. Connect all pairs $P_\alpha$ and $P_\beta$ by transition functions
$g_{\alpha\beta} : U_\alpha \cap U_\beta \to G$ satisfying (1)-(3) of Proposition 34.1.5. This process
constructs a PFB with transition functions $g_{\alpha\beta}$. Therefore, a PFB is defined
by its transition functions. In fact, it is only the transition functions that
determine the bundle. Any PFB can be broken down into a collection $\{U_\alpha \times
G\}$ of trivial bundles. It is how these trivial bundles are “glued together” via
the transition functions that distinguishes between different PFBs.

Given a PFB $P(M, G)$ and a subgroup $G'$ of $G$, it may be possible to find
some covering $\{U_\alpha\}$ of $M$ and transition functions $g_{\alpha\beta}$ which take values in
$G'$. The new covering and transition functions define a new PFB $P'(M, G')$.
Then we say that the PFB $P(M, G)$ is reducible to $P'(M, G')$. We also say
that the structure group $G$ of $P(M, G)$ is reducible to $G'$ if $P(M, G)$ is
reducible to $P'(M, G')$.


Example 34.1.6 Let's reconsider the trivial bundle $M \times G$. What is the
most general local trivialization of this bundle? Let $x \in U_\alpha$. If $p = (x, g)$,
then $T_\alpha(p) = (x, g')$ for some $g' \in G$, and $s_\alpha(p) \equiv s_\alpha(x, g) = g'$. This
means that $s_\alpha$ affects only $g$, and therefore, can be reduced to a function $f_\alpha :
G \to G$ having the property that $f_\alpha(gg') = f_\alpha(g)g'$. With $g = e$, this gives
$f_\alpha(g') = f_\alpha(e)g'$. Thus, $f_\alpha$ is simply left multiplication by $h_\alpha \equiv f_\alpha(e)$,
where $h_\alpha$ may depend on $U_\alpha$. Hence, the most general LT for the trivial
bundle $M \times G$ is

$$T_\alpha(p) = (x, h_\alpha g) = (x, s_\alpha(p)) \quad\text{or}\quad s_\alpha((x, g)) = h_\alpha g.$$

So, the transition functions are of the form $g_{\alpha\beta}(x) = h_\alpha h_\beta^{-1}$, and can easily
be shown to satisfy the three conditions of Proposition 34.1.5.

Can the trivial bundle be reduced? Are there a covering $\{U_\alpha\}$ and
transition functions $g_{\alpha\beta}$ which take values in a subgroup of $G$? In fact, $G$ can
be drastically reduced! In the above discussion let $h_\alpha = h$ for all $\alpha$. Then,
$g_{\alpha\beta}(x) = e$ for all $x \in U_\alpha \cap U_\beta$ (and therefore for all $x \in M$).
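
The claim that $g_{\alpha\beta} = h_\alpha h_\beta^{-1}$ satisfies the three conditions of Proposition 34.1.5 can be spot-checked with exact arithmetic. The following is only a minimal sketch: the group is taken to be invertible $2 \times 2$ rational matrices, and the chart labels `"a"`, `"b"`, `"c"` and the constants stored in `h` are hypothetical values chosen for illustration.

```python
from fractions import Fraction as Fr

def matmul(a, b):
    """Multiply two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def inverse(a):
    """Invert a 2x2 matrix with nonzero determinant."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return (
        (a[1][1] / det, -a[0][1] / det),
        (-a[1][0] / det, a[0][0] / det),
    )

IDENTITY = ((Fr(1), Fr(0)), (Fr(0), Fr(1)))

# Hypothetical constants h_alpha, one per chart U_alpha (all invertible).
h = {
    "a": ((Fr(2), Fr(1)), (Fr(0), Fr(1))),
    "b": ((Fr(1), Fr(3)), (Fr(1), Fr(4))),
    "c": ((Fr(0), Fr(-1)), (Fr(1), Fr(5))),
}

def g(alpha, beta):
    """Transition function g_{alpha beta} = h_alpha h_beta^{-1} (constant in x)."""
    return matmul(h[alpha], inverse(h[beta]))

charts = list(h)
# (1) g_uu = e
cond1 = all(g(a, a) == IDENTITY for a in charts)
# (2) g_vu = g_uv^{-1}
cond2 = all(g(b, a) == inverse(g(a, b)) for a in charts for b in charts)
# (3) g_uv = g_uw g_wv
cond3 = all(
    g(a, b) == matmul(g(a, c), g(c, b))
    for a in charts for b in charts for c in charts
)
print(cond1, cond2, cond3)
```

Setting all entries of `h` equal to one another makes every `g(a, b)` the identity, which is the reduction of the structure group described above.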


The converse of the last statement of the example above is also true: 


Proposition 34.1.7 Any PFB whose structure group can be reduced to the 
identity of the group is isomorphic to the trivial PFB. 

Definition 34.1.8 A local section (or local cross section) of a principal
fiber bundle $P(M, G, \pi)$ on an open set $U \subset M$ is a map
$\sigma_u : U \to P$ such that $\pi \circ \sigma_u = \mathrm{id}_U$. If $U = M$, then $\sigma_u \equiv \sigma$ is called
a global section or simply a section on $M$.


Proposition 34.1.9 There is a natural 1-1 correspondence between the
set of local trivializations and the set of local sections. In particular, if
$P(M, G, \pi)$ has a (global) section, then $P(M, G, \pi) \cong M \times G$, the trivial
bundle.


Proof For each local trivialization $T_u$, let $\sigma_u \equiv T_u^{-1}|_{U \times \{e\}} : U \cong U \times \{e\} \to
P$. Conversely, for each $\sigma_u$, define $S_u : U \times G \to \pi^{-1}(U)$ by $S_u(x, g) =
\sigma_u(x)g$. Then it can be shown that $S_u$ is a bijection and $T_u \equiv S_u^{-1}$ is a local
trivialization.


Let $\sigma_u$ be a local section on $U$ and $\sigma_v$ on $V$. If $x \in U \cap V$, then $\sigma_u(x)$
and $\sigma_v(x)$ both belong to $\pi^{-1}(x)$. Hence, there must be a $g \in G$ such
that $\sigma_v(x) = \sigma_u(x)g$. We want to find this $g$. From the definition of $\sigma_u$,
we have $T_u^{-1}(x, e) \equiv p_0$ for some $p_0 \in P$. Thus, $T_u(p_0) = (x, s_u(p_0)) =
(x, e)$ implies that $s_u(p_0) = e$. But $T_u^{-1}(x, e) = \sigma_u(x)$. Therefore, we have
$\sigma_u(x) = p_0$ with $s_u(p_0) = e$. Similarly, $\sigma_v(x) = p_1$ with $s_v(p_1) = e$. Let
$p_1 = p_0 g$. Then $e = s_v(p_1) = s_v(p_0 g) = s_v(p_0)g$, or $g = s_v(p_0)^{-1}$. We thus
get $\sigma_v(x) = p_0 s_v(p_0)^{-1}$ or $\sigma_v(x)s_v(p_0) = p_0$. Multiplying both sides by an
arbitrary $g$, we get

$$\sigma_v(x)s_v(p_0)g = p_0 g \quad\text{or}\quad \sigma_v(x)s_v(p_0 g) = p_0 g \quad\text{or}\quad
\sigma_v(x)s_v(p) = p \quad \forall p \in \pi^{-1}(x).$$

An identical reasoning gives $\sigma_u(x)s_u(p) = p$. Therefore, $\sigma_v(x)s_v(p) =
\sigma_u(x)s_u(p)$, or $\sigma_v(x) = \sigma_u(x)s_u(p)s_v(p)^{-1}$. Thus,

$$\sigma_v(x) = \sigma_u(x)g_{uv}(x), \qquad (34.1)$$

where $g_{uv}$ is the transition function from $T_u$ to $T_v$.


Example 34.1.10 (The bundle $G(G/H, H)$) Let $G$ be a Lie group and $H$
one of its Lie subgroups. Let $H$ act on $G$ on the right by right multiplication.
Let $G/H$ be the factor space of this action and $\pi : G \to G/H$ the natural
projection. It is shown in Lie group theory that such a construction has local
trivializations. Then with $G$ as the total space, $M = G/H$ as the base space,
and $\pi : G \to G/H$ as the projection, $G(G/H, H, \pi)$ becomes a principal
fiber bundle.


Example 34.1.11 (Bundle of linear frames) Let $M$ be an $n$-manifold. A linear
frame $p$ at $x \in M$ is an ordered basis $(X_1, X_2, \dots, X_n)$ of the tangent
space $T_x(M)$. Let $L_x(M)$ be the set of all linear frames at $x$ and $L(M)$ the
set of all $L_x(M)$ for all $x \in M$. Let $\pi : L(M) \to M$ be the map that sends
$L_x(M)$ to $x$. If $A \in GL(n, \mathbb{R})$ is a matrix with components $a^i_j$, then the action
of $GL(n, \mathbb{R})$ on $L(M)$ on the right is written in matrix form as

$$(X_1\ X_2\ \dots\ X_n)
\begin{pmatrix}
a^1_1 & a^1_2 & \cdots & a^1_n \\
a^2_1 & a^2_2 & \cdots & a^2_n \\
\vdots & \vdots & & \vdots \\
a^n_1 & a^n_2 & \cdots & a^n_n
\end{pmatrix}
\equiv (Y_1\ Y_2\ \dots\ Y_n).$$

In “component” form, this can be written as $Y_i = a^j_i X_j$ (with summation
convention in place). Since $A$ is invertible, $(Y_1, Y_2, \dots, Y_n) \in L(M)$. So,
indeed $GL(n, \mathbb{R})$ acts on $L(M)$ on the right. It is easy to show that the action
is free (Problem 34.4). Furthermore, if $p, q \in \pi^{-1}(x) = L_x(M)$, i.e.,
if $p$ and $q$ are two (ordered) bases of $T_x(M)$, then there must exist an
invertible matrix $A$ such that $q = pA$. Therefore, $\pi^{-1}(x)$ is indeed the orbit of
$GL(n, \mathbb{R})$ at $p$. This shows (1) and (2) of Definition 34.1.1. Forgoing the
rather technical details of (3), we find that $L(M)(M, GL(n, \mathbb{R}))$ is indeed a
principal fiber bundle.
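
As a small illustration of the right action $Y_i = a^j_i X_j$, the sketch below works with frames of $\mathbb{R}^2$ written in components. It checks that the action composes as a right action, $(pA)B = p(AB)$, and that the matrix taking $p$ to $q = pA$ can be recovered from the two frames. The helper names (`act`, `recover`) and the sample frame and matrices are assumptions made for this example only.

```python
from fractions import Fraction as Fr

def matmul(a, b):
    """Product of two square matrices stored as tuples of rows."""
    n = len(a)
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n))
        for i in range(n)
    )

def act(frame, a):
    """Right action on a frame: Y_i = a^j_i X_j.

    frame[j] holds the components of the basis vector X_j;
    a[j][i] holds the matrix entry a^j_i (row j, column i).
    """
    n = len(frame)
    return tuple(
        tuple(sum(a[j][i] * frame[j][k] for j in range(n)) for k in range(n))
        for i in range(n)
    )

def inverse2(m):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return ((m[1][1] / det, -m[0][1] / det), (-m[1][0] / det, m[0][0] / det))

def recover(p, q):
    """Return the matrix A with q = pA.

    In components q = A^T p (frames stored row-wise), so A^T = q p^{-1}.
    """
    at = matmul(q, inverse2(p))
    return tuple(zip(*at))  # transpose

# Hypothetical frame and group elements (all invertible).
p = ((Fr(1), Fr(2)), (Fr(0), Fr(1)))
A = ((Fr(2), Fr(1)), (Fr(3), Fr(2)))
B = ((Fr(1), Fr(0)), (Fr(3), Fr(1)))

q = act(p, A)
right_action_ok = act(q, B) == act(p, matmul(A, B))
recovered = recover(p, q)
print(right_action_ok, recovered == A)
```

Because `recover` returns a single matrix for any pair of frames, the only matrix fixing a frame is the identity, which is the freeness of the action in this two-dimensional setting.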


Definition 34.1.12 The PFB described in Example 34.1.11 is called 
the bundle of linear frames and denoted by L(M)(M, GL(n, R)), or 
simply L(M). 


34.1.1 Associated Bundles 


Let $P(M, G)$ be a PFB and $F$ a manifold on which $G$ acts on the left:
$G \times F \ni (g, \xi) \mapsto g\xi \in F$. On the product manifold $P \times F$ let $G$ act on
the right by the rule $R_g(p, \xi) \equiv (p, \xi)g = (pg, g^{-1}\xi)$. Denote the quotient
space of this action by $P \times_G F$, and let $E \equiv P \times_G F$. For $[p, \xi] \in E$ let
$\pi_E([p, \xi]) = \pi(p) = x \in M$. Then $\pi_E$ is a projection of $E$ onto $M$. Define
$\phi_u : \pi_E^{-1}(U) \to U \times F$ by $\phi_u([p, \xi]) = (\pi(p), s_u(p)\xi)$, where $s_u : P \to G$
is as defined in the local trivialization of $P(M, G)$. One can show that $\phi_u$ is
a diffeomorphism (Problem 34.5) and that $E$ is a fiber bundle.


Definition 34.1.13 The fiber bundle constructed above is called the
fiber bundle over the base $M$ with standard fiber $F$ and structure
group $G$ which is associated with the principal fiber bundle
$P(M, G)$. $E$ is more elaborately denoted by $E(M, F, G, P)$. The
fiber over $x$ in $E$, $\pi_E^{-1}(x)$, is denoted by $F_x$.


The diffeomorphism $\phi_u$, when restricted to the fiber over $x$, is denoted
by $\phi_x$. It is a diffeomorphism $\phi_x : F_x \to \{x\} \times F \cong F$ given by $\phi_x([p, \xi]) =
s_u(p)\xi$. Note that this map is determined entirely by $p$. The inverse of this
mapping, also determined entirely by $p$, can be thought of as a map $p :
F \to F_x$, given by $p(\xi) \equiv p\xi \equiv [p, \xi]$. It can easily be shown that this map
satisfies

$$(pg)\xi = p(g\xi) \quad\text{for } p \in P,\ g \in G,\ \xi \in F. \qquad (34.2)$$


Theorem 34.1.14 Let $P(M, G)$ be a principal fiber bundle with the
associated bundle $E(M, F, G, P)$. Then each $p \in P$ can be considered
as a diffeomorphic map $p : F \to F_x$ satisfying (34.2).
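
A concrete instance of Eq. (34.2) can be sketched with $P$ taken to be frames of $\mathbb{R}^2$ and $F = \mathbb{R}^2$, so that a frame $p = (X_1, X_2)$ acts on $\xi$ by $p(\xi) = \xi^i X_i$. The function names and sample values below are hypothetical; the sketch only checks that $(pg)\xi = p(g\xi)$ and that the pair $(pg, g^{-1}\xi)$ represents the same point as $(p, \xi)$.

```python
def apply_frame(frame, xi):
    """The map p: R^n -> R^n, xi |-> xi^i X_i, for a frame p = (X_1, ..., X_n)."""
    n = len(frame)
    return tuple(sum(xi[i] * frame[i][k] for i in range(n)) for k in range(n))

def act_on_frame(frame, g):
    """Right action of G on frames: Y_i = g^j_i X_j, with g[j][i] = g^j_i."""
    n = len(frame)
    return tuple(
        tuple(sum(g[j][i] * frame[j][k] for j in range(n)) for k in range(n))
        for i in range(n)
    )

def act_on_fiber(g, xi):
    """Left action of G on the standard fiber: (g xi)^j = g^j_i xi^i."""
    n = len(xi)
    return tuple(sum(g[j][i] * xi[i] for i in range(n)) for j in range(n))

# Hypothetical sample data; g has determinant 1, so its inverse is integral.
p = ((1, 2), (0, 1))        # frame vectors X_1, X_2
g = ((2, 1), (3, 2))
g_inv = ((2, -1), (-3, 2))  # inverse of g, written out by hand
xi = (5, -1)

lhs = apply_frame(act_on_frame(p, g), xi)   # (pg) xi
rhs = apply_frame(p, act_on_fiber(g, xi))   # p (g xi)

# The class [p, xi] is unchanged under (p, xi) -> (pg, g^{-1} xi).
v_original = apply_frame(p, xi)
v_shifted = apply_frame(act_on_frame(p, g), act_on_fiber(g_inv, xi))
print(lhs == rhs, v_original == v_shifted)
```

In this sketch the vector `v_original` plays the role of the point $[p, \xi]$ of the associated bundle, which is why it must not depend on the representative chosen.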


Consider two fibers $F_x$ and $F_y$. They are diffeomorphic, because each
is diffeomorphic to $F$. In fact if $p : F \to F_x$ and $q : F \to F_y$, then $q \circ
p^{-1} : F_x \to F_y$ is called an isomorphism of $F_x$ and $F_y$. For $x = y$, $q \circ p^{-1}$
becomes an automorphism of $F_x$. Moreover, since $\pi(q) = x = \pi(p)$, we
must have $q = pg$ for some $g \in G$. Therefore, any automorphism of $F_x$ is
of the form $p \circ g \circ p^{-1}$.


Proposition 34.1.15 The group of automorphisms of $F_x$ is isomorphic with
the structure group $G$.


Example 34.1.16 The bundle of linear frames consists of fibers which include
all ordered bases of $T_x(M)$ and a right action by $GL(n, \mathbb{R})$, which
is the group of invertible linear transformations of $\mathbb{R}^n$. For every ordered
basis $p = (X_1, X_2, \dots, X_n) \in L(M)$, let $p(\hat{e}_i) = X_i$, where $\{\hat{e}_i\}_{i=1}^n$ is the
standard basis of $\mathbb{R}^n$. This then defines a map $p : \mathbb{R}^n \to T_x(M)$. All
this, in conjunction with Theorem 34.1.14, leads to the conclusion that
$E(M, \mathbb{R}^n, GL(n, \mathbb{R}), L(M))$ is the bundle associated with the bundle of linear
frames with standard fiber $\mathbb{R}^n$, and that $\pi_E^{-1}(x) = T_x(M)$. But Definition
28.4.1, and the discussion at the very beginning of this chapter, indicate
that $T_x(M)$ is the fiber over $x$ of the tangent bundle $T(M)$. We also note
that the right action $p \mapsto pA$ of $GL(n, \mathbb{R})$ on $L(M)$ can be interpreted as the
composite map $p \circ A$:

$$\mathbb{R}^n \xrightarrow{\;A\;} \mathbb{R}^n \xrightarrow{\;p\;} T_x(M).$$

All this discussion is summarized in


Box 34.1.17 The tangent bundle $T(M)$ is associated with the bundle
of linear frames $L(M)$ with standard fiber $\mathbb{R}^n$. The right action
$p \mapsto pA$ of $GL(n, \mathbb{R})$ on $L(M)$ can be interpreted as
$\mathbb{R}^n \xrightarrow{\;A\;} \mathbb{R}^n \xrightarrow{\;p\;} T_x(M)$, the composite map $p \circ A$.

With $T^r_s(\mathbb{R}^n)$ as the standard fiber, we obtain the tensor bundle $T^r_s(M)$ of
type $(r, s)$ over $M$ which is associated with $L(M)$. A tensor field of type
$(r, s)$ is a section of this bundle: $T^r_s : M \to T^r_s(M)$.


34.2 Connections in a PFB


A principal fiber bundle can be thought of (locally, at least) as a continuous
collection of fibers, each fiber located at $x \in U \subset M$. The points of each
fiber are naturally connected through the action of $G$. In fact, given a point of
the fiber, we can construct the entire fiber by applying all $g \in G$ to that point.
This is by construction and the fact that $G$ acts freely on each fiber. Because
each fiber is an orbit of $G$, and because $G$ acts freely on the fiber, each
fiber is diffeomorphic to $G$. However, there is no natural diffeomorphism
connecting one fiber to its neighbor. Such a connection requires an extra
structure on the principal fiber bundle, not surprisingly called connection.

Given a principal fiber bundle $P(M, G)$, the action of $G$ on $P$ induces
a vector field on $P$ for each $A \in \mathfrak{g}$ (see Definition 29.1.30). In fiber bundle
theory, it is common to denote this vector field by $A^*$ and call it the fundamental
vector field corresponding to $A$. With $\gamma_A(t) \equiv \exp(At)$ the integral
curve of $A$, the fundamental vector field at any point $p \in P$ is defined as

$$A^*_p = \frac{d}{dt}\bigl(p\,\gamma_A(t)\bigr)\Big|_{t=0} = \frac{d}{dt}\bigl(p\exp(At)\bigr)\Big|_{t=0}. \qquad (34.3)$$

Note that $\gamma_A(0) = e$, i.e., the curve passes through the identity of the Lie
group $G$. This is required because only $T_e(G)$ is identified as the Lie algebra
of $G$. Thus, any $G$-curve that passes through the identity induces a
fundamental vector field in $P$. Since the action on $P$ is right multiplication,
Proposition 29.1.34 gives $(\mathrm{Ad}_g A)^* = R_{g^{-1}*}A^*$ or equivalently,
$R_{g*}A^* = (\mathrm{Ad}_{g^{-1}}A)^* \equiv (\mathrm{Ad}_g^{-1}A)^*$.

The diffeomorphism of each fiber $\pi^{-1}(x)$ with $G$ leads to the isomorphism
of the Lie algebra $\mathfrak{g}$ of $G$ with the tangent space $T_p(\pi^{-1}(x))$ at each
point $p$ of the fiber; and since the action of $G$ is confined to a fiber, the
fundamental field $A^*$ must also be confined to the tangent spaces of the fibers.
To “connect” one fiber to its neighbor, we use the fundamental vector fields
defined on them. This is a natural thing to do since each $A^*$ originates from
the same $A \in \mathfrak{g}$.

Is there any way that we can make an association of $A^*$ with its origin $A$
in $\mathfrak{g}$? Is there a “machine” that spits out $A$ when $A^*$ is fed into it? The most
obvious answer is a $\mathfrak{g}$-valued 1-form! So define a $\mathfrak{g}$-valued 1-form $\omega$ by
$\omega(A^*) = A$. How would $\omega$ change on $T_p(\pi^{-1}(x))$? The right action of $G$
on $T_p(\pi^{-1}(x))$ induces a right transformation $R_g^*\omega$. What would this give
when acting on $A^*$?

$$R_g^*\omega(A^*) \equiv \omega(R_{g*}A^*) = \omega\bigl((\mathrm{Ad}_g^{-1}A)^*\bigr) = \mathrm{Ad}_g^{-1}A = \mathrm{Ad}_g^{-1}\omega(A^*).$$

Therefore, if a 1-form is to associate $A^*$ with $A$, it must satisfy $R_g^*\omega =
\mathrm{Ad}_g^{-1}\omega$. When extended to the entire $P$ (not just $\pi^{-1}(x)$), $\omega$ defines a
connection on $P$.


Definition 34.2.1 A connection $\Gamma$ is a $\mathfrak{g}$-valued 1-form $\omega$ on $P$ such that
for any vector field $X \in T(P)$, $\omega(X)$ is the unique $A \in \mathfrak{g}$ related to $A^*$ that
passes through that point. We demand that $\omega$ satisfy the following conditions:


(a) $\omega(A^*) = A$.
(b) $R_g^*\omega = \mathrm{Ad}_g^{-1}\omega$ on $P$, i.e., for any vector field $X \in T(P)$,

$$\omega(R_{g*}X) = \mathrm{Ad}_g^{-1}\omega(X).$$

We call $\omega$ a connection 1-form.


Any vector field $Y$ in $T(P)$ can be written as $Y = Y_h + A^*$ with $A^*$ in
the tangent space of a fiber. Then, since $\omega(Y) = \omega(A^*)$, we get $\omega(Y_h) = 0$.
A vector field $X$ in $P$ satisfying $\omega(X) = 0$ is called a horizontal vector field.
A vector $Z$ in a tangent space of a fiber has the property that $\pi_*(Z) = 0$,
because $\pi$ is the constant map on any fiber: $\pi : \pi^{-1}(x) \to x$ (see Box 28.3.3).
Any vector field $Z$ in $P$ satisfying $\pi_*(Z) = 0$ is called a vertical vector
field.$^2$ If we define the horizontal and vertical subspaces as

$$H = \{X \in T(P) \mid \omega(X) = 0\}, \qquad V = \{Z \in T(P) \mid \pi_*(Z) = 0\},$$

then $T(P) = H \oplus V$. Furthermore, because of (b) in the definition above,
$H_{pg} = R_{g*}H_p$. Thus, at every point of $P$, $T(P)$ can be written as the
direct sum of a horizontal and a vertical subspace and these subspaces
are smoothly connected to each other. From local trivialization, we conclude
that $\dim P = \dim M + \dim G$, and since $\dim V = \dim G$, we obtain
$\dim H = \dim M$. Hence, $\pi_* : H_p \to T_x(M)$ is a linear isomorphism.


34.2.1 Local Expression for a Connection 


The local trivialization of a bundle with its corresponding sections could be
used to define a $\mathfrak{g}$-valued 1-form on the base manifold $M$. The pull-back of $\omega$
by a local section $\sigma_u$ is indeed a $\mathfrak{g}$-valued 1-form on $U \subset M$: For $Y \in T(U)$,
we have (by definition) $\sigma_u^*\omega(Y) = \omega(\sigma_{u*}Y)$. Since local sections depend on
the subsets chosen, and since they are connected via transition functions as
in Eq. (34.1), we have to know how $\omega_u \equiv \sigma_u^*\omega$ is related to $\omega_v \equiv \sigma_v^*\omega$. To
take full advantage of formalism, let us write Eq. (34.1) as a composite map

$$\sigma_v : U \cap V \xrightarrow{\;\alpha\;} P \times G \xrightarrow{\;\Phi\;} P,$$

where $\alpha(x) = (\sigma_u(x), g_{uv}(x))$ and $\Phi(\sigma_u(x), g_{uv}(x)) \equiv \sigma_u(x)g_{uv}(x)$. Then
$\sigma_{v*} = \Phi_* \circ \alpha_*$, and using Proposition 28.3.7, we obtain

$$\sigma_{v*}(Y) = \Phi_*(\alpha_*Y) = \Phi_{\sigma_u(x)*}\bigl(g_{uv*}(Y)\bigr) + \Phi_{g_{uv}(x)*}\bigl(\sigma_{u*}(Y)\bigr). \qquad (34.4)$$


$^2$See the remark after Definition 34.1.1.


In the second term on the right, $\Phi_{g_{uv}(x)*} = R_{g_{uv}(x)*}$, the right multiplication.
For the first term, we note that

$$\Phi_{\sigma_u(x)*}\bigl(g_{uv*}(Y)\bigr) = \frac{d}{dt}\,\sigma_u(x)\,g_{uv}\bigl(\gamma_Y(t)\bigr)\Big|_{t=0}.$$

Now, we note that $\sigma_u(x)$ is a point in $P$ and $g_{uv}(\gamma_Y(t))$ is a curve in $G$.
Therefore, $\sigma_u(x)g_{uv}(\gamma_Y(t))$ is a curve in $P$. It is not the curve of a
fundamental vector field, because $g_{uv}(\gamma_Y(0)) = g_{uv}(x) \neq e$. However, if we
rewrite it as

$$\sigma_u(x)g_{uv}(x)\,g_{uv}(x)^{-1}g_{uv}(\gamma_Y(t))
= \sigma_v(x)\underbrace{g_{uv}(x)^{-1}g_{uv}(\gamma_Y(t))}_{\equiv\,\gamma_{A_Y}(t)}
= \sigma_v(x)\gamma_{A_Y}(t),$$

then $\gamma_{A_Y}(0) = e$ and the vector field associated with $\gamma_{A_Y}(t)$ is indeed
a fundamental vector field. It is clear that $A_Y = L_{g_{uv}(x)^{-1}*}\,g_{uv*}(Y) \equiv
L^{-1}_{g_{uv}(x)*}\,g_{uv*}(Y)$. We therefore write Eq. (34.4) as

$$\sigma_{v*}(Y) = A_Y^* + R_{g_{uv}(x)*}\bigl(\sigma_{u*}(Y)\bigr). \qquad (34.5)$$

Applying $\omega$ on both sides, we get

$$\omega\bigl(\sigma_{v*}(Y)\bigr) \equiv \omega_v(Y) = \omega(A_Y^*) + \omega\bigl(R_{g_{uv}(x)*}(\sigma_{u*}(Y))\bigr),$$

or

$$\omega_v(Y) = L^{-1}_{g_{uv}(x)*}\,g_{uv*}(Y) + \mathrm{Ad}^{-1}_{g_{uv}(x)}\,\omega_u(Y), \qquad (34.6)$$

where for the first term on the right-hand side, we used (a) and for the second
term we used (b) of Definition 34.2.1.

Let $\theta$ be the left-invariant canonical 1-form of Definition 29.1.29. For
each $U \cap V$ define a $\mathfrak{g}$-valued 1-form by $\theta_{uv} = g_{uv}^*\theta$. Then it can be shown
that Eq. (34.6) can be written succinctly as

$$\omega_v = \theta_{uv} + \mathrm{Ad}^{-1}_{g_{uv}}\,\omega_u. \qquad (34.7)$$


We defined the connection $\Gamma$ on a principal fiber bundle as a $\mathfrak{g}$-valued
1-form having properties (a) and (b) of Definition 34.2.1. Then we showed
two important consequences of the definition: that the 1-form splits $T(P)$
into horizontal and vertical subspaces at each point of the bundle; and that
it defines a $\mathfrak{g}$-valued 1-form on each domain of the local trivializations and
these 1-forms are connected by (34.6). It turns out that the two consequences
are actually equivalent to the definition, i.e., that if $T_p(P) = H_p \oplus V_p$ at each
$p \in P$ such that $H_{pg} = R_{g*}H_p$, then there exists a 1-form satisfying (a) and
(b) of Definition 34.2.1. Similarly, the existence of a $\mathfrak{g}$-valued 1-form $\omega_u$
on the domain $U$ of each trivialization $T_u$ leads to the $\mathfrak{g}$-valued 1-form of
Definition 34.2.1.


Example 34.2.2 Suppose that $G$ is a matrix group, i.e., a subgroup of
$GL(n, \mathbb{R})$. Then $g_{uv}$ is a matrix-valued function or 0-form. Hence, $g_{uv*} =
dg_{uv}$ (see the discussion on page 874), and $L^{-1}_{g_{uv}(x)*}$ is just left multiplication
by $g_{uv}^{-1}(x)$. Thus the first term on the right-hand side of Eq. (34.6) is
$g_{uv}^{-1}(x)\,dg_{uv}(Y)$. For the second term, we note that for matrices $A$ and $B$, we
have

$$\mathrm{Ad}_A B = \frac{d}{dt}\,\mathrm{Ad}_A\bigl(\gamma_B(t)\bigr)\Big|_{t=0} = \frac{d}{dt}\bigl(A\gamma_B(t)A^{-1}\bigr)\Big|_{t=0} = ABA^{-1}.$$

Therefore, $\mathrm{Ad}^{-1}_{g_{uv}(x)}\,\omega_u(Y) = g_{uv}(x)^{-1}\omega_u(Y)g_{uv}(x)$. Consequently, the
transformation rule (34.6) can be expressed as

$$\omega_v = g_{uv}^{-1}\,dg_{uv} + g_{uv}^{-1}\,\omega_u\,g_{uv}, \qquad (34.8)$$

where it is understood that the vector field will be evaluated by $dg_{uv}$ and $\omega_u$
on the right-hand side.
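
One consistency check on the matrix rule $\omega_v = g_{uv}^{-1}dg_{uv} + g_{uv}^{-1}\omega_u g_{uv}$ is that applying it twice, first with $g_{uv}$ and then with $g_{vw}$, agrees with applying it once with the product $g_{uw} = g_{uv}g_{vw}$ from Proposition 34.1.5. The sketch below tests this numerically on a one-dimensional base with coordinate $t$ and $Y = d/dt$, so that $dg(Y)$ is simply $g'(t)$. The particular matrix functions and the value of `omega_u` are hypothetical choices made only for the illustration.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv(a):
    """Inverse of a 2x2 matrix with nonzero determinant."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def transform(g, dg, omega):
    """Matrix transformation rule: g^{-1} dg(Y) + g^{-1} omega(Y) g."""
    g_inv = inv(g)
    return add(matmul(g_inv, dg), matmul(matmul(g_inv, omega), g))

t = 0.4  # sample point on the base

# Hypothetical transition functions and their t-derivatives at the point t.
g_uv, dg_uv = [[1.0, t], [0.0, 1.0]], [[0.0, 1.0], [0.0, 0.0]]
c, s = math.cos(t), math.sin(t)
g_vw, dg_vw = [[c, -s], [s, c]], [[-s, -c], [c, -s]]

# Hypothetical value of omega_u(Y) in the Lie algebra gl(2, R).
omega_u = [[0.3, -1.2], [0.7, 0.5]]

# Two successive changes of trivialization: u -> v -> w.
omega_v = transform(g_uv, dg_uv, omega_u)
omega_w_two_steps = transform(g_vw, dg_vw, omega_v)

# One change with g_uw = g_uv g_vw; the derivative uses the product rule.
g_uw = matmul(g_uv, g_vw)
dg_uw = add(matmul(dg_uv, g_vw), matmul(g_uv, dg_vw))
omega_w_direct = transform(g_uw, dg_uw, omega_u)

max_diff = max(
    abs(omega_w_two_steps[i][j] - omega_w_direct[i][j])
    for i in range(2) for j in range(2)
)
print(max_diff)
```

The difference `max_diff` should vanish up to floating-point rounding, reflecting the identity $(gh)^{-1}d(gh) = h^{-1}(g^{-1}dg)h + h^{-1}dh$.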


34.2.2 Parallelism 


The diffeomorphism of $\pi^{-1}(U)$ with $U \times G$ given by a local trivialization
gives rise to an isomorphism of $T_p(P)$ and $T_x(U) \times T_g(G)$. A connection
splits $T_p(P)$ into a horizontal subspace and a vertical subspace, of
which the vertical subspace is isomorphic to $T_g(G)$. Therefore, the horizontal
subspace must be isomorphic to $T_x(U) = T_x(M)$. In fact since
$\pi_*(V_p) = 0$, $\pi_*$ maps the horizontal subspace isomorphically to $T_x(M)$,

$$\pi_* : H_p \xrightarrow{\;\cong\;} T_x(M).$$


Definition 34.2.3 The horizontal lift (or simply the lift) of a vector field $X$
on $M$ is the unique vector field $X^*$ on $P$, which is horizontal and $\pi_*(X^*_p) =
X_{\pi(p)}$ for every $p \in P$.


From the lift of a vector field, we can move on to the lift of a curve
in $M$. Given a curve $\gamma(t) \equiv x_t$ in $M$, we can lift each point of it into $P$ and
get a curve $\gamma^*(t) \equiv p_t$ in $P$ in such a way that the tangent vector to $\gamma^*$ is
horizontal and maps to the tangent vector to $\gamma$ at its corresponding point.
More precisely,


Definition 34.2.4 Let $\gamma(t) \equiv x_t$ be a curve in $M$. A horizontal lift (or
simply the lift) of $\gamma$ is a horizontal curve $\gamma^*(t) \equiv p_t$ in $P$ such that
$\pi(\gamma^*(t)) = \gamma(t)$. By a horizontal curve is meant one whose tangent
vectors are horizontal.

By local triviality, a curve $\gamma(t)$ in $U \subset M$ maps to a curve $\alpha(t)$ in $P$,
which may not be horizontal. If there is a horizontal curve $\gamma^*(t)$, each of
its points can be obtained from $\alpha(t)$ by right multiplication by an element
of $G$, because $\alpha(t)$ and $\gamma^*(t)$ both belong to the same fiber for each given $t$.
So, we have $\gamma^*(t) = \alpha(t)g(t)$. The question is whether this construction actually
works. The answer is yes, and we have the following proposition, whose
proof can be found in [Koba 63, pp. 69-70]:


Proposition 34.2.5 Let $\gamma(t)$, $0 \le t \le 1$, be a curve in $M$. For an arbitrary
point $p_0$ of $P$ with $\pi(p_0) = \gamma(0)$, there exists a unique horizontal lift $\gamma^*$
of $\gamma$ starting at $p_0$ (i.e., with $\gamma^*(0) = p_0$). Furthermore, the unique lift that
starts at $p = p_0 g$ is $\gamma^*(t)g$.


Let $\gamma = x_t$, $0 \le t \le 1$, be a curve in $M$. Let $p_0$ be an arbitrary point in
$P$ with $\pi(p_0) = \gamma(0) = x_0$. The unique lift $\gamma^*$ of $\gamma$ through $p_0$ has the end
point $p_1 = \gamma^*(1)$ such that $\pi(p_1) = \gamma(1) = x_1$. By varying $p_0$ in the fiber
$\pi^{-1}(x_0)$, we obtain a bijection for the two fibers $\pi^{-1}(x_0)$ and $\pi^{-1}(x_1)$.
Denote this mapping by the same letter $\gamma$ and call it the parallel displacement
along the curve $\gamma$.
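
A numerical sketch of parallel displacement in the simplest setting may help. It assumes a trivial bundle $\mathbb{R} \times U(1)$ with section $\sigma(x) = (x, 1)$, a hypothetical local connection form $\omega_u = i\,a(x)\,dx$ with $a(x) = x$, and the curve $\gamma(t) = t$. Writing the lift as $\gamma^*(t) = \sigma(\gamma(t))g(t)$, horizontality is taken here to mean $g^{-1}\dot{g} + \omega_u(\dot{\gamma}) = 0$, i.e., $\dot{g} = -i\,a(t)\,g$. This local form of the condition is an assumption of the sketch, not something derived in the text above.

```python
import cmath

def a(x):
    """Hypothetical coefficient of the local connection form i*a(x)*dx."""
    return x

def lift(g0, steps=1000):
    """Integrate g'(t) = -i a(t) g(t) on [0, 1] with classical RK4.

    Returns g(1), the group component of the end point of the lift
    that starts at (gamma(0), g0).
    """
    def rhs(t, g):
        return -1j * a(t) * g

    g = complex(g0)
    dt = 1.0 / steps
    for n in range(steps):
        t = n * dt
        k1 = rhs(t, g)
        k2 = rhs(t + dt / 2, g + dt / 2 * k1)
        k3 = rhs(t + dt / 2, g + dt / 2 * k2)
        k4 = rhs(t + dt, g + dt * k3)
        g += dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return g

# Lift starting at the identity of the fiber over gamma(0) = 0.
g_end = lift(1.0)
exact = cmath.exp(-0.5j)  # closed form for a(x) = x: exp(-i t^2 / 2) at t = 1

# Lift starting at a right-translated point p0*h, with h in U(1).
h = cmath.exp(0.7j)
g_end_shifted = lift(h)

print(abs(g_end - exact), abs(g_end_shifted - g_end * h))
```

Both printed numbers should be close to zero: the first compares the integration with the closed-form solution, and the second illustrates the statement of Proposition 34.2.5 that the lift starting at $p_0 g$ is $\gamma^*(t)g$. The end point map $g_0 \mapsto g(1)$ is multiplication by a fixed unit complex number, hence a bijection of the two fibers.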

The notion of parallelism can be extended to the associated bundles as
well. For this, we need to split the tangent spaces of $E$ into horizontal and
vertical at all points $w \in E$. If $\pi_E(w) = x \in M$, then the tangent space to
$\pi_E^{-1}(x)$ at $w$, denoted by $V_w$, is by definition the vertical subspace. Let
$\pi_G : P \times F \to E$ be the natural projection, so that $\pi_G(p, \xi) = [p, \xi] \equiv w$.
Choose a pair $(p, \xi) \in \pi_G^{-1}(w)$. If you fix $p$ and let $\xi$ vary over the entire $F$,
by Theorem 34.1.14, you get a diffeomorphic image of $F$, namely $\pi_E^{-1}(x)$
if $\pi_E(w) = x$. More precisely, the diffeomorphism $p : F \to \pi_E^{-1}(x)$ has the
differential map $p_* : T_\xi(F) \to T_w(\pi_E^{-1}(x))$ and $V_w = p_*(T_\xi(F))$.

The procedure described above for obtaining the vertical subspace gives
us a hint for defining the horizontal subspace $H_w$, as follows. Instead of fixing
$p$, now fix $\xi$ and let $p$ vary. More precisely, define the map $f_\xi : P \to E$
by $f_\xi(p) = p\xi$ with differential $f_{\xi*} : T_p(P) \to T_w(E)$. Define the horizontal
subspace $H_w$ of $T_w(E)$ to be the image of the horizontal subspace $H_p$
of $T_p(P)$. So, $H_w = f_{\xi*}(H_p)$. For this assignment to be meaningful
(well-defined), it must be independent of the choice $(p, \xi)$.


Proposition 34.2.6 If $\pi_G(p_1, \xi_1) = w = \pi_G(p_2, \xi_2)$, then
$f_{\xi_1 *}(H_{p_1}) = f_{\xi_2 *}(H_{p_2})$.


Proof First note that $f_{g\xi} = f_\xi \circ R_g$. Next note that if $\pi_G(p_1, \xi_1) =
\pi_G(p_2, \xi_2)$, then there must exist a $g \in G$ such that $p_2 = p_1 g^{-1}$ and
$\xi_2 = g\xi_1$. Now use these two facts plus the invariance of the horizontal
space $H_p$ under right translation to prove the statement.


From the diffeomorphism of $\pi_E^{-1}(U)$ and $U \times F$, and the fact that $U$ is
an open submanifold of $M$, we conclude that


$$\dim T_w(E) = \dim T_x(M) + \dim T_\xi(F) = \dim T_x(M) + \dim V_w.$$




From the diffeomorphism of $\pi^{-1}(U)$ and $U \times G$, and the split of $T_p(P)$
into horizontal and vertical subspaces, we conclude that


$$\dim H_p + \dim V_p = \dim T_p(P) = \dim T_x(M) + \dim T_g(G) = \dim T_x(M) + \dim V_p,$$


so that $\dim H_p = \dim T_x(M)$. Furthermore, one can show that $f_\xi$ is an injection.
Hence, $\dim H_w = \dim H_p$, and therefore, $\dim H_w = \dim T_x(M)$. This,
plus the first equation above, yields $T_w(E) = H_w \oplus V_w$.


Definition 34.2.7 A vector field $Z$ in $E$ is horizontal if $Z = f_{\xi*}(X^*)$ for
some horizontal vector field $X^*$ in $P$. A curve in $E$ is horizontal if its tangent
vector is horizontal at each point of the curve. Given a curve $\gamma$ in $M$, a
(horizontal) lift is a horizontal curve $\gamma_E^*$ in $E$ such that $\pi_E(\gamma_E^*) = \gamma$.


Just as there was a unique horizontal lift for every curve $\gamma$ in $M$ starting
at a given point of the principal fiber bundle $P$, so is there a unique horizontal
lift of every curve $\gamma$ in $M$ starting at a given point of the associated
bundle $E$. In fact, let $\gamma(t) = x_t$ be a curve in $M$. Let $w_0 \in E$ be such that
$\pi_E(w_0) = x_0$. Then there is a $p_0 \in P$ such that $p_0\xi = w_0$ for some $\xi \in F$.
Let $\gamma^*(t) = p_t$ be the lift of $x_t$ starting at $p_0$. Let $w_t = p_t\xi$. Then $w_t$ is a
horizontal curve starting at $w_0$. The fact that it is unique follows from the uniqueness of the
solution of differential equations with given initial conditions. We thus have


Theorem 34.2.8 Given a curve $\gamma(t) = x_t$, $0 \le t \le 1$, in $M$ and a point $w_0 \in
E$ such that $\pi_E(w_0) = x_0$, there is a unique lift $\gamma_E^*(t) = w_t$ starting at $w_0$. In
fact, if $w_0 = [p_0, \xi]$, then $w_t = p_t\xi$, where $p_t$ is the lift of $x_t$ in $P$ starting
at $p_0$.


Recall that a (cross) section $\sigma$ of $E$ is a map $\sigma : U \to E$ such that
$\pi_E\sigma(x) = x$. Let $x_t$, $0 \le t \le 1$, be a curve in $U$. Let $w_0 = \sigma(x_0)$. Then,
clearly $\sigma(x_t)$ is a curve in $E$ starting at $w_0$. Let $w_t$ be the horizontal lift of
$x_t$ starting at $w_0$. In general, of course, $w_t \ne \sigma(x_t)$.


Definition 34.2.9 We say the section $\sigma$ of $E$ is parallel if $\sigma(x_t)$, $0 \le
t \le 1$, is the horizontal lift of $x_t$.


34.3 Curvature Form 


A connection is a 1-form on P which allows a parallel displacement of 
sections of its associated bundles. Infinitesimal displacements carry the no- 
tion of differentiation which is important in differential geometry. As a 
1-form, the connection accepts another kind of differentiation, namely ex- 
terior derivative. But this differentiation ought to be generalized so that it is 
compatible with the action of the structure group. 




Definition 34.3.1 Let $P(M,G)$ be a principal fiber bundle and $\rho : G \to
GL(V)$ a representation of the structure group on a vector space $V$. A
pseudotensorial form of degree $r$ on $P$ of type $(\rho, V)$ is a $V$-valued $r$-form $\phi$
on $P$ such that


$$R_g^*\phi = \rho(g^{-1}) \cdot \phi.$$


$\phi$ is called a tensorial form if $\phi(X_1, X_2, \dots, X_r) = 0$ when any of the $X_i \in
T_p(P)$ is vertical. In this case we say that $\phi$ is horizontal.


Example 34.3.2 (a) Let $\rho$ be the adjoint representation $\mathrm{Ad} : G \to GL(\mathfrak{g})$
given in Definition 29.1.26. Then (b) of Definition 34.2.1 shows that the
connection form $\omega$ is a pseudotensorial form of degree 1 of type $(\mathrm{Ad}, \mathfrak{g})$.

(b) Let $\rho_0$ be the trivial representation, sending all elements of $G$ to the
identity of $GL(V)$. Then a tensorial form of degree $r$ of type $(\rho_0, V)$ is simply
an $r$-form on $P$ which can be written as $\phi = \pi^*\phi_M$ where $\phi_M$ is a
$V$-valued $r$-form on the base manifold $M$. In particular, if $V = \mathbb{R}$, then $\phi$ is
the pull-back by $\pi$ of an ordinary $r$-form on $M$.


Remark 34.3.1 Let $E(M, V, G, P)$ be the bundle associated with the principal
fiber bundle $P$ with standard fiber $V$ on which $G$ acts through a
representation $\rho$. A tensorial form $\phi$ of degree $r$ of type $(\rho, V)$ can be considered
as an assignment to each $x \in M$ of a multilinear skew-symmetric mapping $\tilde\phi_x$
of $T_x(M) \times T_x(M) \times \cdots \times T_x(M)$ ($r$ times) into the vector space $\pi_E^{-1}(x)$,
which is the fiber of $E$ over $x$. Here is how:


$$\tilde\phi_x(X_1, X_2, \dots, X_r) = p\bigl(\phi(X_1^*, X_2^*, \dots, X_r^*)\bigr), \qquad X_i \in T_x(M). \tag{34.9}$$


On the right-hand side, $X_i^*$ is any vector at $p$ such that $\pi_*(X_i^*) =
X_i$, and $p$ is any point of $P$ with $\pi(p) = x$. Since $\phi$ is $V$-valued,
$\phi(X_1^*, X_2^*, \dots, X_r^*)$ is in $V$, the standard fiber of $E$, on which $p$ acts
according to Theorem 34.1.14 to give a vector in $\pi_E^{-1}(x)$. As Problem 34.12
shows, the left-hand side of Eq. (34.9) is independent of the choice of $p$
and $X_i^*$ on the right-hand side. Conversely, given a skew-symmetric multilinear
mapping $\tilde\phi_x$ of $T_x(M) \times T_x(M) \times \cdots \times T_x(M)$ to $\pi_E^{-1}(x)$ for each
$x \in M$, $p^{-1} \circ \tilde\phi_x \circ \pi_*$ is a tensorial $r$-form of type $(\rho, V)$, with $p$ chosen
such that $\pi(p) = x$. In particular, a cross section $\tilde f : M \to E$ can be identified
with $f = p^{-1} \circ \tilde f \circ \pi$, which is a $V$-valued function on $P$ satisfying
$f(pg) = \rho(g^{-1}) f(p)$.

In the special case where $\rho$ is the identity representation and $V = \mathbb{R}$, $\tilde\phi$ is
just an ordinary $r$-form, i.e., $\tilde\phi \in \Lambda^r(M)$.


Let $P(M,G)$ be a principal fiber bundle with a connection, giving rise
to the split $T_p(P) = H_p \oplus V_p$ into horizontal and vertical subspaces at each
point $p \in P$. Define $h : T_p(P) \to H_p$ to be the projection onto the horizontal
subspace. For a pseudotensorial $r$-form $\phi$ on $P$, define $\phi h$ by


$$(\phi h)(X_1, X_2, \dots, X_r) = \phi(hX_1, hX_2, \dots, hX_r), \qquad X_i \in T_p(P). \tag{34.10}$$


Definition 34.3.3 Let $P(M, G)$ be a principal fiber bundle with a connection
1-form $\omega$ that induces the split $T_p(P) = H_p \oplus V_p$. Let $h : T_p(P) \to H_p$
be the projection onto the horizontal subspace. The exterior covariant
derivative (associated with $\omega$) of a (pseudo)tensorial $r$-form $\phi$ is defined
as $D^\omega\phi = (d\phi)h$, and $D^\omega$ is called the exterior covariant differentiation.


The proof of the following proposition is straightforward: 


Proposition 34.3.4 Let $\phi$ be a pseudotensorial $r$-form on $P$ of type $(\rho, V)$.
Then


(a) the form $\phi h$ is a tensorial $r$-form of type $(\rho, V)$;
(b) $d\phi$ is a pseudotensorial $(r + 1)$-form of type $(\rho, V)$;
(c) $D^\omega\phi$ is a tensorial $(r + 1)$-form of type $(\rho, V)$.


Definition 34.3.5 The tensorial 2-form $\Omega^\omega \equiv D^\omega\omega$ of type $(\mathrm{Ad}, \mathfrak{g})$ is called
the curvature form of $\omega$.


The proof of the following structure equation can be found in [Koba 63, 
pp. 77-78] 


Theorem 34.3.6 The curvature form $\Omega^\omega$ of the connection form $\omega$
satisfies the following equation:


$$\Omega^\omega(X, Y) = d\omega(X, Y) + \tfrac{1}{2}[\omega(X), \omega(Y)], \qquad X, Y \in T_p(P).$$


The equation of this theorem, called the structure equation, is abbreviated as


$$\Omega^\omega = d\omega + \tfrac{1}{2}[\omega, \omega]. \tag{34.11}$$

The commutator in (34.11) is a Lie algebra commutator (in particular, it is
not zero). In fact, Eq. (34.13) below captures the meaning of (34.11).
If $X$ and $Y$ are both horizontal vector fields on $P$, then Theorem 28.5.11
yields


$$\Omega^\omega(X, Y) = -\omega([X, Y]). \tag{34.12}$$


Note that the right-hand side need not be zero, because the Lie bracket of two
horizontal vector fields is not necessarily horizontal.

It is convenient to have an expression for the structure equation in terms
of real-valued forms. So, let $\{E_i\}_{i=1}^m$ be a basis for the Lie algebra $\mathfrak{g}$ with
structure constants $c^i_{jk}$, so that


$$[E_j, E_k] = c^i_{jk}E_i, \qquad j, k = 1, 2, \dots, m.$$

As $\mathfrak{g}$-valued forms, $\omega$ and $\Omega^\omega$ can be expressed as $\omega = \omega^i E_i$ and $\Omega^\omega =
\Omega^i E_i$, where $\omega^i$ and $\Omega^i$ are ordinary real-valued forms. It is straightforward
to show that the structure equation can be expressed as


$$\Omega^i = d\omega^i + \tfrac{1}{2}c^i_{jk}\,\omega^j \wedge \omega^k, \qquad i = 1, 2, \dots, m. \tag{34.13}$$

Taking the exterior derivative of both sides of this equation, one can show
that

$$d\Omega^i = c^i_{jk}\,\Omega^j \wedge \omega^k, \quad\text{or}\quad d\Omega^\omega = [\Omega^\omega, \omega]. \tag{34.14}$$


The proof of the following theorem, which follows easily from the last two 
equations, is left as an exercise for the reader (see Problem 34.16): 


Theorem 34.3.7 (Bianchi's identity) $D^\omega\Omega^\omega = 0$.


In Sect. 34.2.1, we expressed a connection $\Gamma$ in terms of its 1-form defined
on the base manifold $M$. It is instructive to obtain similar expressions
for the curvature form as well. In fact, since the local connection
form was simply the pull-back by the local sections, and since the exterior
derivative and exterior product both commute with the pullback, we define
$\Omega_u \equiv \sigma_u^*\Omega^\omega$ and easily prove


Theorem 34.3.8 $\Omega_u = d\omega_u + \tfrac{1}{2}[\omega_u, \omega_u]$.


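We found how different pieces of the connection 1-form, defined on different
subsets of $M$, were related to each other (see (34.6)). We can do the
same with the curvature 2-form. Using Eq. (34.5) and the definition of pullback,
we find


$$\begin{aligned}
\Omega_v(X, Y) &= \sigma_v^*\Omega^\omega(X, Y) = \Omega^\omega(\sigma_{v*}X, \sigma_{v*}Y) \\
&= \Omega^\omega\bigl(A_X^* + R_{g_{uv}(x)*}(\sigma_{u*}X),\; A_Y^* + R_{g_{uv}(x)*}(\sigma_{u*}Y)\bigr) \\
&= \Omega^\omega\bigl(R_{g_{uv}(x)*}(\sigma_{u*}X),\; R_{g_{uv}(x)*}(\sigma_{u*}Y)\bigr),
\end{aligned}$$


where $A_X^*$ and $A_Y^*$ stand for the vertical terms of Eq. (34.5); the last equality holds
because $\Omega^\omega$ is a tensorial form, and hence gives zero for its vertical arguments.
By Proposition 34.3.4, $\Omega^\omega$ is of the same type as $\omega$, i.e., of type
$(\mathrm{Ad}, \mathfrak{g})$. Therefore, we have


$$\Omega_v(X, Y) = \mathrm{Ad}_{g_{uv}^{-1}}\Omega^\omega(\sigma_{u*}X, \sigma_{u*}Y) = \mathrm{Ad}_{g_{uv}^{-1}}\sigma_u^*\Omega^\omega(X, Y),$$
or


$$\Omega_v = \mathrm{Ad}_{g_{uv}^{-1}}\Omega_u. \tag{34.15}$$


Using Eq. (34.14), a similar derivation leads to the local version of
Bianchi's identity:


$$d\Omega_u = [\Omega_u, \omega_u]. \tag{34.16}$$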




All the foregoing discussion simplifies considerably if the structure group
is abelian. In this case all the structure constants $c^i_{jk}$ vanish and Equations
(34.13) and (34.14) become $\Omega^i = d\omega^i$ and $d\Omega^i = 0$, respectively. Furthermore,
it can be shown (Problem 34.17) that $\mathrm{Ad}_g = \mathrm{id}_{\mathfrak{g}}$, the identity of the
Lie algebra of $G$. We summarize all this in


Proposition 34.3.9 If the structure group is abelian, then


$$\omega_v = \theta_{uv} + \omega_u, \qquad \Omega^\omega = d\omega, \qquad d\Omega^\omega = 0, \qquad \Omega_u = d\omega_u, \qquad \Omega_v = \Omega_u,$$


where $\theta_{uv}$ is as in Eq. (34.7) and represents the first term on the right-hand
side of Eq. (34.6).


34.3.1 Flat Connections 


Let $P = M \times G$ be a trivial principal fiber bundle. Let $\pi_2 : M \times G \to G$
be the projection onto the second factor. The differential of this map, $\pi_{2*} :
T(M) \times T(G) \to T(G)$, has the property that $\pi_{2*}(X) = 0$ if $X \in T(M)$ (see
Box 28.3.3). With $\theta$ the canonical left-invariant 1-form on $G$, let $\omega = \pi_2^*\theta$.
Then $\omega$ is a 1-form on $P$, and one can show that it satisfies the two conditions
of Definition 34.2.1. Hence, $\omega$ is a connection on $P$. The horizontal
space of this connection is clearly $T(M)$. The connection associated with
this $\omega$ is called the canonical flat connection of $P$. The Maurer-Cartan
equation (29.17) yields


$$d\omega = d(\pi_2^*\theta) = \pi_2^*(d\theta) = \pi_2^*\bigl(-\tfrac{1}{2}[\theta, \theta]\bigr) = -\tfrac{1}{2}[\pi_2^*\theta, \pi_2^*\theta] = -\tfrac{1}{2}[\omega, \omega].$$


Comparison with Eq. (34.11) implies that the curvature of the canonical flat
connection is zero.
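
For a matrix group the vanishing of the curvature can also be checked by hand. The short derivation below is only a sketch under two assumptions that are not part of the argument above: that the canonical 1-form can be written as $\theta = g^{-1}dg$, and that the structure equation takes the matrix form $\Omega^\omega = d\omega + \omega \wedge \omega$ of Corollary 34.3.14.

```latex
% Assumption: G is a matrix group, so that \theta = g^{-1} dg.
\begin{align*}
d(g^{-1}) &= -g^{-1}\,dg\,g^{-1}, \\
d\theta   &= d(g^{-1}) \wedge dg
           = -(g^{-1}dg) \wedge (g^{-1}dg)
           = -\theta \wedge \theta, \\
\Omega^{\omega} &= d\omega + \omega \wedge \omega
           = \pi_2^{*}\bigl(d\theta + \theta \wedge \theta\bigr) = 0 .
\end{align*}
```

The second line is the matrix form of the Maurer-Cartan equation, since $\tfrac{1}{2}[\theta, \theta] = \theta \wedge \theta$ for a matrix-valued 1-form.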


Definition 34.3.10 A connection $\Gamma$ in any principal fiber bundle $P(M, G)$
is called flat if every $x \in M$ has a neighborhood $U$ such that the connection
in $\pi^{-1}(U)$ is isomorphic to the canonical flat connection in $U \times G$.


The vanishing of the curvature is a necessary condition for a connection 
to be flat. It turns out that it is also sufficient (see [Koba 63, p. 92] for a 
proof): 


Theorem 34.3.11 A connection in a principal fiber bundle P(M, G) 
is flat if and only if the curvature form vanishes identically. 


The existence of a flat connection in a principal fiber bundle determines 
the nature of the bundle: 


Corollary 34.3.12 If $P(M,G)$ has a connection whose curvature form
vanishes identically, then $P(M,G)$ is (isomorphic to) the trivial bundle
$M \times G$ and the connection is the canonical flat connection.


34.3.2 Matrix Structure Group 


The structure groups encountered in physics are almost exclusively matrix 
groups, or subgroups of GL(n, R). For these groups, the equations derived 
above take a simpler form [see, for example, Eq. (34.8)]. Furthermore, it is 
a good idea to have these special formulas, so we can use them when need 
arises. 


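Proposition 34.3.13 Let $N$ be a manifold and $G$ a matrix Lie group with
Lie algebra $\mathfrak{g}$. For $\phi \in \Lambda^k(N, \mathfrak{g})$ and $\psi \in \Lambda^j(N, \mathfrak{g})$, we have


$$[\phi, \psi] = \phi \wedge \psi - (-1)^{kj}\,\psi \wedge \phi,$$


where $\phi$ and $\psi$ are regarded as matrices of $\mathbb{R}$-valued forms and $\phi \wedge \psi$ is
matrix multiplication with elements multiplied via wedge product.


Proof For matrix algebras, the commutator is just the difference in products
of matrices. Therefore,


$$\begin{aligned}
[\phi, \psi](X_1, \dots, X_{k+j})
&= \frac{1}{k!\,j!}\sum_\pi \epsilon_\pi \bigl[\phi(X_{\pi(1)}, \dots, X_{\pi(k)}),\, \psi(X_{\pi(k+1)}, \dots, X_{\pi(k+j)})\bigr] \\
&= (\phi \wedge \psi)(X_1, \dots, X_{k+j})
 - \frac{1}{k!\,j!}\sum_\pi \epsilon_\pi\, \psi(X_{\pi(k+1)}, \dots, X_{\pi(k+j)})\,\phi(X_{\pi(1)}, \dots, X_{\pi(k)}) \\
&= (\phi \wedge \psi)(X_1, \dots, X_{k+j})
 - (-1)^{kj}\frac{1}{k!\,j!}\sum_\pi \epsilon_\pi\, \psi(X_{\pi(1)}, \dots, X_{\pi(j)})\,\phi(X_{\pi(j+1)}, \dots, X_{\pi(j+k)}).
\end{aligned}$$


Noting that the last sum, together with its factor $1/(k!\,j!)$, is $(\psi \wedge \phi)(X_1, \dots, X_{k+j})$,
we obtain the result we are after.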


Corollary 34.3.14 If $G$ is a matrix group, then


$$\Omega^\omega = d\omega + \omega \wedge \omega \quad\text{and}\quad \Omega_u = d\omega_u + \omega_u \wedge \omega_u.$$


Proof The proof follows immediately from Eq. (34.11) and Proposition
34.3.13 with $k = j = 1$.
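
As a quick sanity check of Corollary 34.3.14, the following Python sketch evaluates $\Omega = d\omega + \omega \wedge \omega$ on the pair $(\partial_x, \partial_y)$ for a connection $\omega = A_x\,dx + A_y\,dy$ with constant matrix coefficients, so that $d\omega = 0$ and $\Omega(\partial_x, \partial_y) = A_xA_y - A_yA_x$. The constant-coefficient form, the basis $T_a = -\tfrac{i}{2}\sigma_a$ of $\mathfrak{su}(2)$, and all function names are assumptions made only for illustration; none of them comes from the text.

```python
# Sketch: curvature of a constant matrix-valued 1-form w = A_x dx + A_y dy.
# Assumption (not from the text): the coefficients are constant, so dw = 0 and
# Omega(d_x, d_y) = (w ^ w)(d_x, d_y) = A_x A_y - A_y A_x.

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def matsub(a, b):
    """Entrywise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def curvature_xy(A_x, A_y):
    """Omega(d_x, d_y) for constant coefficients: the matrix commutator."""
    return matsub(matmul(A_x, A_y), matmul(A_y, A_x))

# Hypothetical basis of su(2): T_a = -(i/2) * (Pauli matrix a).
T1 = [[0, -0.5j], [-0.5j, 0]]
T2 = [[0, -0.5], [0.5, 0]]
T3 = [[-0.5j, 0], [0, 0.5j]]

nonabelian = curvature_xy(T1, T2)  # equals T3: the curvature does not vanish
abelian = curvature_xy(T3, T3)     # commuting coefficients: zero matrix
```

With non-commuting coefficients the term $\omega \wedge \omega$ alone produces curvature, whereas commuting coefficients reproduce the abelian result $\Omega = d\omega$ of Proposition 34.3.9.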


The following theorem can also be easily proved: 




Theorem 34.3.15 Let $T_u$ and $T_v$ be two local trivializations with transition
function $g_{uv} : U \cap V \to G$, where $G$ is a matrix group. Then


(a) $\Omega_v = g_{uv}^{-1}\,\Omega_u\, g_{uv}$;
(b) $d\Omega_u = \Omega_u \wedge \omega_u - \omega_u \wedge \Omega_u$.


34.4 Problems 


34.1 Show that 


(a) a fiber bundle homomorphism preserves the fibers, i.e., two points be- 
longing to the same fiber of P’ get mapped to the same fiber of P; 

(b) if $(f, f_G)$ of Definition 34.1.3 is an isomorphism, then the induced
map $f_M : M' \to M$ is a bijection.


34.2 Finish the proof of Proposition 34.1.5. 
34.3 Complete the proof of Proposition 34.1.9. 


34.4 Using the linear independence of the vectors in a basis, show that the 
action of GL(n, R) on L(M) is free. 


34.5 Show that $\phi_u : \pi_E^{-1}(U) \to U \times F$ defined by $\phi_u([p, \xi]) = (\pi(p),
s_u(p)\xi)$ for the associated fiber bundle is well defined (i.e., if $[p', \xi'] =
[p, \xi]$ then $\phi_u([p', \xi']) = \phi_u([p, \xi])$) and bijective.

34.6 Show that the map $p : F \to F_x$, given by $p(\xi) \equiv p\xi \equiv [p, \xi]$, satisfies
$(pg)\xi = p(g\xi)$ for $p \in P$, $g \in G$, $\xi \in F$.

34.7 Show that the map $S_u : U \times G \to \pi^{-1}(U)$, given by $S_u(x, g) =
\sigma_u(x)g$ for a local cross section $\sigma_u$, is a bijection, and that $T_u \equiv S_u^{-1}$ is a
local trivialization satisfying condition (3) of Definition 34.1.1.


34.8 Provide the details of the proof of Proposition 34.2.6. 


34.9 Show that the map $f_\xi : P \to E$ defined by $f_\xi(p) = p\xi$ is injective.
Hint: Show that $[p_1, \xi] = [p_2, \xi]$ implies $p_1 = p_2$.


34.10 Show that Eq. (34.7) follows from Eq. (34.6).

34.11 Show that the canonical flat connection on $P = M \times G$ given by
$\omega = \pi_2^*\theta$, where $\theta$ is the canonical 1-form on $G$, satisfies both conditions of
Definition 34.2.1.


34.12 Show that Eq. (34.9) is independent of the choice of $p$ and $X_i^*$ on the
right-hand side.

34.13 Prove Proposition 34.3.4.
34.14 Derive Eq. (34.12). 
34.15 Derive Eq. (34.13). 


34.16 Taking the exterior derivative of both sides of Eq. (34.13), show that


$$d\Omega^i = c^i_{jk}\,\Omega^j \wedge \omega^k - \tfrac{1}{2}c^i_{jk}c^j_{lm}\,\omega^l \wedge \omega^m \wedge \omega^k.$$


Using Lie's third theorem (29.13), show that the second term on the right-hand
side vanishes. Now prove Bianchi's identity of Theorem 34.3.7. Hint:
$\omega(X) = 0$ if $X$ is horizontal.

34.17 Let $\mathrm{id}_M : M \to M$ be the identity map on $M$. Prove that $\mathrm{id}_{M*}$ is the
identity map on $T(M)$. Let $I_g = R_{g^{-1}} \circ L_g$ be the inner automorphism of a
Lie group $G$. Show that if $G$ is abelian, then $I_g = \mathrm{id}_G$ for all $g \in G$. Now
show that $\mathrm{Ad}_g = \mathrm{id}_{\mathfrak{g}}$.

34.18 Prove Theorem 34.3.8. 

34.19 Provide the details of Corollary 34.3.14. 


34.20 Provide the details of Theorem 34.3.15. 


35 Gauge Theories


The machinery developed in Chap. 34 has found a natural setting in gauge
theories, which have been successfully used to describe the electromagnetic,
weak nuclear, and strong nuclear interactions. In these physical applications,
one considers a principal fiber bundle $P(M, G)$, where $M = \mathbb{R}^4$ with a metric
$\eta$ which is diagonal with $\eta_{11} = -1 = -\eta_{ii}$ for $i = 2, 3, 4$, and the structure
group is a matrix group, typically $SU(n)$.


35.1 Gauge Potentials and Fields 


The application of the theory of the principal fiber bundle to physics concentrates
on the local quantities. In a typical situation, one picks a local section
$\sigma_u$, which in the new setting is called a choice of gauge, and works with
$\omega_u = \sigma_u^*\omega$, now called a gauge potential. The curvature form, being the
derivative of the gauge potential, is now called the gauge field. Theorem 34.3.11
and Corollary 34.3.12 imply that all principal fiber bundles used in physics
are nontrivial, because otherwise the gauge field would be identically zero.


Example 35.1.1 Consider the simplest (unitary) matrix group $U(1)$. Any
group member can be written as $e^{i\alpha}$ with $\alpha \in \mathbb{R}$. The Lie algebra consists of
just the exponents: $\mathfrak{u} = \{i\alpha \mid \alpha \in \mathbb{R}\}$, implying that the basis is $i = \sqrt{-1}$. If
$\sigma_u : U \to P$, $U \subset \mathbb{R}^4$, is a local section, then $\omega_u$ is a $\mathfrak{u}$-valued 1-form on $U$.
Since the Lie algebra is one-dimensional, the discussion in Sect. 34.2.1 implies
that $\omega_u = iA_u$. Here $A_u$ is a real-valued 1-form on $U$. Pick a coordinate
basis in $U$ and write $A_u = A_{uk}\,dx^k$. Note that $A_u$ has four components
$\{A_{uk}\}_{k=1}^4$. That is why it is called a vector potential.

Another section $\sigma_v$ yields $\omega_v = iA_v$. We want to see how $\omega_v$ is related
to $\omega_u$. Since the structure group is an abelian matrix group, Eq. (34.8) and
Proposition 34.3.9 give


$$\omega_v = g_{uv}^{-1}\,dg_{uv} + \omega_u.$$

With $g_{uv}(x) = e^{i\varphi_{uv}(x)}$, we obtain

$$dg_{uv} = d\bigl(e^{i\varphi_{uv}(x)}\bigr) = e^{i\varphi_{uv}(x)}\, i\,\partial_k\varphi_{uv}\,dx^k.$$

Hence, the preceding equation yields

$$iA_{vk}\,dx^k = e^{-i\varphi_{uv}(x)}e^{i\varphi_{uv}(x)}\, i\,\partial_k\varphi_{uv}\,dx^k + iA_{uk}\,dx^k,$$

or


$$A_{vk} = A_{uk} + \partial_k\varphi_{uv}, \qquad k = 1, \dots, 4. \tag{35.1}$$


The reader may recognize this as the gauge transformation of the vector 
potential of electrodynamics. 

The curvature can be obtained using Proposition 34.3.9, which also
shows that the curvature is independent of the local section (although the
connection itself is not). Thus, ignoring the subscript $u$, with $\omega = iA =
iA_k\,dx^k$, we obtain

$$\Omega = d\omega = d(iA_k\,dx^k) = i\,\partial_jA_k\,dx^j \wedge dx^k,$$


and writing $\Omega = iF$, where $F$ is a real-valued 2-form, yields

$$F = \partial_jA_k\,dx^j \wedge dx^k = \tfrac{1}{2}(\partial_jA_k - \partial_kA_j)\,dx^j \wedge dx^k.$$

The (antisymmetric) components of $F$ are therefore

$$F_{jk} = \partial_jA_k - \partial_kA_j,$$

which is the familiar electromagnetic field strength, with
Ej = Fij, By = 02A3 — 03A, By = 03A, — 01 A3, 
B3 = 0, A2 — 02A. 


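Bianchi's identity, $d\Omega = 0$ or $dF = 0$, in terms of components,
becomes


$$0 = dF = \tfrac{1}{2}\,\partial_iF_{jk}\,dx^i \wedge dx^j \wedge dx^k,$$

which can be shown to lead to the two homogeneous Maxwell's equations:


$$\nabla \cdot \mathbf{B} = 0, \qquad \nabla \times \mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t}.$$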
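
The statement that the curvature does not depend on the local section can be illustrated numerically. The Python sketch below, which is only an illustration and not part of the derivation, computes $F_{jk} = \partial_jA_k - \partial_kA_j$ by central differences for a potential $A_u$ and for its gauge transform $A_{vk} = A_{uk} + \partial_k\varphi_{uv}$ of Eq. (35.1), and compares the two. The particular polynomial potential, the gauge function `phi`, the sample point, and all function names are hypothetical choices.

```python
# Sketch: numerical check that F_jk = d_j A_k - d_k A_j is unchanged by
# the gauge transformation A_k -> A_k + d_k(phi) of Eq. (35.1).
# The potential A_u and the gauge function phi are made-up examples.

def partial(f, x, j, h=1e-3):
    """Central-difference estimate of the j-th partial derivative of f at x."""
    xp, xm = list(x), list(x)
    xp[j] += h
    xm[j] -= h
    return (f(xp) - f(xm)) / (2 * h)

def field_strength(A, x):
    """Matrix F[j][k] = d_j A_k - d_k A_j of a potential A at the point x."""
    n = len(x)
    return [[partial(lambda y: A(y)[k], x, j) - partial(lambda y: A(y)[j], x, k)
             for k in range(n)] for j in range(n)]

def A_u(x):
    """Hypothetical potential in the gauge sigma_u (polynomial components)."""
    x1, x2, x3, x4 = x
    return [x2 * x3, x1 * x4, x4 * x4, x1 + x2 * x2]

def phi(x):
    """Hypothetical gauge function phi_uv."""
    x1, x2, x3, x4 = x
    return x1 * x2 + x3 * x3 * x4

def A_v(x):
    """Potential in the gauge sigma_v: A_vk = A_uk + d_k(phi), Eq. (35.1)."""
    return [a + partial(phi, x, k) for k, a in enumerate(A_u(x))]

point = [0.3, -1.2, 0.7, 2.0]
F_u = field_strength(A_u, point)
F_v = field_strength(A_v, point)
max_change = max(abs(F_u[j][k] - F_v[j][k]) for j in range(4) for k in range(4))
```

Here `max_change` stays at the level of rounding error, while the potentials `A_u(point)` and `A_v(point)` themselves differ, which is the numerical counterpart of $\Omega_v = \Omega_u$ in Proposition 34.3.9.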


We summarize the discussion of the preceding example: 


Box 35.1.2 Electromagnetic interaction is a principal fiber bundle 
P(M, G) with M a Minkowski space and G = U (1). 


It appears as if the P in the principal fiber bundle had no role in our dis- 
cussion above. That is not so! Remember that the structure of P as a bundle 
is determined entirely by the transition functions (see the discussion after 
Proposition 34.1.5). And we used the transition functions in determining 
how the connection 1-forms were glued together on the base manifold.


35.1.1 Particle Fields 


Principal fiber bundles give us gauge potentials and gauge fields. A gauge 
field is responsible for interaction among matter particles, and an interaction 
is typically described by a Lagrangian written in terms of fields and their 
derivatives. Therefore, we have to know how to describe matter (or particle) 
fields and how to differentiate them. 

The realistic treatment of particle fields requires the introduction of the 
so-called Clifford and spinor bundles. This is because all fundamental par- 
ticles, i.e., quarks and leptons, are described by (complex) spinors, not vec- 
tors. Although we have discussed the algebra of spinors, and it is indeed the 
starting point of the discussion of Clifford and spinor bundles, the construction
of a differential structure on these bundles and its pull-back to the base
manifold $M = (\mathbb{R}^4, \eta)$ is beyond the scope of this book.

Therefore, we restrict our discussion to the (admittedly unphysical) case, 
where particles are described by vectors rather than spinors. This brief dis- 
cussion is not entirely useless, because there are certain similarities between 
vector bundles and spinor bundles, and the discussion can pave the way to 
the understanding of the realistic case of Clifford and spinor bundles. 


Definition 35.1.3 Let $P(M,G)$ be a principal fiber bundle and $V$ a vector
space on which $G$ acts (on the left) through a representation. Let
$E(M, V, G, P)$ be the associated bundle with standard fiber $V$. A section
of $E$, i.e., a map $\psi : M \to E$, is called a particle field.


Let $\bar\Lambda^k(P, V)$ be the set of tensorial forms of degree $k$ of type $(\rho, V)$.
Let $\mathfrak{g} \to \mathfrak{gl}(V)$ be the Lie algebra homomorphism induced by the representation
$G \to GL(V)$. If $\phi \in \bar\Lambda^k(P, V)$ and $\mu \in \Lambda^j(P, \mathfrak{g})$, then we can define
a wedge product $\dot\wedge : \Lambda^j(P, \mathfrak{g}) \times \bar\Lambda^k(P, V) \to \Lambda^{j+k}(P, V)$ as follows:


$$(\mu \,\dot\wedge\, \phi)(X_1, \dots, X_{j+k}) = \frac{1}{j!\,k!}\sum_\pi \epsilon_\pi\, \mu(X_{\pi(1)}, \dots, X_{\pi(j)}) \cdot \phi(X_{\pi(j+1)}, \dots, X_{\pi(j+k)}). \tag{35.2}$$

Note that $\mu(X_{\pi(1)}, \dots, X_{\pi(j)}) \in \mathfrak{g}$ and $\phi(X_{\pi(j+1)}, \dots, X_{\pi(j+k)}) \in V$, so the
dot (symbolizing the action of the Lie algebra) between the two makes
sense.


Theorem 35.1.4 Let $D^\omega$ be the exterior covariant differentiation of Definition
34.3.3. If $\phi \in \bar\Lambda^k(P, V)$, then


$$D^\omega\phi = d\phi + \omega \,\dot\wedge\, \phi, \qquad D^\omega D^\omega\phi = \Omega^\omega \,\dot\wedge\, \phi.$$

Proof The proof of the first equation involves establishing the equality for
cases in which the vector fields in the argument of $D^\omega\phi$ (a) are all horizontal,
(b) all but one are horizontal, and (c) have two or more vertical vectors. The
details can be found in [Ble 81, pp. 44-45].

The second equation can be derived by taking the exterior derivative of
the first and using the definition of $D^\omega$. We have


$$D^\omega D^\omega\phi = d(d\phi + \omega \,\dot\wedge\, \phi)h = (d\omega)h \,\dot\wedge\, \phi h - \omega h \,\dot\wedge\, (d\phi)h = D^\omega\omega \,\dot\wedge\, \phi = \Omega^\omega \,\dot\wedge\, \phi$$


because $\omega h = 0$ and $\phi h = \phi$.


Corollary 35.1.5 Let $\phi \in \bar\Lambda^k(P, \mathfrak{g})$ and assume that the action on $\mathfrak{g}$ is via
the adjoint representation. Then $D^\omega\phi = d\phi + [\omega, \phi]$.


Note that there is no conflict with Eq. (34.11) because $\omega$ is a pseudotensorial
form, not a tensorial form.


35.1.2 Gauge Transformation 


The gauge transformation of the electromagnetic vector potential (35.1) was 
obtained by starting with local sections glued together by a transition func- 
tion and pulling back the connection 1-form using the local sections. So 
there is some kind of relation between gauge transformations and local sec- 
tions that we want to explore now. We note that all terms of Eq. (35.1) are 
evaluated at the same x € M. Therefore, that gauge transformation does not 
leave the fiber of the bundle. This is the condition that we want to impose 
on any gauge transformation: 


Definition 35.1.6 A gauge transformation of a principal fiber bundle
$P(M,G)$ is an automorphism $(f, \mathrm{id}_G)$ of the bundle for which
$f_M = \mathrm{id}_M$, i.e., $\pi(p) = \pi(f(p))$. The set of all gauge transformations
of $P(M, G)$ is denoted by $\mathrm{Gau}(P)$.


Thus, a gauge transformation does not leave a fiber. We also know that
the right action of the structure group is confined to a fiber. So the
natural question to ask is "Is the right action a gauge transformation?" Recall
(see Definition 34.1.3) that an automorphism satisfies $f(pg) = f(p)g$.
However, $R_h(pg) = pgh \ne R_h(p)g = phg$ in general, so the right action is
not, in general, a gauge transformation.

As mentioned above, there is a close relationship between cross sections
and gauge transformations. Let $S(E, M, F)$ be the set of all sections of the
associated bundle $E$ with base manifold $M$ and standard fiber $F$.$^1$ Consider
$S(E, M, G)$ in which $G$ acts on itself via the adjoint transformation: $g \cdot h =
ghg^{-1}$.


Theorem 35.1.7 There is a natural bijection $S(E, M, G) \cong \mathrm{Gau}(P)$.


$^1$Note that particle fields are members of $S(E, M, V)$.




Proof For $\sigma \in S(E, M, G)$ define $f : P \to P$ by $f(p) = p\,\pi_2 \circ \sigma \circ \pi(p)$,
where $\pi_2$ projects onto the second factor of $E = P \times_G G$. The composite
$\pi_2 \circ \sigma \circ \pi$ assigns to each $p$ its partner in $[p, h]$. If the partner of $p$ is $h$,
then the partner of $p' = pg$ is $h' = g^{-1}hg$.

Suppose that $f(p) = ph$, so that $\pi_2 \circ \sigma \circ \pi(p) = h$. Then


$$f(pg) = pg\,\pi_2 \circ \sigma \circ \pi(pg) = pgh' = pgg^{-1}hg = phg = f(p)g.$$


Hence, $f \in \mathrm{Gau}(P)$. Conversely, suppose that $f \in \mathrm{Gau}(P)$, i.e., that
$f(pg) = f(p)g$. Define $\sigma$ by $f(p) = p\,\pi_2 \circ \sigma \circ \pi(p)$, and note that, on
the one hand, $f(pg) = pg\,\pi_2 \circ \sigma \circ \pi(pg)$, and on the other hand $f(pg) =
p\,\pi_2 \circ \sigma \circ \pi(p)\,g$. Therefore


$$g\,\pi_2 \circ \sigma \circ \pi(pg) = \pi_2 \circ \sigma \circ \pi(p)\,g \quad\text{or}\quad \pi_2 \circ \sigma \circ \pi(pg) = g^{-1}\,\pi_2 \circ \sigma \circ \pi(p)\,g,$$


which is a necessary condition for $\sigma$ to be well defined: $\sigma(x)$ could be $[p, h]$
or $[p', h']$ where $p' = pg$ and $h' = g^{-1}hg$ for some $g \in G$. The preceding
equation restates this relation.


Remark 35.1.1 In the proof of Theorem 35.1.7, we introduced the composite
map $\pi_2 \circ \sigma \circ \pi$, which mapped the first factor of $\sigma(x) = [p, h]$ to
its second factor. This is the same construction as the one discussed after
Example 34.3.2, where $\pi_2$ is denoted by $p^{-1}$. Sometimes it is more
convenient to work with this map rather than the section $\sigma$. So, if $F$ is
a manifold on which $G$ acts on the left, we define a map $\pi_{12} : P \to F$
with the property $\pi_{12}(pg) = g^{-1} \cdot \pi_{12}(p)$. This property brings the set
of sections in a bijective correspondence with the set $\Pi_{12}(P, F)$ of such
maps: $S(E, M, F) \cong \Pi_{12}(P, F)$. When $F = G$, we get the triple isomorphism


$$S(E, M, G) \cong \Pi_{12}(P, G) \cong \mathrm{Gau}(P).$$

In particular, any $f \in \mathrm{Gau}(P)$ can be written as $f(p) = p\,\pi_{12}(p)$ for some
$\pi_{12} \in \Pi_{12}(P, G)$.
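
The correspondence between $\Pi_{12}(P, G)$ and $\mathrm{Gau}(P)$ can be tried out on a toy example. The Python sketch below assumes, purely for illustration, that the base manifold is a single point, so that $P = G$, and models $G$ by $2 \times 2$ integer matrices of determinant 1. The fixed element `H0`, the sample points, and the function names are hypothetical. It checks that $\pi_{12}(p) = p^{-1}h_0p$ satisfies $\pi_{12}(pg) = g^{-1}\pi_{12}(p)g$, that $f(p) = p\,\pi_{12}(p)$ obeys $f(pg) = f(p)g$, and that the right action $R_{h_0}$ does not.

```python
# Sketch: gauge transformations versus right translations on a toy bundle.
# Assumption (not from the text): the base is a single point, so P = G, and
# G is modeled by 2x2 integer matrices of determinant 1.

def mul(a, b):
    """Product of two 2x2 matrices."""
    return [[a[i][0] * b[0][j] + a[i][1] * b[1][j] for j in range(2)]
            for i in range(2)]

def inv(a):
    """Inverse of a 2x2 matrix of determinant 1."""
    return [[a[1][1], -a[0][1]], [-a[1][0], a[0][0]]]

H0 = [[1, 1], [0, 1]]  # hypothetical fixed group element

def pi12(p):
    """A map P -> G with pi12(pg) = g^{-1} pi12(p) g."""
    return mul(mul(inv(p), H0), p)

def f(p):
    """Gauge transformation f(p) = p pi12(p) built from pi12."""
    return mul(p, pi12(p))

p = [[2, 1], [1, 1]]
g = [[1, 0], [3, 1]]
pg = mul(p, g)

equivariant_pi12 = pi12(pg) == mul(mul(inv(g), pi12(p)), g)
equivariant_f = f(pg) == mul(f(p), g)                   # f(pg) = f(p)g
right_action_fails = mul(pg, H0) != mul(mul(p, H0), g)  # R_h(pg) != R_h(p)g
```

In this toy case $f(p) = h_0p$, so the gauge transformation is a left multiplication, which commutes with the right action even though $G$ is not abelian.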


A gauge transformation transforms the connection 1-form as well. How
does the connection 1-form change under a gauge transformation?


Theorem 35.1.8 If $f \in \mathrm{Gau}(P)$ and $\omega$ is a connection 1-form, then
$f^*\omega$ is also a connection 1-form.


Proof We show that $f^*\omega$ satisfies condition (a) of Definition 34.2.1 and
leave the proof of condition (b) for the reader. Let $A \in \mathfrak{g}$ and let $A^*$ be its
fundamental vector field in $P$. Then,


$$\begin{aligned}
(f^*\omega)(A^*_p) &= \omega(f_*A^*_p) = \omega\left(\frac{d}{dt}f(p\exp tA)\Big|_{t=0}\right) \\
&= \omega\left(\frac{d}{dt}f(p)\exp tA\Big|_{t=0}\right) = \omega\bigl(A^*_{f(p)}\bigr) = A,
\end{aligned}$$


where to go to the second line, we used $f(pg) = f(p)g$.


Gauge transformations not only map connections to connections, but they 
also map tensorial forms to tensorial forms. 


Proposition 35.1.9 If $\phi \in \bar\Lambda^k(P, V)$ and $f \in \mathrm{Gau}(P)$, then $f^*\phi \in
\bar\Lambda^k(P, V)$.


Proof Using the fact that $f \circ R_g = R_g \circ f$, one can show easily that
$R_g^*(f^*\phi) = \rho(g^{-1}) \cdot (f^*\phi)$. That $f^*\phi$ is horizontal follows from the equality
$f_*A^* = A^*$ derived in the proof of Theorem 35.1.8.


Theorem 35.1.8 and Proposition 35.1.9 tell us that gauge transformations 
send connection 1-forms to connection 1-forms and tensorial forms to ten- 
sorial forms. With the help of Remark 35.1.1, we can find explicit formulas 
for the gauge-transformed forms. Both formulas can be derived once we 
know how gauge transformations transform vector fields on P. 


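Lemma 35.1.10 Let $f \in \mathrm{Gau}(P)$ and $\pi_{12} \in \Pi_{12}(P, G)$ be related by
$f(p) = p\,\pi_{12}(p)$. For $X \in T_p(P)$, we have


$$f_*(X) = \bigl[L_{\pi_{12}(p)^{-1}*}\,\pi_{12*}(X)\bigr]^*_{f(p)} + R_{\pi_{12}(p)*}(X), \tag{35.3}$$

where $[B]^*_q$ denotes the fundamental vector field of $B \in \mathfrak{g}$ at $q \in P$.


Proof Let $\gamma_X(t)$ be the integral curve of $X$ passing through $p$ (i.e., $\gamma_X(0) =
p$). Then


$$\begin{aligned}
f_*(X) &= \frac{d}{dt}f(\gamma_X(t))\Big|_{t=0} = \frac{d}{dt}\bigl[\gamma_X(t)\,\pi_{12}(\gamma_X(t))\bigr]\Big|_{t=0} \\
&= \frac{d}{dt}\bigl[p\,\pi_{12}(\gamma_X(t))\bigr]\Big|_{t=0} + \underbrace{\frac{d}{dt}\bigl[\gamma_X(t)\,\pi_{12}(p)\bigr]\Big|_{t=0}}_{=\,R_{\pi_{12}(p)*}(X)}.
\end{aligned}$$


The rest of the proof follows from an identical argument leading to
Eq. (34.5). We leave the details as an exercise for the reader.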


Using this lemma, we can instantly prove the following theorem: 


Theorem 35.1.11 If $f \in \mathrm{Gau}(P)$ and $\pi_{12} \in \Pi_{12}(P, G)$ are related by
$f(p) = p\,\pi_{12}(p)$, $\omega$ is a connection, and $\phi \in \bar\Lambda^k(P, V)$, then


$$(f^*\omega)_p = L_{\pi_{12}(p)^{-1}*}\,\pi_{12*} + \mathrm{Ad}_{\pi_{12}(p)^{-1}}\,\omega_p \quad\text{and}\quad (f^*\phi)_p = \pi_{12}(p)^{-1} \cdot \phi_p.$$


Proof For the first equation, apply $\omega$ to both sides of the equation of
Lemma 35.1.10. For the second equation, recall that $(f^*\phi)(X_1, \dots, X_k) =
\phi(f_*X_1, \dots, f_*X_k)$. Now use the Lemma for each term in the argument,
note that $\phi$ annihilates the first term on the right-hand side of Eq. (35.3), and
use the defining property of a tensorial form as given in Definition 34.3.1.


35.2 Gauge-Invariant Lagrangians 


We have already encountered Lagrangians defined on $\mathbb{R}^n$, and have seen
consequences of their symmetry, i.e., conservation laws. In this section, we 
want to formulate them on manifolds and impose gauge symmetry on them. 


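Definition 35.2.1 A Lagrangian is a map $L : P \times V \times T(P) \to \mathbb{R}$ satisfying


$$L\bigl(pg,\; g^{-1} \cdot v,\; g^{-1} \cdot \theta_p \circ R_{g^{-1}*}\bigr) = L(p, v, \theta_p),$$

for all $p \in P$, $v \in V$, $\theta_p : T_p(P) \to V$, and $g \in G$.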


A Lagrangian whose codomain by definition is R is usually the integral 
of a Lagrangian density, which is a function on M. In fact, the condition im- 
posed on the Lagrangian in its definition was to ensure that the Lagrangian 
density is well-defined. The following proposition connects the two. 


Proposition 35.2.2 Given a Lagrangian $L$, there is a map $\mathcal{L}_0 : \Pi_{12}(P, V) \to
C^\infty(M)$ defined by $\mathcal{L}_0(\psi)(x) = L(p, \psi(p), d\psi_p)$ for $x \in M$, $p \in P$ with
$\pi(p) = x$, and $\psi \in \Pi_{12}(P, V)$.


Proof Write $\psi(pg) = g^{-1} \cdot \psi(p)$ (the property that $\psi$ has to obey
by Remark 35.1.1) as $\psi \circ R_g = g^{-1} \cdot \psi$ with the differential $d\psi_{pg} \circ
R_{g*} \equiv \psi_{*pg} \circ R_{g*} = g^{-1} \cdot d\psi_p$. Now show that $L(pg, \psi(pg), d\psi_{pg}) =
L(p, \psi(p), d\psi_p)$, i.e., all points of the fiber $\pi^{-1}(x)$ give the same
Lagrangian.


A gauge transformation $f$ takes $p \in P$ to $f(p) \in P$ without leaving the
fiber in which $p$ is located. This means that there must exist a $g \in G$ such
that $f(p) = pg$. Now, we are interested in Lagrangians which are invariant
under such a transformation. That is, we want our Lagrangians to obey


$$L(f(p), v, \theta_{f(p)}) = L(p, v, \theta_p) \quad\text{or}\quad L(pg, v, \theta_{pg}) = L(p, v, \theta_p) \quad\text{for } g \in G.$$


Using the defining property of $L$, we can rewrite the second equation as

$$L\bigl(pgg^{-1},\; g \cdot v,\; g \cdot \theta_{pg} \circ R_{g*}\bigr) = L(p, v, \theta_p).$$

Noting that $\theta_{pg} = \theta_p \circ R_{g^{-1}*}$, we obtain


$$L(p, g \cdot v, g \cdot \theta_p) = L(p, v, \theta_p). \tag{35.4}$$

We say a Lagrangian is $G$-invariant if it satisfies this equation. We do not
use "gauge-invariant" because Eq. (35.4) does not use $f$ directly. However,
noting that $\Pi_{12}(P, V) = \bar\Lambda^0(P, V)$ and the fact that $f$ acts on $\bar\Lambda^k(P, V)$
via pull-back, we can investigate the gauge-invariance of the Lagrangian
density, i.e., we want to see if $\mathcal{L}_0$ of Proposition 35.2.2 is gauge invariant,
i.e., if $\mathcal{L}_0(\psi) = \mathcal{L}_0(f^*\psi)$ or


$$L(p, \psi(p), d\psi_p) \overset{?}{=} L\bigl(p, (f^*\psi)(p), (f^*d\psi)_p\bigr) = L\bigl(p, (f^*\psi)(p), d(f^*\psi)_p\bigr).$$

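If $f \in \mathrm{Gau}(P)$ and $\pi_{12} \in \Pi_{12}(P, G)$ are related by $f(p) = p\,\pi_{12}(p)$,
then by Theorem 35.1.11, we have $f^*\psi = \pi_{12}^{-1} \cdot \psi$ because $\Pi_{12}(P, V) =
\bar\Lambda^0(P, V)$. We now compute $d(\pi_{12}^{-1} \cdot \psi)$. Let $\gamma_X(t)$ be the flow of $X$. Then


$$\begin{aligned}
d(\pi_{12}^{-1} \cdot \psi)(X) &= \frac{d}{dt}\bigl[\pi_{12}^{-1}(\gamma_X(t)) \cdot \psi(\gamma_X(t))\bigr]\Big|_{t=0} \\
&= \frac{d}{dt}\bigl[\pi_{12}^{-1}(\gamma_X(t)) \cdot \psi(p)\bigr]\Big|_{t=0} + \frac{d}{dt}\bigl[\pi_{12}^{-1}(p) \cdot \psi(\gamma_X(t))\bigr]\Big|_{t=0} \\
&= \pi_{12}^{-1}(p) \cdot \psi_{*p}(X) + \pi_{12*}^{-1}(X) \cdot \psi(p).
\end{aligned}$$


Therefore,


$$\begin{aligned}
\mathcal{L}_0(f^*\psi)(x) &= L\bigl(p, (f^*\psi)(p), d(f^*\psi)_p\bigr) \\
&= L\bigl(p,\; \pi_{12}^{-1}(p) \cdot \psi(p),\; \pi_{12}^{-1}(p) \cdot d\psi_p + \pi_{12*}^{-1} \cdot \psi(p)\bigr).
\end{aligned}$$


Were it not for the extra term in the third argument of $L$, we would have
gauge invariance. However, the presence of $\pi_{12*}^{-1} \cdot \psi(p)$ makes the Lagrangian
density, as defined in Proposition 35.2.2, not gauge-invariant.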

The very notion of gauge transformation involves a connection 1-form.
Yet nowhere in the definition of the Lagrangian or Lagrangian density did
we make use of a connection. So, it should come as no surprise to see the
failure of invariance of $\mathcal{L}_0$ under gauge transformations. Since it is the
differential term $d\psi$ of $L$ that causes the violation of the invariance, and since
we do have a differentiation which naturally incorporates the connection
1-form, it is natural to replace $d\psi$ with $D^\omega\psi$. So, now define a new density,


$$\mathcal{L}(\psi, \omega)(x) = L(p, \psi(p), D^\omega\psi_p), \tag{35.5}$$


for $x \in M$, $p \in P$ with $\pi(p) = x$, and $\psi \in \Pi_{12}(P, V)$.


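Theorem 35.2.3 If $L$ in Eq. (35.5) is G-invariant, then $\mathcal{L}$ is well-defined
and $\mathcal{L}(f^*\psi, f^*\omega) = \mathcal{L}(\psi, \omega)$, i.e., $\mathcal{L}$ is gauge-invariant.

Proof From Proposition 34.3.4, we have $R_g^* D^\omega\psi_{pg} = g^{-1} \cdot D^\omega\psi_p$. This
can be rewritten as

$$ D^\omega\psi_{pg} \circ R_{g*} = g^{-1} \cdot D^\omega\psi_p \quad\text{or}\quad D^\omega\psi_{pg} = g^{-1} \cdot D^\omega\psi_p \circ R_{g^{-1}*}. $$

Then,

$$ \begin{aligned}
L(pg, \psi(pg), D^\omega\psi_{pg}) &= L\big(pg, g^{-1} \cdot \psi(p), g^{-1} \cdot D^\omega\psi_p \circ R_{g^{-1}*}\big) \\
&= L(p, \psi(p), D^\omega\psi_p),
\end{aligned} $$

where in the second equality, we used the defining property of a Lagrangian
in Definition 35.2.1. So, $\mathcal{L}$ is well-defined. Now from Theorem 35.1.4 and
Eq. (35.2) with $j = 1$ and $k = 0$, we have $D^\omega\psi = d\psi + \omega \cdot \psi$. Using this
result, we obtain

$$ \begin{aligned}
\mathcal{L}(f^*\psi, f^*\omega)(x) &= L\big(p, (f^*\psi)(p), d(f^*\psi)_p + (f^*\omega)_p \cdot (f^*\psi)(p)\big) \\
&= L\big(p, (f^*\psi)(p), f^*(d\psi_p + \omega_p \cdot \psi(p))\big) \\
&= L\big(p, \tau(p)^{-1} \cdot \psi(p), f^*(D^\omega\psi_p)\big),
\end{aligned} $$

where $\tau$ is the $G$-valued function on $P$ related to $f$ by $f(p) = p\,\tau(p)$.
Then the second equation of Theorem 35.1.11 and the fact that $L$ is
G-invariant yield

$$ \begin{aligned}
\mathcal{L}(f^*\psi, f^*\omega)(x) &= L\big(p, \tau(p)^{-1} \cdot \psi(p), \tau(p)^{-1} \cdot D^\omega\psi_p\big) \\
&= L(p, \psi(p), D^\omega\psi_p) = \mathcal{L}(\psi, \omega)(x),
\end{aligned} $$

showing that $\mathcal{L}$ is indeed gauge-invariant.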
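
The contrast between the ordinary differential and the covariant differential can be checked numerically in the simplest setting. The following is only a sketch under stated assumptions, not part of the construction above: the base is one-dimensional, $V = \mathbb{C}$ with U(1) acting by phases (the complex form of the rotation group treated in Example 35.4.1), the sample functions `psi`, `A`, and `alpha` are hypothetical, and derivatives are approximated by central finite differences. The gauge transformation is taken to act by $\psi \to e^{-i\alpha}\psi$ and $A \to A + \alpha'$.

```python
import cmath
import math

H = 1e-5  # step size for central finite differences


def deriv(f, x):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + H) - f(x - H)) / (2 * H)


# Hypothetical sample data: a complex field, a potential, and a gauge function.
def psi(x):
    return complex(math.cos(x) + 0.3 * x, math.sin(2 * x))


def A(x):
    return 0.7 * math.sin(x) + 0.2


def alpha(x):
    return 0.5 * x * x + math.cos(3 * x)


# Gauge-transformed field and potential (assumed transformation law).
def psi_g(x):
    return cmath.exp(-1j * alpha(x)) * psi(x)


def A_g(x):
    return A(x) + deriv(alpha, x)


def kinetic_plain(field, x):
    """|d psi|^2: built from the ordinary derivative only."""
    return abs(deriv(field, x)) ** 2


def kinetic_cov(field, potential, x):
    """|D psi|^2 with D psi = psi' + i A psi."""
    D = deriv(field, x) + 1j * potential(x) * field(x)
    return abs(D) ** 2


x0 = 0.8
plain_gap = abs(kinetic_plain(psi_g, x0) - kinetic_plain(psi, x0))
cov_gap = abs(kinetic_cov(psi_g, A_g, x0) - kinetic_cov(psi, A, x0))
print("change of |d psi|^2 under the gauge transformation:", plain_gap)
print("change of |D psi|^2 under the gauge transformation:", cov_gap)
```

In this sketch the first number is of order one while the second vanishes up to discretization error, which mirrors the extra term found for $\mathcal{L}_0$ and its absence for $\mathcal{L}$.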


35.3 Construction of Gauge-Invariant Lagrangians 


We have defined Lagrangians in terms of $\psi \in \bar\Lambda^0(P, V)$ and
$D^\omega\psi \in \bar\Lambda^1(P, V)$. Since $G$ acts on $V$ via a representation,
$G \to GL(V)$, the most natural invariants would be in the form of inner products
in $V$ and the restriction of $GL(V)$ to its unitary subgroup $U(V)$, i.e., we assume
that the representation is $G \to U(V)$, in which case we say that the inner
product is G-orthogonal. We are thus led to defining G-orthogonal inner products
on $\bar\Lambda^k(P, V)$.

Let $h$ be a metric in $M$. Let $\pi^*$ be the pull-back of $\pi : P \to M$. Then
$\pi^* h$ is a bilinear form on $P$. However, it is not an inner product because
$\pi^* h(X, Y) = h(\pi_* X, \pi_* Y)$ vanishes for all $Y$ if $X$ is a nonzero vertical
vector field. Nevertheless, $\pi^* h$ does become an inner product if the vectors are
confined to the horizontal subspace $H_p$. Since the forms in $\bar\Lambda^k(P, V)$ are
horizontal, they could be thought of as forms in $H_p$. In particular,
$\bar\Lambda^k(P, \mathbb{R})$ could be identified with the regular $k$-forms $\Lambda^k(M)$
as explained in Remark 34.3.1. This means that the inner product $\hat h$ defined on
$\Lambda^k(M)$ as in Eq. (26.52) could be lifted up to a similar inner product on
$\bar\Lambda^k(P, \mathbb{R})$, which we denote by $\tilde h$. The inner product on
$\bar\Lambda^k(P, V)$ is then $\tilde h h$ as defined in Eq. (26.53). Specifically, if
$\psi, \phi \in \bar\Lambda^k(P, V)$, $\{e_i\}_{i=1}^m$ is a basis of $V$,
$\psi = \sum_{i=1}^m \psi^i e_i$, $\phi = \sum_{j=1}^m \phi^j e_j$, and
$h_{ij} = h(e_i, e_j)$, then

$$ \tilde h h(\psi, \phi) = \sum_{i,j=1}^m h_{ij}\, \tilde h(\psi^i, \phi^j), \tag{35.6} $$

where $\psi^i, \phi^j \in \bar\Lambda^k(P, \mathbb{R})$.
All the discussion above can be summarized as
All the discussion above can be summarized as 


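Theorem 35.3.1 The functions

$$ \hat h h : \Lambda^k(M, V) \times \Lambda^k(M, V) \to C^\infty(M), \qquad
\tilde h h : \bar\Lambda^k(P, V) \times \bar\Lambda^k(P, V) \to C^\infty(M), $$

where the second one is given by Eq. (35.6) and the first by a similar expression,
are well-defined. Furthermore, if $\psi, \phi \in \bar\Lambda^k(P, V)$ and
$\sigma : U \to P$ is a local section, then

$$ \hat h h(\sigma^*\psi, \sigma^*\phi) = \tilde h h(\psi, \phi). $$

The last statement is a result of the fact that $\tilde h \equiv \pi^* \hat h$ and
$\pi \circ \sigma = \mathrm{id}$.
With an inner product placed on $\bar\Lambda^k(P, V)$, we can define the Hodge star
operator and a codifferential.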


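Definition 35.3.2 The covariant codifferential
$\delta^\omega : \bar\Lambda^k(P, V) \to \bar\Lambda^{k-1}(P, V)$ of
$\phi \in \bar\Lambda^k(P, V)$ is defined by

$$ \delta^\omega \phi = (-1)^\nu (-1)^{n(k+1)+1}\, \bar{*}\, D^\omega(\bar{*}\phi), $$

where $\nu$ is the index of $h$, $n = \dim M$, and $\bar{*} = \pi^*(*)$, with $*$ the
star operator on $M$.

Theorem 28.6.6 has an exact analog:

$$ \int_M \tilde h h(D^\omega\psi, \phi)\,\mu = \int_M \tilde h h(\psi, \delta^\omega\phi)\,\mu, \tag{35.7} $$

where $\psi \in \bar\Lambda^k(P, V)$ and $\phi \in \bar\Lambda^{k+1}(P, V)$.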

Now that we have inner products for various elements of the Lagrangian,
we can write a G-invariant Lagrangian as an inner product. The simplest
$L(p, v, \theta)$ which is G-invariant is

$$ L(p, v, \theta) = \tilde h h(\theta_h, \theta_h) - m^2 h(v, v), $$

where the subscript $h$ means the horizontal component, and the constant $m$
is introduced to match the dimensions of the two terms. This leads to the
gauge-invariant Lagrangian density

$$ \mathcal{L}(\psi, \omega) = \tilde h h(D^\omega\psi, D^\omega\psi) - m^2 h(\psi, \psi). \tag{35.8} $$
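
The G-invariance condition (35.4) for this simplest Lagrangian can be illustrated numerically. The sketch below is only an illustration under assumptions that are not in the text at this point: $V = \mathbb{R}^2$ with the standard inner product, $G$ the rotation group acting by matrix multiplication, a diagonal Minkowski metric `ETA`, and hypothetical sample values for `v` and for the four components `theta[mu]` of an already horizontal $V$-valued 1-form.

```python
import math

# Assumed diagonal Minkowski metric; its inverse has the same entries.
ETA = (1.0, -1.0, -1.0, -1.0)


def rotate(angle, v):
    """Act on the 2-vector v with the rotation by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def dot(u, v):
    """Standard inner product on R^2."""
    return u[0] * v[0] + u[1] * v[1]


def lagrangian(v, theta, m):
    """sum_mu eta^{mu mu} <theta_mu, theta_mu> - m^2 <v, v>."""
    kinetic = sum(ETA[mu] * dot(theta[mu], theta[mu]) for mu in range(4))
    return kinetic - m ** 2 * dot(v, v)


# Hypothetical sample data.
v = (0.3, -1.1)
theta = [(1.0, 0.5), (-0.2, 0.7), (0.9, -0.4), (0.1, 0.6)]
m = 1.5
angle = 0.83

original = lagrangian(v, theta, m)
rotated = lagrangian(rotate(angle, v), [rotate(angle, th) for th in theta], m)
print(original, rotated)
```

Both values agree up to rounding because a rotation preserves the inner product in each term separately, which is the content of G-orthogonality in this special case.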


We have to add to this a Lagrangian density associated with the connection
itself. So, we might try something similar to the preceding expression.
However, while the first term is allowed, the second term, the mass term,
cannot be present because $\omega$ is not a tensorial form. Furthermore, it does not
have a horizontal component in the sense of (a) of Proposition 34.3.4. That
is why


Box 35.3.3 A gauge-invariant Lagrangian cannot contain a mass
term for the gauge potential.


If $h_G$ is a metric on $G$, then the gauge Lagrangian density can be expressed as

$$ \mathcal{L}_G(\omega) = -\tfrac{1}{2}\, \tilde h h_G(D^\omega\omega, D^\omega\omega) = -\tfrac{1}{2}\, \tilde h h_G(\Omega^\omega, \Omega^\omega), \tag{35.9} $$

where the minus sign and the factor of $\tfrac{1}{2}$ make the equations of motion
consistent with observation. The total Lagrangian is just the sum of (35.8)
and (35.9):

$$ \mathcal{L}_{\mathrm{tot}}(\psi, \omega) = \tilde h h(D^\omega\psi, D^\omega\psi) - m^2 h(\psi, \psi) - \tfrac{1}{2}\, \tilde h h_G(\Omega^\omega, \Omega^\omega). \tag{35.10} $$

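From the Lagrangian density we can obtain Lagrange's equations by the
variational method. We use Eq. (33.4) to find the variational derivative. The
particle field and connection are stationary if at $t = 0$

$$ \frac{d}{dt} \int_M \mathcal{L}(\psi + t\eta, \omega + t\xi)\,\mu + \frac{d}{dt} \int_M \mathcal{L}_G(\omega + t\xi)\,\mu = 0, \tag{35.11} $$

where $\eta \in \bar\Lambda^0(P, V)$ and $\xi \in \bar\Lambda^1(P, \mathfrak{g})$. The first term gives

$$ \frac{d}{dt} \int_M \mathcal{L}(\psi + t\eta, \omega)\,\mu + \frac{d}{dt} \int_M \mathcal{L}(\psi, \omega + t\xi)\,\mu, \tag{35.12} $$

whose first term can be written as

$$ \begin{aligned}
\frac{d}{dt} \int_M L\big(p, \psi + t\eta, D^\omega(\psi + t\eta)\big)\,\mu
&= \frac{d}{dt} \int_M L\big(p, \psi + t\eta, D^\omega(\psi)\big)\,\mu
 + \frac{d}{dt} \int_M L\big(p, \psi, D^\omega(\psi) + t D^\omega(\eta)\big)\,\mu \\
&= \int_M \tilde h h\big(\partial_2 L(p, \psi, D^\omega(\psi)), \eta\big)\,\mu
 + \int_M \tilde h h\big(\partial_3 L(p, \psi, D^\omega(\psi)), D^\omega(\eta)\big)\,\mu.
\end{aligned} $$

Denote $\partial_2 L(p, \psi, D^\omega(\psi))$ by $\partial L/\partial\psi$ and
$\partial_3 L(p, \psi, D^\omega(\psi))$ by $\partial L/\partial(D^\omega\psi)$, and use
Eq. (35.7) in the second term to obtain

$$ \frac{d}{dt} \int_M L\big(p, \psi + t\eta, D^\omega(\psi + t\eta)\big)\,\mu
= \int_M \tilde h h\Big(\frac{\partial L}{\partial\psi}, \eta\Big)\mu
+ \int_M \tilde h h\Big(\delta^\omega \frac{\partial L}{\partial(D^\omega\psi)}, \eta\Big)\mu $$

or

$$ \frac{d}{dt} \int_M L\big(p, \psi + t\eta, D^\omega(\psi + t\eta)\big)\,\mu
= \int_M \tilde h h\Big(\frac{\partial L}{\partial\psi} + \delta^\omega \frac{\partial L}{\partial(D^\omega\psi)}, \eta\Big)\mu. \tag{35.13} $$

Note that it can actually be shown that $\partial L/\partial\psi \in \bar\Lambda^0(P, V)$ and
$\partial L/\partial(D^\omega\psi) \in \bar\Lambda^1(P, V)$. However, the variational
calculations above paired these forms with forms of the correct degree. Note also
that $\delta^\omega(\partial L/\partial(D^\omega\psi)) \in \bar\Lambda^0(P, V)$, and it pairs
up with $\eta$ in the inner product.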

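To calculate the second term of Eq. (35.12), we note from Theorem 35.1.4 that

$$ D^{\omega + t\xi}\psi = d\psi + \omega \cdot \psi + t\xi \cdot \psi = D^\omega(\psi) + t\xi \cdot \psi. $$

Therefore, (at $t = 0$), we have

$$ \begin{aligned}
\frac{d}{dt} \mathcal{L}(\psi, \omega + t\xi)
&= \frac{d}{dt} L\big(p, \psi(p), D^\omega\psi_p + t\xi \cdot \psi\big) \\
&= \tilde h h\big(\partial_3 L(p, \psi(p), D^\omega\psi_p), \xi \cdot \psi\big) \\
&= \tilde h h\Big(\frac{\partial L}{\partial(D^\omega\psi)}, \xi \cdot \psi\Big)
 \equiv \tilde h h_G(J^\omega, \xi),
\end{aligned} \tag{35.14} $$

where the last line defines the current density $J^\omega \in \bar\Lambda^1(P, \mathfrak{g})$.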


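For the second term of Eq. (35.11), we assume a Lagrangian as given by
Eq. (35.9) and write $\Omega_t \equiv \Omega^{\omega + t\xi}$. Then (at $t = 0$)

$$ \frac{d}{dt}\Omega_t = \frac{d}{dt}\Big(d(\omega + t\xi) + \tfrac{1}{2}[\omega + t\xi, \omega + t\xi]\Big) = d\xi + [\omega, \xi] = D^\omega\xi. $$

Hence, (at $t = 0$)

$$ \begin{aligned}
\frac{d}{dt} \mathcal{L}_G(\omega + t\xi)
&= -\tfrac{1}{2} \frac{d}{dt}\, \tilde h h_G(\Omega_t, \Omega_t) \\
&= -\tfrac{1}{2}\, \tilde h h_G\Big(\frac{d\Omega_t}{dt}, \Omega^\omega\Big) - \tfrac{1}{2}\, \tilde h h_G\Big(\Omega^\omega, \frac{d\Omega_t}{dt}\Big) \\
&= -\tfrac{1}{2}\, \tilde h h_G(D^\omega\xi, \Omega^\omega) - \tfrac{1}{2}\, \tilde h h_G(\Omega^\omega, D^\omega\xi) \\
&= -\tilde h h_G(\Omega^\omega, D^\omega\xi).
\end{aligned} \tag{35.15} $$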

We now substitute (35.13), (35.14), and (35.15) in Eq. (35.11) to obtain

$$ \int_M \tilde h h\Big(\frac{\partial L}{\partial\psi} + \delta^\omega \frac{\partial L}{\partial(D^\omega\psi)}, \eta\Big)\mu
+ \int_M \tilde h h_G(J^\omega, \xi)\,\mu
- \int_M \tilde h h_G(\Omega^\omega, D^\omega\xi)\,\mu = 0. $$

Using Eq. (35.7) on the last term, we remove the covariant differentiation
from $\xi$. Then noting that $\eta$ and $\xi$ are arbitrary and independent of each other,
we obtain


Theorem 35.3.4 The particle field $\psi$ and the connection 1-form $\omega$ are
stationary relative to the total Lagrangian iff they satisfy the following two
Lagrange's equations:

(a) $\delta^\omega \dfrac{\partial L}{\partial(D^\omega\psi)} + \dfrac{\partial L}{\partial\psi} = 0$,

(b) $\delta^\omega \Omega^\omega = J^\omega(\psi)$.


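Let us examine the current $J^\omega$ in some more detail. It is common to suppress
the superscript $\omega$, although the current depends on the connection. The
fact that the current pairs up with $\xi \in \bar\Lambda^1(P, \mathfrak{g})$ tells us that
$J \in \bar\Lambda^1(P, \mathfrak{g})$. Denoting $\partial L/\partial(D^\omega\psi)$ by $\phi$ and
letting $\{e_\alpha\}$ and $\{e_i\}$ be bases for $\mathfrak{g}$ and $V$, respectively, the
last line of Eq. (35.14) can be expressed as

$$ \tilde h h(\phi^i e_i, \xi^\beta e_\beta \cdot \psi) = \tilde h h_G(J^\alpha e_\alpha, \xi^\beta e_\beta), $$

where summation over repeated indices is understood and $J^\alpha$, $\phi^i$, and
$\xi^\beta$ are real-valued forms. From the definition of the composite inner product,
we get

$$ \tilde h\big(\phi^i h(e_i, e_\beta \cdot \psi), \xi^\beta\big) = \tilde h\big(J^\alpha h_G(e_\alpha, e_\beta), \xi^\beta\big) \equiv h_{G\alpha\beta}\, \tilde h(J^\alpha, \xi^\beta). $$

Since this must hold for any $\xi^\beta$, we get
$\phi^i h(e_i, e_\beta \cdot \psi) = h_{G\alpha\beta} J^\alpha$. Multiplying by the inverse
matrix $h_G^{\alpha\beta}$ gives

$$ J^\alpha = h_G^{\alpha\beta}\, \phi^i h(e_i, e_\beta \cdot \psi) = h_G^{\alpha\beta}\, h(\phi, e_\beta \cdot \psi)
\quad\text{and}\quad J = h_G^{\alpha\beta}\, h(\phi, e_\beta \cdot \psi)\, e_\alpha, $$

with $J = J^\alpha e_\alpha$. Substituting for $\phi$, we finally obtain

$$ J^\alpha = h_G^{\alpha\beta}\, h\Big(\frac{\partial L}{\partial(D^\omega\psi)}, e_\beta \cdot \psi\Big)
\quad\text{and}\quad
J = h_G^{\alpha\beta}\, h\Big(\frac{\partial L}{\partial(D^\omega\psi)}, e_\beta \cdot \psi\Big) e_\alpha. \tag{35.16} $$

For the Lagrangian of Eq. (35.10), this simplifies to

$$ J^\alpha = 2 h_G^{\alpha\beta}\, h(D^\omega\psi, e_\beta \cdot \psi) \quad\text{and}\quad J = 2 h_G^{\alpha\beta}\, h(D^\omega\psi, e_\beta \cdot \psi)\, e_\alpha. \tag{35.17} $$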


35.4 Local Equations


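In physical applications, potentials and fields are defined on the Minkowski
manifold $M = (\mathbb{R}^4, \eta)$. Therefore, for our formulas to be useful, we need to
pull our equations down to the base manifold $M$. This is achieved by local
sections. So, let $\sigma_u : U \to P$ be such a section, and use Theorem 35.3.1 to
pull back functions and forms from $P$ to $M$. Then inner products will be
defined on $M$ rather than $P$. In fact, using the equation of Theorem 35.3.1,
we can write the Lagrangian of Eq. (35.10) as

$$ \mathcal{L}_{\mathrm{tot}}(\psi, \omega) = \hat h h(\sigma_u^* D^\omega\psi, \sigma_u^* D^\omega\psi) - m^2 h(\sigma_u^*\psi, \sigma_u^*\psi) - \tfrac{1}{2}\, \hat h h_G(\sigma_u^*\Omega^\omega, \sigma_u^*\Omega^\omega). $$

As shown in Chap. 34, the result is to substitute local expressions for all
quantities. For example, it can be shown easily that
$\sigma_u^* D^\omega\psi = D^{\omega_u}\psi_u$, where $\omega_u$ is a $\mathfrak{g}$-valued 1-form and
$\psi_u$ a $V$-valued function on $M$. Removing the subscript $u$, we write the
Lagrangian of Eq. (35.10) as

$$ \mathcal{L}_{\mathrm{tot}}(\psi, \omega) = \hat h h(D^\omega\psi, D^\omega\psi) - m^2 h(\psi, \psi) - \tfrac{1}{2}\, \hat h h_G(\Omega^\omega, \Omega^\omega), \tag{35.18} $$

where all quantities are now defined on $M$. Let $\{\partial_\mu\}$ be a coordinate basis
of $M$, $\{e_r\}$ a basis of $V$, and $\{e_i\}$ a basis of $\mathfrak{g}$. Then the Lagrangian
can be expressed in terms of components:

$$ \mathcal{L}_{\mathrm{tot}}(\psi, \omega) = h_{rs}\, \hat h\big((D^\omega\psi)^r, (D^\omega\psi)^s\big) - m^2 h_{rs}\, \hat h(\psi^r, \psi^s) - \tfrac{1}{2}\, h_{Gij}\, \hat h(\Omega^i, \Omega^j), \tag{35.19} $$

where we have used Eq. (26.53) and the notation $\hat h$ is as in Eq. (26.52).
Finally, with $h_{\mu\nu} = h(\partial_\mu, \partial_\nu)$ and $h^{\mu\nu}$ the inverse matrix of
$h_{\mu\nu}$, and using (26.52), we get

$$ \mathcal{L}_{\mathrm{tot}}(\psi, \omega) = h_{rs} h^{\mu\nu} (D^\omega\psi)^r_\mu (D^\omega\psi)^s_\nu - m^2 h_{rs} \psi^r \psi^s - \tfrac{1}{4}\, h_{Gij}\, h^{\alpha\mu} h^{\beta\nu}\, \Omega^i_{\mu\nu} \Omega^j_{\alpha\beta}. $$

It is more common to use $g$ for the metric of $M$. So, switch $h$ to $g$ and use
$h$ for the metric of the Lie group $G$. Then

$$ \mathcal{L}_{\mathrm{tot}}(\psi, \omega) = h_{rs}\, g^{\mu\nu} (D^\omega\psi)^r_\mu (D^\omega\psi)^s_\nu - m^2 h_{rs} \psi^r \psi^s - \tfrac{1}{4}\, h_{ij}\, \Omega^i_{\mu\nu} \Omega^{j\mu\nu}, \tag{35.20} $$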


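where (see Problem 35.5)

$$ \begin{aligned}
(D^\omega\psi)^r_\mu &= \partial_\mu \psi^r + (A_\mu \psi)^r, \\
\Omega^i_{\mu\nu} &= \partial_\mu A^i_\nu - \partial_\nu A^i_\mu + \tfrac{1}{2}\, c^i_{jk}\big(A^j_\mu A^k_\nu - A^j_\nu A^k_\mu\big).
\end{aligned} \tag{35.21} $$

Note that $\omega = A_\mu\, dx^\mu$ with $A_\mu$ generally a matrix whose elements are
functions of space and time. It can be written as $A_\mu = A^i_\mu e_i$ in terms of the
basis vectors (matrices) of the Lie algebra.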


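Example 35.4.1 Let us now look at a specific example. Let $V = \mathbb{R}^2$ with
$h_{rs} = \delta_{rs}$ and $G$ the group of two-dimensional rotations, so that $G$ acts on
$\mathbb{R}^2$ by matrix multiplication. $G$ is an abelian group; so all structure constants
are zero. Since $G$ is a one-parameter group, it is one-dimensional, and so is
its Lie algebra. We have already discussed the Lie algebra of this group in
Example 29.1.35, where, if we identify the tangent space of $\mathbb{R}^2$ as the space
itself, we see that the generator of the group is

$$ e = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, $$

which we could also have obtained by differentiating the matrix

$$ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} $$

at the identity where $\theta = 0$.
We want to write the Lagrangian (35.20) for this bundle. First we note
that

$$ \omega = e\, A_\mu\, dx^\mu, $$

where $A_\mu$ is just a function (not a matrix), and it has no $i$ index. The second
formula in Eq. (35.21) becomes

$$ \Omega_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu $$

because the group is abelian. For the first equation of (35.21), we represent
$\psi$ as a column vector with two components (remember that $r$ and $s$ run
through 1 and 2), and write

$$ D_\mu\psi = \partial_\mu\psi + A_\mu\, e\psi
= \partial_\mu \begin{pmatrix} \psi_1 \\ \psi_2 \end{pmatrix}
+ A_\mu \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \psi_1 \\ \psi_2 \end{pmatrix}
= \begin{pmatrix} \partial_\mu\psi_1 - A_\mu\psi_2 \\ \partial_\mu\psi_2 + A_\mu\psi_1 \end{pmatrix}. $$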

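Then the first term of the Lagrangian becomes

$$ \begin{aligned}
\delta_{rs}\, g^{\mu\nu} (D_\mu\psi)^r (D_\nu\psi)^s
&= g^{\mu\nu} \begin{pmatrix} \partial_\mu\psi_1 - A_\mu\psi_2 & \partial_\mu\psi_2 + A_\mu\psi_1 \end{pmatrix}
\begin{pmatrix} \partial_\nu\psi_1 - A_\nu\psi_2 \\ \partial_\nu\psi_2 + A_\nu\psi_1 \end{pmatrix} \\
&\equiv g^{\mu\nu} \begin{pmatrix} D_{1\mu} & D_{2\mu} \end{pmatrix} \begin{pmatrix} D_{1\nu} \\ D_{2\nu} \end{pmatrix}
= D_{1\mu} D_1^{\ \mu} + D_{2\mu} D_2^{\ \mu}.
\end{aligned} \tag{35.22} $$

Putting all the terms together, we obtain

$$ \mathcal{L}_{\mathrm{tot}}(\psi, \omega) = D_{1\mu} D_1^{\ \mu} + D_{2\mu} D_2^{\ \mu} - m^2\big(\psi_1^2 + \psi_2^2\big) - \tfrac{1}{4}\, \Omega_{\mu\nu} \Omega^{\mu\nu}, \tag{35.23} $$

with $D_{1\mu} = \partial_\mu\psi_1 - A_\mu\psi_2$ and $D_{2\mu} = \partial_\mu\psi_2 + A_\mu\psi_1$ as defined in (35.22).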
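It is instructive to find the current of this Lagrangian. Since there is only
one component, we do not label it. Then, the first equation in (35.17) becomes

$$ J_\mu = 2\langle D_\mu\psi, e \cdot \psi \rangle, $$

where the angle brackets designate the inner product in $\mathbb{R}^2$. With

$$ D_\mu\psi = \begin{pmatrix} D_{1\mu} \\ D_{2\mu} \end{pmatrix}
\quad\text{and}\quad
e \cdot \psi = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \psi_1 \\ \psi_2 \end{pmatrix} = \begin{pmatrix} -\psi_2 \\ \psi_1 \end{pmatrix}, $$

where $D_{1\mu} = \partial_\mu\psi_1 - A_\mu\psi_2$ and $D_{2\mu} = \partial_\mu\psi_2 + A_\mu\psi_1$, we get

$$ \begin{aligned}
J_\mu &= 2\big(-\psi_2 D_{1\mu} + \psi_1 D_{2\mu}\big) \\
&= 2\big[-\psi_2(\partial_\mu\psi_1 - A_\mu\psi_2) + \psi_1(\partial_\mu\psi_2 + A_\mu\psi_1)\big] \\
&= 2\psi_1 \partial_\mu\psi_2 - 2\psi_2 \partial_\mu\psi_1 + 2A_\mu\big(\psi_1^2 + \psi_2^2\big).
\end{aligned} $$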
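
The algebra leading to the current can be cross-checked numerically at a single point and for a single index $\mu$. The sketch below is only an illustration: the numerical values of $\partial_\mu\psi$, $A_\mu$, and $\psi$ are hypothetical samples, and the function names are not from the text. It compares the inner-product form, the expanded form, and the complex form $2\,\mathrm{Im}(\bar\psi\, D_\mu\psi)$ with $\psi = \psi_1 + i\psi_2$ and $D_\mu\psi = \partial_\mu\psi + iA_\mu\psi$.

```python
def cov_derivative(dpsi, a, psi):
    """Components of d_mu psi + A_mu e.psi for e = [[0, -1], [1, 0]]."""
    d1, d2 = dpsi
    p1, p2 = psi
    return (d1 - a * p2, d2 + a * p1)


def current_inner(dpsi, a, psi):
    """J_mu = 2 <D_mu psi, e.psi> with the standard inner product on R^2."""
    D1, D2 = cov_derivative(dpsi, a, psi)
    p1, p2 = psi
    e_psi = (-p2, p1)
    return 2 * (D1 * e_psi[0] + D2 * e_psi[1])


def current_expanded(dpsi, a, psi):
    """J_mu = 2 psi1 d psi2 - 2 psi2 d psi1 + 2 A (psi1^2 + psi2^2)."""
    d1, d2 = dpsi
    p1, p2 = psi
    return 2 * p1 * d2 - 2 * p2 * d1 + 2 * a * (p1 ** 2 + p2 ** 2)


def current_complex(dpsi, a, psi):
    """J_mu = 2 Im(conj(psi) * D psi) with psi = psi1 + i psi2."""
    z = complex(*psi)
    Dz = complex(*dpsi) + 1j * a * z
    return 2 * (z.conjugate() * Dz).imag


# Hypothetical sample values at one point, for one fixed index mu.
dpsi, a, psi = (0.4, -1.3), 0.75, (1.2, -0.5)
j_inner = current_inner(dpsi, a, psi)
j_expanded = current_expanded(dpsi, a, psi)
j_complex = current_complex(dpsi, a, psi)
print(j_inner, j_expanded, j_complex)
```

The three values coincide up to rounding, which is the pointwise content of the expansion above and of the identification of $e$ with $i$.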


In physics literature, the two components of $\psi$ are considered as the
real and imaginary parts of a complex function: $\psi = \psi_1 + i\psi_2$. Then $\mathbb{R}^2$
is treated as the complex plane $\mathbb{C}$, and rotation becomes multiplication by
the elements of $U(1)$, with which we started this chapter. The equivalence
$e \leftrightarrow i$ between the two methods becomes clear when we note that

$$ \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} = -\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},
\qquad
\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}^{t} = -\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, $$

with transposition being equivalent to complex conjugation. The reader is
urged to go through the complex treatment as a useful exercise (see
Problem 35.6).

At the beginning of the chapter, we identified the principal fiber bundle 
having U(1) as the structure group with the electromagnetic interaction. In 
this example, we have included a matter field, a charged scalar field in the 
model. Therefore, our Lagrangian describes a charged scalar field interact- 
ing via electromagnetic force. 


35.5 Problems 


35.1 Prove condition (b) of Definition 34.2.1 for Theorem 35.1.8. Hint: 
First show that fo Rg = Rgo f. 


35.2 Provide the details of the proof of Lemma 35.1.10. 

35.3 Derive Eq. (35.17) from Eq. (35.16). 

35.4 For $\sigma_u$, a local section of M, show that $\sigma_u^* D^\omega\psi = D^{\omega_u}\psi_u$.

35.5 Convince yourself why a factor $\tfrac{1}{4}$ was necessary for the gauge
Lagrangian in going from Eq. (35.19) to Eq. (35.20). Using Eqs. (26.23) and
(34.13) show that the components of $\Omega^i_{\mu\nu}$ are as given in (35.21).


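35.6 Let $V = \mathbb{C}$, $G = U(1)$, and write the inner product on $\mathbb{C}$ as

$$ \langle z_1, z_2 \rangle = \tfrac{1}{2}\big(\bar z_1 z_2 + \bar z_2 z_1\big), $$

with bar indicating complex conjugation.

(a) Show that the Lagrangian (35.23) can be written as

$$ \mathcal{L} = g^{\mu\nu}(\partial_\mu\psi + iA_\mu\psi)(\partial_\nu\bar\psi - iA_\nu\bar\psi) - m^2 \bar\psi\psi - \tfrac{1}{4}\, \Omega^{\mu\nu}\Omega_{\mu\nu}. $$

(b) Show that the current is given by

$$ J_\mu = i\psi(\partial_\mu\bar\psi - iA_\mu\bar\psi) - i\bar\psi(\partial_\mu\psi + iA_\mu\psi). $$

(c) Substitute the real and imaginary parts of $\psi$ and show that (a) and (b)
reduce to the Lagrangian and current given in Example 35.4.1.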


In the last two chapters, we introduced the notions of the principal fiber
bundle and its associated bundle. The former made contact with physics by
the introduction of a connection (identified as the gauge potential) and its
curvature (identified as the gauge field). The latter, the associated bundle
with a vector space as its standard fiber, was the convenient setting for
particle fields. The concentration of Chap. 35 was on the objects, the particle
fields $\psi$, that lived in the vector space and not on the vector space itself. The
importance of the vector space comes from the fact that tangent spaces of
the base manifold $M$ are vector spaces, and their examination leads to the
nature of the base manifold. And that is our aim in this chapter.


36.1 Connections in a Vector Bundle 


Let $P(M, G)$ be a principal fiber bundle and $E(M, \mathbb{R}^m, G, P)$ its associated
bundle, where $G$ acts on $\mathbb{R}^m$ by a representation of $G$ into $GL(m, \mathbb{R})$. In such
a situation, $E$ is called a vector bundle. The set of sections $S(E, M, \mathbb{R}^m)$
has a natural vector space structure with obvious addition of vectors and
multiplication by scalars. Furthermore, if $\lambda \in C^\infty(M)$, we have

$$ (\lambda\varphi)(x) = \lambda(x) \cdot \varphi(x), \qquad \varphi \in S(E, M, \mathbb{R}^m), \quad x \in M. $$

If $\varphi$ is a section in $E$, and if $\varphi$ is to have any physical application, we
have to know how to calculate its partial (or directional) derivatives. This
means being able to define a differentiation process for $\varphi$ given a vector
field $X$ on $M$. For a vector field $X$, let $\gamma_X(t)$ be its integral curve in the
neighborhood of $t = 0$. Denote this curve by $x_t$, so that $X = \dot x_0$. Lift this
curve up to $w_t$, and see how $\varphi$ changes along this curve. The derivative of $\varphi$
along $w_t$ is what we are after.


Definition 36.1.1 Let $\varphi$ be a section of $E$ defined on the curve $\gamma = x_t$ in $M$.
Let $\dot x_t$ be the vector tangent to $\gamma$ at $x_t$. The covariant derivative
$\nabla_{\dot x_t}\varphi$ of $\varphi$ in the direction of (or with respect to) $\dot x_t$ is given by

$$ \nabla_{\dot x_t}\varphi = \lim_{h \to 0} \frac{1}{h}\Big[\gamma_t^{t+h}\big(\varphi(x_{t+h})\big) - \varphi(x_t)\Big], $$

where $\gamma_t^{t+h}$ is the parallel displacement of the fiber $\pi_E^{-1}(x_{t+h})$ along
$\gamma$ to the fiber $\pi_E^{-1}(x_t)$.


Fig. 36.1 The covariant derivative $\nabla_{\dot x_t}\varphi$. The grey sheets are $\pi_E^{-1}(x_0)$
and $\pi_E^{-1}(x_h)$, the fibers of $E$ at $x_0$ and $x_h$. Find $\varphi(x_h)$. Construct the
horizontal lift $w_t$ of $x_t$ starting at $\varphi(x_h)$. Go backwards to the fiber at $x_0$.
Record $w_0$, find the difference $w_0 - \varphi(x_0)$, and divide by $h$. As $h$ goes to zero,
this ratio gives the covariant derivative of $\varphi$ with respect to $\dot x_0$


S. Hassani, Mathematical Physics, DOI 10.1007/978-3-319-01195-0_36


Note that $\nabla_{\dot x_t}\varphi \in \pi_E^{-1}(x_t)$, and thus defines a section of $E$ along $\gamma$.
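
The limit in Definition 36.1.1 can be made concrete in a toy case. The sketch below is only an illustration under assumptions that are not in the text: the bundle is the trivial bundle over the line with fiber $\mathbb{R}^2$, the curve is $x_t = t$, and the connection is specified by declaring the sections $R(-\Theta(t))v$ parallel, where $R$ is a rotation and $\Theta' = a$ for a hypothetical coefficient `a`. Parallel displacement from the fiber over $x_{t+h}$ back to the fiber over $x_t$ is then the rotation by $\Theta(t+h) - \Theta(t)$, and the limit should reproduce $\dot\varphi + a\, e\varphi$ with $e$ the rotation generator.

```python
import math


def rotate(angle, v):
    """Rotate the 2-vector v by `angle` (this is exp(angle * e) acting on v)."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def a(t):
    """Hypothetical connection coefficient along the curve x_t = t."""
    return 1.0 + t


def theta(t):
    """Integral of a from 0 to t."""
    return t + 0.5 * t * t


def transport_back(t, h, v):
    """Parallel displacement of v from the fiber over x_{t+h} to the fiber over x_t."""
    return rotate(theta(t + h) - theta(t), v)


def cov_derivative_limit(section, t, h=1e-6):
    """Difference quotient from the definition, evaluated at a small h."""
    moved = transport_back(t, h, section(t + h))
    here = section(t)
    return tuple((m - p) / h for m, p in zip(moved, here))


def phi(t):
    """Hypothetical sample section."""
    return (math.cos(2 * t), t * t)


def cov_derivative_formula(t):
    """Closed form phi'(t) + a(t) e.phi(t) for the sample section."""
    d1, d2 = -2 * math.sin(2 * t), 2 * t
    p1, p2 = phi(t)
    return (d1 - a(t) * p2, d2 + a(t) * p1)


def parallel_section(t):
    """A section that is parallel by construction."""
    return rotate(-theta(t), (1.0, 0.0))


t0 = 0.3
limit_value = cov_derivative_limit(phi, t0)
formula_value = cov_derivative_formula(t0)
parallel_value = cov_derivative_limit(parallel_section, t0)
print(limit_value, formula_value, parallel_value)
```

In this toy case the difference quotient agrees with the closed form to the accuracy of the finite step, and it vanishes for the section that is parallel by construction.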


Remark 36.1.1 Definition 36.1.1 has a number of interesting features
which are worth exploring. First, the very notion of derivative involves a
subtraction, an operation that does not exist on all mathematical objects
(manifolds, for example). Thus, the fact that fibers of this particular
$E$ have a vector-space structure is important. Second, although all fibers
are isomorphic as vector spaces, there is no natural isomorphism connecting
them. Parallelism gives an isomorphism, but parallelism depends on the
notion of a horizontal lift of a curve in the base manifold. Horizontal lift,
in turn, depends on the notion of a connection. One interpretation of the
word “connection” is that it actually does connect fibers through an induced
isomorphism.

Third, any derivative involves an infinitesimal change. Now that we have
a curve in the base manifold, it can induce a section-dependent curve $\varphi(x_t)$
in $E$. A natural directional derivative of the section would be to move
along $x_t$ and see how $\varphi(x_t)$ changes. When $t$ changes to $t + h$, the section
changes from $\varphi(x_t)$ to $\varphi(x_{t+h})$. But we cannot compute the difference
$\varphi(x_{t+h}) - \varphi(x_t)$, because $\varphi(x_{t+h}) \in \pi_E^{-1}(x_{t+h})$ while
$\varphi(x_t) \in \pi_E^{-1}(x_t)$, and we don’t know how to subtract two vectors from two
different vector spaces. That is why we need to transfer $\varphi(x_{t+h})$ to
$\pi_E^{-1}(x_t)$ via the parallelism $\gamma_t^{t+h}$ (see Fig. 36.1). The word “parallelism”
comes about because the horizontal lift of $x_t$ is as parallel to $x_t$ as one can get
within the confines of a connection.

Definition 36.1.2 A section $\varphi$ is parallel if $\varphi(x_t)$ is the horizontal lift of $x_t$.
In particular, $\gamma_t^{t+h}(\varphi(x_{t+h})) = \varphi(x_t)$ for all $t$ and $h$.


We thus have


Proposition 36.1.3 A section $\varphi$ is parallel iff $\nabla_{\dot x_t}\varphi = 0$ for all $t$.


Furthermore, if we rewrite the defining equation of the covariant derivative as

$$ \gamma_t^{t+h}\big(\varphi(x_{t+h})\big) = \varphi(x_t) + h \nabla_{\dot x_t}\varphi + O(h^2), $$

where $O(h^2)$ denotes terms of order $h^2$ and higher powers of $h$, then we
see that two curves that have the same value and tangent vector at $t$ give the
same covariant derivative at $t$. This means that we can define the covariant
derivative in terms of vectors.


Definition 36.1.4 Let $X \in T_x M$ and $\varphi$ a section of $E$ defined on a
neighborhood of $x \in M$. Let $x_t$ be a curve such that $X = \dot x_0$. Then the covariant
derivative of $\varphi$ in the direction of $X$ is $\nabla_X \varphi = \nabla_{\dot x_0}\varphi$. A section
$\varphi$ is parallel on $U \subset M$ iff $\nabla_X \varphi = 0$ for all $X \in T_x U$, $x \in U$.


It is convenient to have an alternative definition of the covariant derivative
of a section in the direction of the vector $X$ in terms of its horizontal lift in $P$.


Proposition 36.1.5 Let $\varphi$ be defined on $U \subset M$. Associate with $\varphi$ an
$\mathbb{R}^m$-valued function $f$ on $\pi^{-1}(U) \subset P$ by $f(p) = p^{-1}(\varphi(\pi(p)))$ for
$p \in \pi^{-1}(U) \subset P$ as in Remark 34.3.1. Let $X^* \in T_p P$ be the horizontal lift of
$X \in T_x M$. Then $X^* f \in \mathbb{R}^m$ and $p(X^* f) \in \pi_E^{-1}(x)$, and

$$ \nabla_X \varphi = p(X^* f). $$


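Proof Let $x_t$ be the curve with $\dot x_0 = X$. Let $p_t$ be the horizontal lift of $x_t$
such that $X^* = \dot p_0$ and $p_0 = p$. Then we have

$$ X^* f = \lim_{h \to 0} \frac{1}{h}\big[f(p_h) - f(p)\big] = \lim_{h \to 0} \frac{1}{h}\big[p_h^{-1}(\varphi(x_h)) - p_0^{-1}(\varphi(x))\big] $$

and

$$ p(X^* f) = \lim_{h \to 0} \frac{1}{h}\big[p\, p_h^{-1}(\varphi(x_h)) - \varphi(x)\big]. $$

Set $\xi = p_h^{-1}(\varphi(x_h))$, and consider $p_t \xi$, which is a horizontal curve in $E$.
Note that $p_h \xi = \varphi(x_h)$ and $p_0 \xi = p\, p_h^{-1}(\varphi(x_h))$. By the definition of
$\gamma_0^h$, we have $\gamma_0^h(p_h \xi) = p_0 \xi$. Hence,
$\gamma_0^h \varphi(x_h) = p\, p_h^{-1}(\varphi(x_h))$. Substituting this in
the above equation yields the result we are after.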

Sometimes it is convenient to write the definition of the covariant derivative
in terms of ordinary derivatives, as follows:

$$ \nabla_{\dot x_t}\varphi = \frac{d}{ds}\, \gamma_t^{t+s}\big(\varphi(x_{t+s})\big)\Big|_{s=0}. \tag{36.1} $$

The covariant derivative satisfies certain important properties which are
sometimes used to define it. We collect these properties in the following


Proposition 36.1.6 Let $X, Y \in T_x M$ and $\varphi$ and $\psi$ be sections of $E$ defined
in a neighborhood of $x$. Then

(a) $\nabla_X(\varphi + \psi) = \nabla_X \varphi + \nabla_X \psi$;

(b) $\nabla_{aX}\varphi = a \nabla_X \varphi$, where $a \in \mathbb{R}$;

(c) $\nabla_{X+Y}\varphi = \nabla_X \varphi + \nabla_Y \varphi$;

(d) $\nabla_X(f\varphi) = f(x) \cdot \nabla_X \varphi + (Xf) \cdot \varphi(x)$, where $f$ is a real-valued
function defined in a neighborhood of $x$.

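Proof (a) follows from the fact that the isomorphism $\gamma_t^{t+h}$ is linear. (b)
follows from the fact that if $\gamma_X(t)$ is the curve whose tangent is $X$, then
$\gamma_X(at)$ is the curve whose tangent is $aX$. (c) follows from Proposition 36.1.5 and
the fact that $X^* + Y^*$ is the lift of $X + Y$. For (d), let $X$ be tangent to $x_t$ with
$x_0 = x$ and $X = \dot x_0$. Then use Eq. (36.1):

$$ \begin{aligned}
\nabla_X(f\varphi) = \nabla_{\dot x_0}(f\varphi)
&= \frac{d}{dt}\, \gamma_0^t\big(f(x_t)\varphi(x_t)\big)\Big|_{t=0}
 = \frac{d}{dt}\Big[f(x_t)\, \gamma_0^t\big(\varphi(x_t)\big)\Big]_{t=0} \\
&= \frac{d}{dt} f(x_t)\Big|_{t=0}\, \gamma_0^0\big(\varphi(x_0)\big)
 + f(x_0)\, \frac{d}{dt}\, \gamma_0^t\big(\varphi(x_t)\big)\Big|_{t=0} \\
&= f(x) \cdot \nabla_X \varphi + (Xf) \cdot \varphi(x),
\end{aligned} $$

where we used $\gamma_0^t(f(x_t)\varphi(x_t)) = f(x_t)\gamma_0^t(\varphi(x_t))$, which is a
consequence of the linearity of $\gamma_0^t$ and the fact that $f(x_t)$ is a real number.
We also used $\gamma_0^0 = \mathrm{id}$, which should be obvious.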


Remark 36.1.2 Proposition 36.1.6 applies to vectors at a point of the man- 
ifold M. It can be generalized to vector fields by applying it pointwise. 
Therefore, the proposition holds for vector fields as well. The minor dif- 
ference is that a in (b) of the proposition can also be a function on M. 


36.2 Linear Connections 


From the general vector bundles whose standard fiber was R”, we now spe- 
cialize to the bundle of linear frames L(M) examined in Example 34.1.11, 
among whose associated bundles are tangent bundle (Box 34.1.17) and ten- 
sor bundle (Example 34.1.18). We use P for L(M) to avoid cluttering the 
notations. 

Definition 36.2.1 A connection $\Gamma$ in $L(M) \equiv P$ is called a linear connection
of $M$.


Recall that for any principal fiber bundle and its associated bundle $E$, each
$p \in P$ is an isomorphism of $F$, the standard fiber of $E$, with $\pi_E^{-1}(x)$. In fact,
$p^{-1}$ is an $F$-valued map on $\pi_E^{-1}(x)$. In the present case of $L(M)$, $p^{-1}$ is an
$\mathbb{R}^n$-valued map on $T_x M$. In addition, there is a natural map $T_p P \to T_x M$,
namely $\pi_*$. If we combine the two maps, we get a 1-form on $L(M)$:


Definition 36.2.2 The canonical form $\theta$ of $L(M) \equiv P$ is the $\mathbb{R}^n$-valued
1-form on $P$ defined by

$$ \theta(X) = p^{-1}\pi_*(X) \quad\text{for } X \in T_p P. \tag{36.2} $$


Proposition 36.2.3 The canonical form is a tensorial 1-form of type
$(\mathrm{id}, \mathbb{R}^n)$, where id is the identity representation of $GL(n; \mathbb{R})$.


Proof Let $X$ be any vector at $p \in P$ and $g \in GL(n; \mathbb{R})$. Then $R_{g*}X$ is a
vector at $pg \in P$. Therefore,

$$ (R_g^*\theta)(X) = \theta(R_{g*}X) = (pg)^{-1}\big(\pi_*(R_{g*}X)\big) = g^{-1} p^{-1}\big(\pi_*(X)\big) = g^{-1} \cdot \theta(X), $$

where we used $\pi_*(R_{g*}X) = \pi_*(X)$, which is implied by $\pi(pg) = \pi(p)$. This
shows that $\theta$ is pseudotensorial. But if $X$ is vertical, then $\pi_*$, and therefore
$\theta$, annihilates it. Hence, $\theta$ is tensorial.


Definition 36.2.4 Let $\Gamma$ be a linear connection of $M$. For each $\xi \in \mathbb{R}^n$ and
$p \in P$ define the vector field $B(\xi)$ in such a way that $(B(\xi))_p$ is the unique
horizontal lift of $p\xi \in T_{\pi(p)}M$. The vector field $B(\xi)$ so defined is called
the standard horizontal vector field of $\Gamma$ corresponding to $\xi$.


Proposition 36.2.5 The standard horizontal vector fields have the following
properties:

(a) If $\theta$ is the canonical form of $P$, then $\theta \circ B = \mathrm{id}_{\mathbb{R}^n}$.

(b) $R_{g*}(B(\xi)) = B(g^{-1}\xi)$ for $g \in GL(n; \mathbb{R})$ and $\xi \in \mathbb{R}^n$.

(c) If $\xi \neq 0$, then $B(\xi)$ never vanishes.


Proof (a) follows directly from the definition of $B$. In fact

$$ \theta_p\big((B(\xi))_p\big) = p^{-1}\big(\pi_*((B(\xi))_p)\big) = p^{-1}(p\xi) = \xi. $$

(b) If $B(\xi)$ is a standard horizontal vector field at $p$, then $R_{g*}(B(\xi))$
is a standard horizontal vector field at $pg$. Let $R_{g*}(B(\xi)) = B(\xi')$. Then
$\pi_*((B(\xi'))_{pg}) = pg\xi'$. We also have $\pi_*((B(\xi))_p) = p\xi$. Since
$\pi_*(R_{g*}X) = \pi_*(X)$, we must have $pg\xi' = p\xi$ or $\xi' = g^{-1}\xi$.

(c) Assume that $(B(\xi))_p = 0$ for some $p$. Then $p\xi = \pi_*((B(\xi))_p) = 0$.
Multiplying by $p^{-1}$, we get $\xi = 0$.

Proposition 36.2.6 Let $A^*$ be the fundamental vector field corresponding
to $A \in \mathfrak{g}$ and $B(\xi)$ the standard horizontal vector field corresponding to
$\xi \in \mathbb{R}^n$. Then

$$ [A^*, B(\xi)] = B(A\xi). $$

Proof Recall that the commutator of two vector fields is the Lie derivative
of one with respect to the other. Hence, using Definition 28.4.12, noting that
the action of $G$ on $P$ is a right action, and using (b) of Proposition 36.2.5,
we have, with $g_t = \exp(tA)$,

$$ \begin{aligned}
[A^*, B(\xi)] &= \lim_{t \to 0} \frac{1}{t}\big[R_{g_t^{-1}*} B(\xi) - B(\xi)\big]
 = \lim_{t \to 0} \frac{1}{t}\big[B(g_t \xi) - B(\xi)\big]
 = \lim_{t \to 0} \frac{1}{t}\big[B(e^{tA}\xi) - B(\xi)\big] \\
&= \lim_{t \to 0} \frac{1}{t}\big[B\big((1 + tA + \cdots)\xi\big) - B(\xi)\big] = B(A\xi),
\end{aligned} $$

where we used $\exp(tA) = e^{tA}$ when the Lie algebra is $\mathfrak{gl}(n; \mathbb{R})$.
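
Because $B(\xi)$ depends linearly on $\xi$, the last step of the proof reduces to the limit $\frac{1}{t}\big[e^{tA}\xi - \xi\big] \to A\xi$ in $\mathbb{R}^n$. The sketch below only illustrates this limit numerically; the matrix `A`, the vector `xi`, and the truncated power series used for the exponential are hypothetical choices, not part of the text.

```python
def mat_vec(A, v):
    """Multiply the square matrix A (list of rows) by the vector v."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def exp_times_vec(A, v, t, terms=30):
    """Apply exp(tA) to v using the power series sum_k (tA)^k v / k!."""
    result = list(v)
    term = list(v)
    for k in range(1, terms):
        term = [t * c / k for c in mat_vec(A, term)]
        result = [r + c for r, c in zip(result, term)]
    return result


# Hypothetical sample data.
A = [[0.0, 2.0], [-1.0, 0.5]]
xi = [1.0, -3.0]
t = 1e-6

quotient = [(moved - start) / t for moved, start in zip(exp_times_vec(A, xi, t), xi)]
target = mat_vec(A, xi)
print(quotient, target)
```

For small `t` the difference quotient approaches $A\xi$, with an error of order `t` coming from the quadratic term of the series.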


Definition 36.2.7 The torsion form $\Theta$ of a linear connection $\omega$ is defined by

$$ \Theta = D^\omega\theta. \tag{36.3} $$

Proposition 36.2.3 implies that $\Theta \in \bar\Lambda^2(P, \mathbb{R}^n)$, i.e., that the torsion form
is a tensorial 2-form.


Theorem 36.2.8 Let $\omega$, $\Theta$, and $\Omega$ be the connection form, the torsion form,
and the curvature form of a linear connection $\Gamma$ of $L(M)$. Then we have

First structure equation: $\Theta = d\theta + \omega \wedge \theta$, or in detail,

$$ \Theta(X, Y) = d\theta(X, Y) + \omega(X) \cdot \theta(Y) - \omega(Y) \cdot \theta(X). $$

Second structure equation: $\Omega = d\omega + \tfrac{1}{2}[\omega, \omega]$, or

$$ \Omega(X, Y) = d\omega(X, Y) + \tfrac{1}{2}[\omega(X), \omega(Y)], $$

where $X, Y \in T_p(L(M))$.


Proof The second structure equation is the result of Theorem 34.3.6. The
first structure equation is derived in [Koba 63, pp. 120-121].


The equations above which do not act on vector fields are to be interpreted
as products of matrices whose entries are forms and the multiplication
is through wedge product. We can write the equations above in terms
of components. Let $\{\hat e_i\}_{i=1}^n$ be the standard basis of $\mathbb{R}^n$ and
$\{E_i^j\}_{i,j=1}^n$ be a basis of $\mathfrak{gl}(n, \mathbb{R})$. $E_i^j$ is an $n \times n$ matrix with
a 1 at the $i$th column and $j$th row and zero everywhere else. In terms of these basis
vectors, we can write

$$ \theta = \theta^i \hat e_i, \qquad \Theta = \Theta^i \hat e_i, \qquad
\omega = \omega^i_j E_i^j, \qquad \Omega = \Omega^i_j E_i^j, \tag{36.4} $$

with summation over repeated indices in place. Then the structure equations
become

$$ \begin{aligned}
\Theta^i &= d\theta^i + \omega^i_j \wedge \theta^j, & i &= 1, 2, \ldots, n, \\
\Omega^i_j &= d\omega^i_j + \omega^i_k \wedge \omega^k_j, & i, j &= 1, 2, \ldots, n,
\end{aligned} \tag{36.5} $$

as the reader can verify (see Problem 36.2). Multiplying both sides of the
second equation above by $E_i^j$, the left-hand side becomes a matrix with
elements $\Omega^i_j$. The first term on the right-hand side becomes the exterior
derivative of a matrix whose elements are $\omega^i_j$, and the second term on the
right-hand side will be simply the matrix product of the latter matrix with itself,
where the elements are wedge-multiplied. We summarize this in the following box,
which we shall use later:


Box 36.2.9 Let $\Omega$ be the matrix with elements $\Omega^i_j$ and $\omega$ the matrix
with elements $\omega^i_j$. Then $\Omega = d\omega + \omega \wedge \omega$, where $d$ operates on the
elements of $\omega$ and $\omega \wedge \omega$ is the multiplication of two matrices in which
ordinary product is replaced with wedge product.
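
As a small worked illustration of the matrix form of the second structure equation (a sketch for the case $n = 2$, which is chosen here only for concreteness), writing out $\Omega^i_j = d\omega^i_j + \omega^i_k \wedge \omega^k_j$ entry by entry and using $\alpha \wedge \alpha = 0$ for any 1-form $\alpha$ gives:

```latex
\begin{aligned}
\Omega^1_1 &= d\omega^1_1 + \omega^1_1\wedge\omega^1_1 + \omega^1_2\wedge\omega^2_1
            = d\omega^1_1 + \omega^1_2\wedge\omega^2_1, \\
\Omega^1_2 &= d\omega^1_2 + \omega^1_1\wedge\omega^1_2 + \omega^1_2\wedge\omega^2_2, \\
\Omega^2_1 &= d\omega^2_1 + \omega^2_1\wedge\omega^1_1 + \omega^2_2\wedge\omega^2_1, \\
\Omega^2_2 &= d\omega^2_2 + \omega^2_1\wedge\omega^1_2 + \omega^2_2\wedge\omega^2_2
            = d\omega^2_2 + \omega^2_1\wedge\omega^1_2 .
\end{aligned}
```

Note that the diagonal entries retain the cross terms $\omega^1_2 \wedge \omega^2_1$ and $\omega^2_1 \wedge \omega^1_2$, which are negatives of each other, so the trace of $\omega \wedge \omega$ vanishes in this case.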


Theorem 36.2.10 (Bianchi’s identities) Let $\omega$, $\Theta$, and $\Omega$ be the connection
form, the torsion form, and the curvature form of a linear connection $\Gamma$ of
$L(M)$. Then

First Bianchi identity: $D^\omega\Theta = \Omega \wedge \theta$.
Second Bianchi identity: $D^\omega\Omega = 0$.


Proof The first identity is a special case of Theorem 35.1.4. The second
identity was the content of Theorem 34.3.7.


36.2.1 Covariant Derivative of Tensor Fields 


Up to this point, we have concentrated on the differentiation of forms, whose
natural differential is $D^\omega$. We also need to differentiate general tensors in
the most “natural” way. As discussed earlier, this natural way is the
directional derivative introduced in Proposition 36.1.6. However, instead of
derivatives with respect to a vector at a point, we generalize to derivatives
in the direction of a vector field as pointed out in Remark 36.1.2. When the
standard fiber is $\mathbb{R}^n$, with $n = \dim M$, the associated bundle is the tangent
bundle $T(M)$ whose cross sections are vector fields. We thus restate
Proposition 36.1.6 for $L(M)$ with the associated bundle $T(M)$ in the direction of
vector fields:


Proposition 36.2.11 Let $X$, $Y$, and $Z$ be vector fields on $M$. Then

(a) $\nabla_X(Y + Z) = \nabla_X Y + \nabla_X Z$;

(b) $\nabla_{X+Y} Z = \nabla_X Z + \nabla_Y Z$;

(c) $\nabla_{fX} Y = f \cdot \nabla_X Y$, where $f \in C^\infty(M)$;

(d) $\nabla_X(fY) = f \cdot \nabla_X Y + (Xf) \cdot Y$, where $f \in C^\infty(M)$.

We defined the covariant derivative in terms of parallel displacements 
along a path in M and obtained the four equations of Proposition 36.2.11. It 
turns out that 


Theorem 36.2.12 Any derivative which satisfies the four conditions of 
Proposition 36.2.11 is the covariant derivative with respect to some linear 
connection. 


If instead of $\mathbb{R}^n$, we take $T^r_s(\mathbb{R}^n)$ as the standard fiber, the bundle
associated with $L(M)$ becomes the tensor bundle $T^r_s(M)$ of type $(r, s)$ over $M$.
Since this is still a vector bundle, we can define a covariant derivative for it. Now,
tensors are products of vector fields and 1-forms, and if we know how the
directional derivative acts on vector fields and 1-forms, we know how it
acts on all tensors. Since a 1-form pairs up with a vector field to produce a
function, we can also state that if the action of the derivative is known for
vector fields and functions, it is known for all tensors. The action of $\nabla_X$ on
vector fields is given by Proposition 36.2.11. The proposition also includes
its action on functions (see Problem 36.3).


Theorem 36.2.13 Let $\mathcal{T}(M)$ be the algebra of tensor fields (the vector
space of tensor fields together with tensor multiplication as the algebra
multiplication) on $M$. Then the covariant differentiation has the following
properties:

(a) $\nabla_X : \mathcal{T}(M) \to \mathcal{T}(M)$ is a type-preserving derivation;

(b) $\nabla_X$ commutes with contraction;

(c) $\nabla_X f = Xf$ for every function $f \in C^\infty(M)$ on $M$;

(d) $\nabla_{X+Y} = \nabla_X + \nabla_Y$;

(e) $\nabla_{fX} T = f \cdot \nabla_X T$ for all $f \in C^\infty(M)$ and $T \in \mathcal{T}(M)$.


A tensor field $T$ of type $(r, s)$ can be thought of as a multilinear mapping

$$ T : \underbrace{T(M) \times T(M) \times \cdots \times T(M)}_{s \text{ times}} \to T^r_0(M). $$

The $s$ vector fields from the domain of $T$ fill all the covariant slots, leaving
all the $r$ contravariant slots untouched. With this kind of interpretation, we
have the following:


¹See Definition 3.4.1.

Definition 36.2.14 Given a tensor field $T$ of type $(r, s)$, the covariant
differential $\nabla T$ of $T$ is a tensor of type $(r, s + 1)$ given by

$$ (\nabla T)(X_1, \ldots, X_s; X) = (\nabla_X T)(X_1, \ldots, X_s), \qquad X_i, X \in T(M). $$

By a procedure similar to that which led to the Lie derivative (28.36) of
a $p$-form, we can obtain the following formula:

$$ (\nabla T)(X_1, \ldots, X_s; X) = \nabla_X\big(T(X_1, \ldots, X_s)\big) - \sum_{i=1}^s T(X_1, \ldots, \nabla_X X_i, \ldots, X_s), \tag{36.6} $$

where $T$ is a tensor field of type $(r, s)$ and $X_i, X \in T(M)$.
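
As a short worked instance of Eq. (36.6) (a sketch; the symbols $\alpha$ for a 1-form and $S$ for a tensor field of type $(0, 2)$ are introduced here only for illustration), note that for $r = 0$ the quantity $T(X_1, \ldots, X_s)$ is a function, so the first term on the right-hand side reduces to an ordinary directional derivative by part (c) of Theorem 36.2.13:

```latex
\begin{aligned}
(\nabla \alpha)(Y; X) &= X\big(\alpha(Y)\big) - \alpha(\nabla_X Y), \\
(\nabla S)(Y, Z; X)   &= X\big(S(Y, Z)\big) - S(\nabla_X Y, Z) - S(Y, \nabla_X Z).
\end{aligned}
```

In particular, $S$ is parallel exactly when $X(S(Y, Z)) = S(\nabla_X Y, Z) + S(Y, \nabla_X Z)$ for all vector fields $X$, $Y$, $Z$.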

A tensor field $T$, as a section of a bundle associated with $L(M)$, is said to
be parallel iff $\nabla_X T = 0$ for all vector fields $X$ on $M$ (see Proposition 36.1.3).
This leads to


Proposition 36.2.15 A tensor field $T$ on $M$ is parallel iff $\nabla T = 0$.


36.2.2 From Forms on P to Tensor Fields on M 


We have defined two kinds of covariant differentiation: $D^\omega$, which acts on
differential forms on P, and $\nabla$, which acts on the sections of the associated
bundle E. In general, there is no natural relation between the two,
because the standard fiber $\mathbb{R}^m$ of E has no relation to the structure of P.
However, when P = L(M) and the standard fiber is $\mathbb{R}^n$, the fibers $\pi_E^{-1}(x)$
become $T_xM$, the tangent spaces of the base manifold of the bundle, which
are reachable by $\pi_*$. Therefore, we expect some kind of a relationship between
the two covariant derivatives. In particular, the two quantities defined
in terms of $D^\omega$, namely the torsion and curvature forms, should be related
to quantities defined in terms of $\nabla$.

The torsion form, being an $\mathbb{R}^n$-valued 2-form on P = L(M), takes two
vector fields on P and produces a vector in $\mathbb{R}^n$, the standard fiber of E.
Then, through the action of $p \in P$, this vector can be mapped to a vector
in $T_xM$ with $x = \pi(p)$ as in Theorem 34.1.14. This process, in conjunction
with Remark 34.3.1, allows us to define a map $T : \mathfrak{X}(M)\times\mathfrak{X}(M) \to \mathfrak{X}(M)$.
This map is called the torsion tensor field or just torsion, and is defined as
follows:

$$T(X, Y) = p\bigl(\Theta(X^*, Y^*)\bigr) \quad \text{for } X, Y \in T_xM, \tag{36.7}$$

where p is any point of L(M) such that $\pi(p) = x$, and $X^*$ and $Y^*$ are
any two vectors of L(M) such that $\pi_*(X^*) = X$ and $\pi_*(Y^*) = Y$.
Remark 34.3.1 ensures that T(X, Y) is independent of p, $X^*$, and $Y^*$.
Furthermore, T(X, Y) = −T(Y, X), and since it maps $\mathfrak{X}(M)\times\mathfrak{X}(M)$ to $\mathfrak{X}(M)$,
it is a skew-symmetric tensor field of type (1, 2).




Similarly, the curvature form, being a $\mathfrak{gl}(n, \mathbb{R})$-valued 2-form on
P = L(M), takes two vector fields on P and produces a matrix in $\mathfrak{gl}(n, \mathbb{R})$.
This matrix can act on a vector in $\mathbb{R}^n$, which can be the image of a vector
in $T_xM$ under $p^{-1}$ (the inverse of the map of Theorem 34.1.14), to produce
another vector in $\mathbb{R}^n$. Then, through the action of $p \in P$, this vector can
be mapped to a vector in $T_xM$ with $x = \pi(p)$ as in Theorem 34.1.14.
This process, in conjunction with Remark 34.3.1, allows us to define a map
$\mathfrak{X}(M)\times\mathfrak{X}(M)\times\mathfrak{X}(M) \to \mathfrak{X}(M)$. This map is called the curvature tensor field
or just curvature, and is defined as follows:

$$R(X, Y)Z = p\bigl(\Omega(X^*, Y^*)(p^{-1}Z)\bigr) \quad \text{for } X, Y, Z \in T_xM. \tag{36.8}$$

It follows that R is a tensor field of type (1, 3) with the property that
R(X, Y) = −R(Y, X). Note that R(X, Y) is an endomorphism of $T_xM$, and
is called the curvature transformation of $T_xM$ determined by X and Y.


Theorem 36.2.16 The torsion T and curvature R can be expressed in
terms of covariant differentiation as follows:

$$T(X, Y) = \nabla_X Y - \nabla_Y X - [X, Y],$$
$$R(X, Y)Z = [\nabla_X, \nabla_Y]Z - \nabla_{[X,Y]}Z,$$

where X, Y, and Z are vector fields on M.


Proof Specialize Proposition 36.1.5 to L(M) and get $(\nabla_X Y)_x = p(X^*_p f)$,
where $X^*_p$ is the horizontal lift of X at x. The function f is given by

$$f(p) = p^{-1}(Y_x) = p^{-1}\bigl(\pi_*(Y^*_p)\bigr) = \theta(Y^*_p), \tag{36.9}$$

where $Y^*_p$ is the horizontal lift of Y at x. Thus, we have the identity

$$(\nabla_X Y)_x = p\bigl(X^*_p(\theta(Y^*))\bigr). \tag{36.10}$$

From (36.9), we get $p^{-1}(\nabla_X Y)_x = \theta\bigl((\nabla_X Y)^*_p\bigr)$. From (36.10), we get
$p^{-1}(\nabla_X Y)_x = X^*_p(\theta(Y^*))$. Therefore, we have another useful identity:

$$\theta\bigl((\nabla_X Y)^*\bigr)_p = X^*_p\bigl(\theta(Y^*)\bigr). \tag{36.11}$$


Now use the first structure equation of Theorem 36.2.8 to get $\Theta(X^*, Y^*) =
d\theta(X^*, Y^*)$ because $\omega(X^*) = 0 = \omega(Y^*)$ for horizontal $X^*$ and $Y^*$. We
therefore have

$$T(X_x, Y_x) = p\bigl(\Theta(X^*_p, Y^*_p)\bigr) = p\bigl(d\theta(X^*_p, Y^*_p)\bigr)$$
$$= p\bigl(X^*_p(\theta(Y^*)) - Y^*_p(\theta(X^*)) - \theta([X^*, Y^*]_p)\bigr)$$
$$= (\nabla_X Y)_x - (\nabla_Y X)_x - [X, Y]_x,$$

where we used Theorem 28.5.11 and the fact that $\pi_*([X^*, Y^*]) = [X, Y]$.
To prove the curvature tensor equation, let $p^{-1}Z = f(p) = \theta(Z^*_p)$ by
(36.9). Then, we have

$$R(X_x, Y_x)Z_x = p\bigl(\Omega(X^*_p, Y^*_p)(f(p))\bigr) = p\bigl(-\omega([X^*, Y^*]_p)(f(p))\bigr)$$
$$= p\bigl(-\omega(v[X^*, Y^*]_p)(f(p))\bigr) = p\bigl(-A(f(p))\bigr), \tag{36.12}$$


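where we used Eq. (34.12) and the fact that $\omega$ annihilates the horizontal
component of the bracket (v stands for the vertical component). In the last
step, we used (a) of Definition 34.2.1 and denoted by A the element of the
Lie algebra that gives rise to $A^*_p = v[X^*, Y^*]_p$. Now we note that

$$-A(f(p)) = \frac{d}{dt}\bigl[\exp(-At)\,f(p)\bigr]\Big|_{t=0} = \frac{d}{dt}\,f\bigl(p\exp(At)\bigr)\Big|_{t=0} = A^*_p f$$
$$= v[X^*, Y^*]_p f = X^*_p(Y^* f) - Y^*_p(X^* f) - h[X^*, Y^*]_p f$$
$$= X^*_p\bigl(Y^*(\theta(Z^*))\bigr) - Y^*_p\bigl(X^*(\theta(Z^*))\bigr) - h[X^*, Y^*]_p\bigl(\theta(Z^*)\bigr).$$

It now follows that

$$R(X_x, Y_x)Z_x = p\Bigl(X^*_p\bigl(Y^*(\theta(Z^*))\bigr) - Y^*_p\bigl(X^*(\theta(Z^*))\bigr) - h[X^*, Y^*]_p\bigl(\theta(Z^*)\bigr)\Bigr)$$
$$= p\Bigl(X^*_p\bigl(\theta((\nabla_Y Z)^*)\bigr) - Y^*_p\bigl(\theta((\nabla_X Z)^*)\bigr) - h[X^*, Y^*]_p\bigl(\theta(Z^*)\bigr)\Bigr)$$
$$= \nabla_X\nabla_Y Z - \nabla_Y\nabla_X Z - \nabla_{[X,Y]}Z,$$

where we used (36.11) to go from the first line to the second.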


We also want to express Bianchi's identities in terms of tensor fields.
We shall confine ourselves to the case where the torsion form is zero. In
most physical applications this is indeed the case. So the first identity of
Theorem 36.2.10 becomes $0 = \Omega\wedge\theta$. Now let $X^*$, $Y^*$, and $Z^*$ be the lifts of
X, Y, and Z. Then, it is easily shown that

$$0 = (\Omega\wedge\theta)(X^*, Y^*, Z^*) = \mathrm{Cyc}\bigl(\Omega(X^*, Y^*)\,\theta(Z^*)\bigr),$$

where Cyc means taking the sum of the cyclic permutations of the expression
in parentheses. From Eq. (36.12) and the discussion before it, we also
see that

$$R(X, Y)Z = p\bigl(\Omega(X^*, Y^*)(\theta(Z^*))\bigr). \tag{36.13}$$

Putting the last two equations together, we obtain

$$\mathrm{Cyc}\bigl(R(X, Y)Z\bigr) = R(X, Y)Z + R(Z, X)Y + R(Y, Z)X = 0. \tag{36.14}$$

The proof of Bianchi's second identity is outlined in Problem 36.5.


Theorem 36.2.17 Let R be the curvature of a linear connection of M whose
torsion is zero. Then for X, Y, and Z in $\mathfrak{X}(M)$, we have

Bianchi's 1st identity: $\mathrm{Cyc}[R(X, Y)Z] = 0$;
Bianchi's 2nd identity: $\mathrm{Cyc}[\nabla_X R(Y, Z)] = 0$.


For the sake of completeness, we give Bianchi's identities for the case
where torsion is not zero and refer the reader to [Koba 63, pp. 135-136] for
a proof.


Theorem 36.2.18 Let R and T be, respectively, the curvature and torsion
of a linear connection of M. Then for X, Y, and Z in $\mathfrak{X}(M)$, we
have

Bianchi's 1st identity: $\mathrm{Cyc}[R(X, Y)Z] = \mathrm{Cyc}[T(T(X, Y), Z) + (\nabla_X T)(Y, Z)]$;
Bianchi's 2nd identity: $\mathrm{Cyc}[\nabla_X R(Y, Z) + R(T(X, Y), Z)] = 0$.


36.2.3 Component Expressions 


Any application of connections and curvatures requires expressing them in
local coordinates. So, it is useful to have the components, and the identities in
which they participate, in terms of local coordinates. For a linear bundle, the
isomorphism $\pi^{-1}(U) \cong U \times GL(n, \mathbb{R})$ suggests a local coordinate system
of the form $(x^i, X^j_k)$, where $(x^1,\dots,x^n)$ is a coordinate system for $U \subset M$
and $X^j_k$ are the elements of a nonsingular n × n matrix.

Let's start with the canonical 1-form $\theta = \sum_{i=1}^n \theta^i\hat e_i$, with $\hat e_i$ the standard
basis of $\mathbb{R}^n$. The most general coordinate expression for $\theta^i$ is

$$\theta^i = a^i_j\,dx^j + b^{ik}_j\,dX^j_k, \quad\text{therefore}\quad \theta = \bigl(a^i_j\,dx^j + b^{ik}_j\,dX^j_k\bigr)\hat e_i.$$

By definition, $\theta = p^{-1}\circ\pi_*$, and the presence of $\pi_*$ annihilates any vertical
component of a vector field. This means that $b^{ik}_j = 0$. Now note that as a map
$\mathbb{R}^n \to T_xM$, p, whose local coordinates are $(x^i, X^j_k)$, sends $\hat e_j$ to $X^k_j\partial_k$, and
$p^{-1}$ sends $\partial_k$ to $Y^j_k\hat e_j$, where $Y^j_k$ is the inverse of $X^k_j$. Hence, $\theta(\partial_k) = Y^j_k\hat e_j$.
So, we have

$$Y^i_k\hat e_i = \theta(\partial_k) = \bigl(a^i_j\,dx^j\,\hat e_i\bigr)(\partial_k) = a^i_j\,dx^j(\partial_k)\,\hat e_i = a^i_j\delta^j_k\,\hat e_i = a^i_k\,\hat e_i.$$

Thus, we have our first result:

$$\theta^i = Y^i_k\,dx^k, \qquad Y^i_k = (X^{-1})^i_k. \tag{36.15}$$


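Next, we consider the connection 1-form, $\omega = \omega^i_j E^j_i$, where $\omega^i_j$ are
real-valued 1-forms and $\{E^j_i\}$ form a basis of $\mathfrak{gl}(n, \mathbb{R})$. Write

$$\omega^i_j = a^{il}_{jk}\,dX^k_l + b^i_{jk}\,dx^k$$

and let it act on a fundamental vector field $A^*_p = \frac{d}{dt}(p\exp At)\big|_{t=0}$. If p is
represented by $X^\alpha_\beta$, then (with the notation $\partial^\beta_\alpha \equiv \partial/\partial X^\alpha_\beta$)

$$A^* = X^\alpha_\gamma A^\gamma_\beta\,\partial^\beta_\alpha,$$

and when $\omega^i_j$ acts on $A^*$, it should give $A^i_j$. So,

$$A^i_j = \omega^i_j(A^*) = a^{il}_{jk}\,dX^k_l\bigl(X^\alpha_\gamma A^\gamma_\beta\,\partial^\beta_\alpha\bigr) + b^i_{jk}\,dx^k\bigl(X^\alpha_\gamma A^\gamma_\beta\,\partial^\beta_\alpha\bigr)$$
$$= a^{il}_{jk}X^\alpha_\gamma A^\gamma_\beta\,dX^k_l(\partial^\beta_\alpha) = a^{il}_{jk}X^\alpha_\gamma A^\gamma_\beta\,\delta^k_\alpha\delta^\beta_l = a^{il}_{jk}X^k_\gamma A^\gamma_l.$$

For the equality to hold for all $A^i_j$, we must have $a^{il}_{jk}X^k_\gamma = \delta^i_\gamma\delta^l_j$, i.e., $a^{il}_{jk} =
Y^i_k\delta^l_j$, where Y is the inverse of X. So, we write

$$\omega^i_j = Y^i_k\,dX^k_j + b^i_{jk}\,dx^k. \tag{36.16}$$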


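To get more insight into the composition of $b^i_{jk}$, let us try to find $\nabla_{\partial_j}\partial_i$
using (36.10). To this end, let $X^*_j$ be the horizontal lift of $\partial_j$. The general
form of $X^*_j$ is

$$X^*_j = \lambda^i_j\,\partial_i + \Lambda^k_{mj}\,\partial^m_k.$$

Since $\pi_*(X^*_j) = \partial_j$ and $\pi_*(\partial^m_k) = 0$, we get $\lambda^i_j = \delta^i_j$. Since $\omega(X^*_j) = 0 =
\omega^i_m(X^*_j)$, we get

$$0 = \omega^i_m(X^*_j) = Y^i_k\,dX^k_m\bigl(\Lambda^l_{nj}\,\partial^n_l\bigr) + b^i_{mk}\,dx^k(\partial_j) = Y^i_k\Lambda^k_{mj} + b^i_{mj}.$$

Multiply both sides by $X^l_i$ (and sum over i, of course) to get

$$0 = X^l_iY^i_k\Lambda^k_{mj} + b^i_{mj}X^l_i = \Lambda^l_{mj} + b^i_{mj}X^l_i \quad\text{or}\quad \Lambda^l_{mj} = -b^i_{mj}X^l_i,$$

and

$$X^*_j = \partial_j - b^i_{mj}X^l_i\,\partial^m_l, \tag{36.17}$$

with a similar expression for $X^*_i$. To use (36.10), we need $\theta(X^*_i)$, which
we can obtain using Eq. (36.15), noting that $\theta$ acts only on the horizontal
component of $X^*_i$:

$$\theta(X^*_i) = Y^\beta_k\,dx^k(\partial_i)\,\hat e_\beta = Y^\beta_k\delta^k_i\,\hat e_\beta = Y^\beta_i\,\hat e_\beta.$$

As we apply (36.17) to this expression, we must keep in mind that $Y^\beta_i$, being
the inverse of $X^i_\beta$, is independent of $x^j$. Therefore, only the second term of
(36.17) acts on $Y^\beta_i$. Then, using the result of Problem 36.7,

$$X^*_j\bigl(\theta(X^*_i)\bigr) = -b^k_{mj}X^l_k\,\partial^m_l\bigl(Y^\beta_i\bigr)\hat e_\beta = -b^k_{mj}X^l_k\bigl(-Y^\beta_lY^m_i\bigr)\hat e_\beta = b^k_{mj}\delta^\beta_k\,Y^m_i\,\hat e_\beta = b^\beta_{mj}Y^m_i\,\hat e_\beta.$$

Finally, we apply p on this and use $p\hat e_\beta = X^k_\beta\partial_k$ to obtain

$$\nabla_{\partial_j}\partial_i = b^\beta_{mj}Y^m_iX^k_\beta\,\partial_k \equiv \Gamma^k_{ji}\,\partial_k, \tag{36.18}$$

where in the last step, we introduced the Riemann-Christoffel symbols.²
These symbols make sense, because $\nabla_{\partial_j}\partial_i$ is a vector field, which could be
expanded in terms of the basis $\{\partial_i\}$. The Riemann-Christoffel symbols are
simply the coefficients of the expansion. In terms of these symbols, $b^i_{mj}$ can
be written as $b^i_{mj} = \Gamma^k_{jl}Y^i_kX^l_m$ and Eq. (36.16) can be expressed as

$$\omega^i_j = Y^i_k\bigl(dX^k_j + \Gamma^k_{lm}X^m_j\,dx^l\bigr). \tag{36.19}$$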


The Riemann-Christoffel symbols could also be obtained from the connection
form. From the 1-form $\omega$ on P, we define a local $\mathfrak{gl}(n, \mathbb{R})$-valued
1-form $\omega_U$ on M as follows. At each point x on M let $\sigma$ be the section
sending x to the linear frame $(\partial_1,\dots,\partial_n)$. A general section sends x to
$(X^i_1\partial_i,\dots,X^i_n\partial_i)$ or equivalently, it sends x to the point in P with
coordinates $(x^i, X^i_j)$. The particular section we are considering sends x to the
point in P with coordinates $(x^i, \delta^i_j)$. Moreover, we define $\omega_U = \sigma^*\omega$. Then
$\omega_U$ is obtained from Eq. (36.19) by setting $X^i_j = \delta^i_j$, $Y^i_j = \delta^i_j$, and $dX^i_j = 0$.
Therefore,

$$(\omega_U)^i_j = \Gamma^i_{kj}\,dx^k. \tag{36.20}$$


We often omit the subscript U when there is no risk of confusion. 


Historical Notes 

Elwin Bruno Christoffel (1829-1900) came from a family in the cloth trade. He attended 
an elementary school in Montjoie (which was renamed Monschau in 1918) but then spent 
a number of years being tutored at home in languages, mathematics, and classics. He 
attended secondary schools from 1844 until 1849. At first he studied at the Jesuit gym- 
nasium in Cologne but moved to the Friedrich-Wilhelms Gymnasium in the same town 
for at least the three final years of his school education. He was awarded the final school 
certificate with a distinction in 1849. The next year he went to the University of Berlin 
and studied under a number of distinguished mathematicians, including Dirichlet. 

After one year of military service in the Guards Artillery Brigade, he returned to Berlin 
to study for his doctorate, which was awarded in 1856 with a dissertation on the mo- 
tion of electricity in homogeneous bodies. His examiners included mathematicians and 
physicists, Kummer being one of the mathematics examiners. 

At this point Christoffel spent three years outside the academic world. He returned to 
Montjoie, where his mother was in poor health, but read widely from the works of Dirich- 
let, Riemann, and Cauchy. It has been suggested that this period of academic isolation 
had a major effect on his personality and on his independent approach towards mathemat- 
ics. It was during this time that he published his first two papers on numerical integration, 
in 1858, in which he generalized Gauss’s method of quadrature and expressed the poly- 
nomials that are involved as a determinant. This is now called Christoffel’s theorem. 

In 1859 Christoffel took the qualifying examination to become a university teacher and 
was appointed a lecturer at the University of Berlin. Four years later, he was appointed to 
a chair at the Polytechnicum in Zurich, filling the post left vacant when Dedekind went
to Brunswick. Christoffel was to have a huge influence on mathematics at the Polytechnicum,
setting up an institute for mathematics and the natural sciences there.

²The reader is cautioned about the order of the lower indices, as it is different in different
books.

In 1868 Christoffel was offered the chair of mathematics at the Gewerbsakademie in 
Berlin, which is now the University of Technology of Berlin. However, after three years 
at the Gewerbsakademie in Berlin, Christoffel moved to the University of Strasbourg as 
the chair of mathematics, a post he held until he was forced to retire due to ill health in 
1892. 

Some of Christoffel’s early work was on conformal mappings of a simply connected re- 
gion bounded by polygons onto a circle. He also wrote important papers that contributed 
to the development of the tensor calculus of Gregorio Ricci-Curbastro and Tullio Levi- 
Civita. The Christoffel symbols that he introduced are fundamental in the study of tensor 
analysis. The Christoffel reduction theorem, so named by Klein, solves the local equiva- 
lence problem for two quadratic differential forms. The procedure Christoffel employed 
in his solution of the equivalence problem is what Ricci later called covariant differentiation;
Christoffel also used the latter concept to define the basic Riemann-Christoffel
curvature tensor. His approach allowed Ricci and Levi-Civita to develop a coordinate-
free differential calculus which Einstein, with the help of Grossmann, turned into the
tensor analysis, the mathematical foundation of general relativity. 


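The Riemann-Christoffel symbols are functions of local coordinates.
A change of coordinates transforms the symbols according to a rule that
can be easily worked out. In fact, if $\bar x^\alpha$ are the new coordinates and $\bar\Gamma^\alpha_{\beta\gamma}$ are
the symbols in the new coordinate system, then $\nabla_{\bar\partial_\beta}\bar\partial_\gamma = \bar\Gamma^\alpha_{\beta\gamma}\bar\partial_\alpha$. To find
$\bar\Gamma^\alpha_{\beta\gamma}$ in terms of the old symbols, substitute $(\bar\partial_\beta x^j)\partial_j$ for $\bar\partial_\beta$, etc.,

$$\nabla_{\frac{\partial x^j}{\partial\bar x^\beta}\partial_j}\left(\frac{\partial x^k}{\partial\bar x^\gamma}\,\partial_k\right) = \bar\Gamma^\alpha_{\beta\gamma}\,\frac{\partial x^i}{\partial\bar x^\alpha}\,\partial_i,$$

then use Proposition 36.2.11 to expand the left-hand side. After some simple
manipulation, which we leave for the reader, we obtain

$$\bar\Gamma^\alpha_{\beta\gamma} = \Gamma^i_{jk}\,\frac{\partial x^j}{\partial\bar x^\beta}\frac{\partial x^k}{\partial\bar x^\gamma}\frac{\partial\bar x^\alpha}{\partial x^i} + \frac{\partial^2x^i}{\partial\bar x^\beta\,\partial\bar x^\gamma}\frac{\partial\bar x^\alpha}{\partial x^i}. \tag{36.21}$$

Because of the presence of the second term, Riemann-Christoffel symbols
are not components of a tensor.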
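
The role of the inhomogeneous term in (36.21) can be illustrated with a small numerical sketch. It assumes, purely for illustration, a flat connection on the plane whose symbols vanish in Cartesian coordinates $(x, y)$, and passes to polar coordinates $(r, \varphi)$; the function name and the array layout `G[a][b][c]` for $\bar\Gamma^a_{bc}$ are hypothetical choices, not notation from the text. With the old symbols equal to zero, only the second term of (36.21) survives.

```python
import math

def polar_christoffel_from_flat(r, phi):
    """Return G[a][b][c] = Gamma-bar^a_{bc} in polar coordinates (r, phi).

    Assumption: the old symbols vanish (flat connection, Cartesian
    coordinates), so only the second term of (36.21) contributes.
    """
    # d2x[i][b][c] = d^2 x^i / (d xbar^b d xbar^c),
    # with x^0 = r cos(phi), x^1 = r sin(phi) and xbar = (r, phi).
    d2x = [
        [[0.0, -math.sin(phi)], [-math.sin(phi), -r * math.cos(phi)]],
        [[0.0, math.cos(phi)], [math.cos(phi), -r * math.sin(phi)]],
    ]
    # Inverse Jacobian: dxbar[a][i] = d xbar^a / d x^i.
    dxbar = [
        [math.cos(phi), math.sin(phi)],
        [-math.sin(phi) / r, math.cos(phi) / r],
    ]
    return [[[sum(d2x[i][b][c] * dxbar[a][i] for i in range(2))
              for c in range(2)] for b in range(2)] for a in range(2)]

G = polar_christoffel_from_flat(2.0, 0.7)
print(G[0][1][1], G[1][0][1])  # Gamma^r_{phi phi} and Gamma^phi_{r phi}
```

Although every symbol vanishes in the Cartesian system, the polar symbols $\bar\Gamma^r_{\varphi\varphi} = -r$ and $\bar\Gamma^\varphi_{r\varphi} = 1/r$ do not, which is exactly what a tensor could not do.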

From the definition of the Riemann-Christoffel symbols, the components 
of a connection form, we deduced their transformation properties. It turns 
out that 


Theorem 36.2.19 A set of functions $\Gamma^k_{ij}$, which transform according to
Eq. (36.21) under a change of coordinates, define a unique connection
whose components with respect to the coordinates $\{x^i\}$ are $\Gamma^k_{ij}$. Furthermore,
the connection form $\omega = \omega^i_jE^j_i$ is given in terms of the local coordinate
system by

$$\omega^i_j = Y^i_k\bigl(dX^k_j + \Gamma^k_{lm}X^m_j\,dx^l\bigr).$$


Proof See [Koba 63, pp. 142-143]. 


Define the components of torsion and curvature tensors by

$$T(\partial_i, \partial_j) = T^k_{ij}\,\partial_k, \qquad R(\partial_k, \partial_l)\partial_j = R^i_{jkl}\,\partial_i. \tag{36.22}$$

Then, using Theorem 36.2.16 one can easily obtain the following

Box 36.2.20 The components of torsion and curvature tensors are
given in terms of the Christoffel symbols:

$$T^k_{ij} = \Gamma^k_{ij} - \Gamma^k_{ji},$$
$$R^i_{jkl} = \partial_k\Gamma^i_{lj} - \partial_l\Gamma^i_{kj} + \Gamma^m_{lj}\Gamma^i_{km} - \Gamma^m_{kj}\Gamma^i_{lm}. \tag{36.23}$$

In particular, $\Gamma^k_{ij} = \Gamma^k_{ji}$ if the torsion tensor vanishes.


Equation (36.20) pulls down the connection 1-form from P to M. One
can pull down the curvature 2-form as well. Then Eq. (36.5) gives

$$(\Omega_U)^i_j = d(\omega_U)^i_j + (\omega_U)^i_k\wedge(\omega_U)^k_j. \tag{36.24}$$

As indicated above, we often omit the subscript U. Problem 36.11 shows
that

$$\Omega^i_j = \tfrac{1}{2}R^i_{jkl}\,dx^k\wedge dx^l, \tag{36.25}$$

where $R^i_{jkl}$ is as given in Eq. (36.23).
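
The component formulas of Box 36.2.20 lend themselves to a direct numerical check. The sketch below is only an illustration under stated assumptions: the connection coefficients are taken to be the standard ones of the unit 2-sphere in coordinates $(\theta, \varphi)$ (a sample input not derived in this section), the arrays are laid out as `G[k][i][j]` for $\Gamma^k_{ij}$ and `R[i][j][k][l]` for $R^i_{jkl}$, and the partial derivatives in (36.23) are replaced by central finite differences. All function names are hypothetical.

```python
import math

def gamma_sphere(x):
    """Assumed sample input: G[k][i][j] = Gamma^k_{ij} on the unit sphere, x = (theta, phi)."""
    theta = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G

def torsion(gamma, x):
    """T[k][i][j] = Gamma^k_{ij} - Gamma^k_{ji}, first line of (36.23)."""
    G = gamma(x)
    n = len(x)
    return [[[G[k][i][j] - G[k][j][i] for j in range(n)]
             for i in range(n)] for k in range(n)]

def curvature(gamma, x, h=1e-5):
    """R[i][j][k][l] = R^i_{jkl}, second line of (36.23), via central differences."""
    n = len(x)
    G = gamma(x)
    dG = []  # dG[k][a][b][c] approximates d(Gamma^a_{bc})/dx^k
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        Gp, Gm = gamma(xp), gamma(xm)
        dG.append([[[(Gp[a][b][c] - Gm[a][b][c]) / (2 * h) for c in range(n)]
                    for b in range(n)] for a in range(n)])
    R = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                for l in range(n):
                    val = dG[k][i][l][j] - dG[l][i][k][j]
                    for m in range(n):
                        val += G[m][l][j] * G[i][k][m] - G[m][k][j] * G[i][l][m]
                    R[i][j][k][l] = val
    return R

x0 = [1.0, 0.3]
R = curvature(gamma_sphere, x0)
print(R[0][1][0][1], math.sin(x0[0]) ** 2)  # the two numbers should agree closely
```

For this sample connection the torsion vanishes identically, $R^\theta{}_{\varphi\theta\varphi}$ comes out as $\sin^2\theta$ up to discretization error, and the computed components are antisymmetric in the last two indices, as (36.25) requires.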


36.2.4 General Basis 


The coordinate expressions derived above express the components of forms
and fields in a coordinate basis. We need not confine ourselves to coordinate
bases. In fact, they are not always the most convenient bases to use. We can
work in a basis $\{e_i\}$ and its dual $\{\epsilon^j\}$. Then the Riemann-Christoffel symbols
are defined as

$$\nabla_{e_i}e_j = e_k\,\Gamma^k_{ij}. \tag{36.26}$$

It has to be emphasized that, even in the absence of torsion, in a general
basis, $\Gamma^k_{ij} \ne \Gamma^k_{ji}$. Only in a coordinate basis does this symmetry hold.


Example 36.2.21 Consider two bases $\{e_i\}$ and $\{e_{i'}\}$. Write the primed basis
in terms of the other: $e_{i'} = R^j_{i'}e_j$. Then

$$\nabla_{e_{i'}}e_{j'} = \nabla_{R^k_{i'}e_k}\bigl(R^l_{j'}e_l\bigr) = R^k_{i'}\bigl\{R^l_{j'}\nabla_{e_k}(e_l) + \bigl[e_k(R^l_{j'})\bigr]e_l\bigr\}$$
$$= R^k_{i'}R^l_{j'}\Gamma^m_{kl}\,e_m + R^k_{i'}\underbrace{e_k\bigl(R^m_{j'}\bigr)}_{\equiv R^m_{j',k}}\,e_m.$$

Writing $\nabla_{e_{i'}}e_{j'} = \Gamma^{k'}_{i'j'}e_{k'}$ with $e_{k'} = R^m_{k'}e_m$ on the LHS, equating the
components on both sides, and multiplying both sides by the inverse of the
transformation matrix R, we obtain

$$\Gamma^{k'}_{i'j'} = \underbrace{R^{k'}_mR^k_{i'}R^l_{j'}\Gamma^m_{kl}}_{\text{how a (1, 2)-tensor transforms}} + \underbrace{R^{k'}_mR^k_{i'}R^m_{j',k}}_{\text{nontensorial term}}, \tag{36.27}$$

where $R^{k'}_m \equiv (R^{-1})^{k'}_m$. Equation (36.27) shows once again that the connection
coefficients are not tensors. Equation (36.21) is a special case of
(36.27).


Applying $\nabla_{e_i}$ to both sides of $\delta^k_j = \langle\epsilon^k, e_j\rangle$, we obtain

$$\nabla_{e_i}\epsilon^k = -\Gamma^k_{ij}\,\epsilon^j. \tag{36.28}$$

Since an arbitrary tensor of a given kind can be expressed as a linear combination
of tensor products of vectors and 1-forms, Eqs. (36.26) and (36.28),
plus the assumed derivation property of $\nabla_u$, are sufficient to uniquely define
the action of $\nabla_u$ on any tensor and for any vector field u.

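Let T be a tensor of type (r, s). The covariant differential of Definition
36.2.14, $\nabla : T^r_s(M) \to T^r_{s+1}(M)$, which is sometimes called the generalized
gradient operator, when acting on T, adds an extra lower index. In
what follows, we want to find the components of $\nabla T$ in terms of the components
of T. If

$$T = T^{i_1\dots i_r}_{j_1\dots j_s}\,e_{i_1}\otimes\cdots\otimes e_{i_r}\otimes\epsilon^{j_1}\otimes\cdots\otimes\epsilon^{j_s},$$

then, following the customary practice of putting the extra lower index after
a semicolon, we write

$$\nabla T \equiv T^{i_1\dots i_r}_{j_1\dots j_s;k}\,e_{i_1}\otimes\cdots\otimes e_{i_r}\otimes\epsilon^{j_1}\otimes\cdots\otimes\epsilon^{j_s}\otimes\epsilon^k, \tag{36.29}$$

and, with $u = u^ke_k$,

$$\nabla_uT = T^{i_1\dots i_r}_{j_1\dots j_s;k}\,u^k\,e_{i_1}\otimes\cdots\otimes e_{i_r}\otimes\epsilon^{j_1}\otimes\cdots\otimes\epsilon^{j_s}. \tag{36.30}$$

Using these relations, we can calculate the components of the covariant
derivative of a general tensor. It is clear that if we use $e_k$ instead of u, we
obtain the kth component of the covariant derivative. So, on the one hand,
we have

$$\nabla_{e_k}T = T^{i_1\dots i_r}_{j_1\dots j_s;k}\,e_{i_1}\otimes\cdots\otimes e_{i_r}\otimes\epsilon^{j_1}\otimes\cdots\otimes\epsilon^{j_s}, \tag{36.31}$$

and on the other hand,

$$\nabla_{e_k}T = \nabla_{e_k}\bigl(T^{i_1\dots i_r}_{j_1\dots j_s}\,e_{i_1}\otimes\cdots\otimes e_{i_r}\otimes\epsilon^{j_1}\otimes\cdots\otimes\epsilon^{j_s}\bigr)$$
$$= T^{i_1\dots i_r}_{j_1\dots j_s,k}\,e_{i_1}\otimes\cdots\otimes e_{i_r}\otimes\epsilon^{j_1}\otimes\cdots\otimes\epsilon^{j_s}$$
$$+ \sum_{m=1}^{r}T^{i_1\dots i_r}_{j_1\dots j_s}\,e_{i_1}\otimes\cdots\otimes\underbrace{\nabla_{e_k}e_{i_m}}_{\Gamma^n_{ki_m}e_n\ \text{by (36.26)}}\otimes\cdots\otimes e_{i_r}\otimes\epsilon^{j_1}\otimes\cdots\otimes\epsilon^{j_s}$$
$$+ \sum_{m=1}^{s}T^{i_1\dots i_r}_{j_1\dots j_s}\,e_{i_1}\otimes\cdots\otimes e_{i_r}\otimes\epsilon^{j_1}\otimes\cdots\otimes\underbrace{\nabla_{e_k}\epsilon^{j_m}}_{-\Gamma^{j_m}_{kn}\epsilon^n\ \text{by (36.28)}}\otimes\cdots\otimes\epsilon^{j_s}, \tag{36.32}$$

where $T^{i_1\dots i_r}_{j_1\dots j_s,k} \equiv e_k\bigl(T^{i_1\dots i_r}_{j_1\dots j_s}\bigr)$. Equating the components of Eqs. (36.31) and
(36.32) yields

$$T^{i_1\dots i_r}_{j_1\dots j_s;k} = T^{i_1\dots i_r}_{j_1\dots j_s,k} + \sum_{m=1}^{r}T^{i_1\dots i_{m-1}\,n\,i_{m+1}\dots i_r}_{j_1\dots j_s}\,\Gamma^{i_m}_{kn} - \sum_{m=1}^{s}T^{i_1\dots i_r}_{j_1\dots j_{m-1}\,n\,j_{m+1}\dots j_s}\,\Gamma^{n}_{kj_m}, \tag{36.33}$$

where only the sum over the subindex m has been explicitly displayed;
the (hidden) sum over repeated indices is, as always, understood. Equation
(36.33) is (36.6) written in terms of components.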

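If u happens to be tangent to a curve $t \mapsto \gamma(t)$, Eq. (36.30) is written as

$$\nabla_uT \equiv \frac{DT^{i_1\dots i_r}_{j_1\dots j_s}}{dt}\,e_{i_1}\otimes\cdots\otimes e_{i_r}\otimes\epsilon^{j_1}\otimes\cdots\otimes\epsilon^{j_s}, \tag{36.34}$$

where $DT^{i_1\dots i_r}_{j_1\dots j_s}/dt \equiv T^{i_1\dots i_r}_{j_1\dots j_s;k}\,u^k$. In a coordinate frame, with $u^i = \dot x^i =
dx^i/dt$, Eqs. (36.30) and (36.33) give

$$\frac{DT^{i_1\dots i_r}_{j_1\dots j_s}}{dt} = \frac{dT^{i_1\dots i_r}_{j_1\dots j_s}}{dt} + \sum_{m=1}^{r}T^{i_1\dots i_{m-1}\,n\,i_{m+1}\dots i_r}_{j_1\dots j_s}\,\Gamma^{i_m}_{kn}\frac{dx^k}{dt} - \sum_{m=1}^{s}T^{i_1\dots i_r}_{j_1\dots j_{m-1}\,n\,j_{m+1}\dots j_s}\,\Gamma^{n}_{kj_m}\frac{dx^k}{dt}. \tag{36.35}$$

For the case of a vector with components $v^k$, (36.35) becomes

$$\frac{Dv^k}{dt} = \frac{dv^k}{dt} + \Gamma^k_{ij}\,\frac{dx^i}{dt}\,v^j. \tag{36.36}$$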


This is an important equation, to which we shall return shortly. 
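
Setting the left-hand side of (36.36) to zero gives a system of ordinary differential equations for a vector carried along a curve without covariant change. The following sketch integrates that system with a classical Runge-Kutta step. It rests on assumptions not made in the text: the connection coefficients are the standard ones of the unit 2-sphere in coordinates $(\theta, \varphi)$, stored as `G[k][i][j]` for $\Gamma^k_{ij}$, and the curve is the latitude circle $\theta = \theta_0$, $\varphi = t$. The function names are hypothetical.

```python
import math

def gamma_sphere(x):
    """Assumed sample input: G[k][i][j] = Gamma^k_{ij} on the unit sphere, x = (theta, phi)."""
    theta = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G

def parallel_transport(curve, velocity, v0, t_end, steps, gamma):
    """Integrate dv^k/dt = -Gamma^k_{ij} (dx^i/dt) v^j, i.e. (36.36) with Dv^k/dt = 0."""
    def rhs(t, v):
        G = gamma(curve(t))
        u = velocity(t)
        n = len(v)
        return [-sum(G[k][i][j] * u[i] * v[j] for i in range(n) for j in range(n))
                for k in range(n)]

    h = t_end / steps
    v = list(v0)
    t = 0.0
    for _ in range(steps):
        # One classical fourth-order Runge-Kutta step.
        k1 = rhs(t, v)
        k2 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(v, k1)])
        k3 = rhs(t + h / 2, [a + h / 2 * b for a, b in zip(v, k2)])
        k4 = rhs(t + h, [a + h * b for a, b in zip(v, k3)])
        v = [a + h / 6 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(v, k1, k2, k3, k4)]
        t += h
    return v

theta0 = 1.0
v_final = parallel_transport(lambda t: [theta0, t], lambda t: [0.0, 1.0],
                             [1.0, 0.0], 2 * math.pi, 2000, gamma_sphere)
print(v_final)
```

For this particular curve the system can be solved by hand: starting from $v = (1, 0)$, one finds $v^\theta(t) = \cos(t\cos\theta_0)$ and $v^\varphi(t) = -\sin(t\cos\theta_0)/\sin\theta_0$, so after one full circuit the vector does not return to itself unless $\cos\theta_0$ is an integer. The numerical result can be compared with these expressions at $t = 2\pi$.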

With the generalized gradient operator defined, we can construct the divergence
of a tensor just as in vector analysis. Given a vector, the divergence
operator $\nabla\cdot$ acts on it and gives a scalar, or, in the language of tensor analysis,
it lowers the number of upper indices by 1. This takes place by differentiating
components and contracting the upper index with the newly introduced index
of differentiation. The divergence of an arbitrary tensor is defined in
precisely the same way:

Definition 36.2.22 Given a tensor field T, define its divergence $\nabla\cdot T$
to be the tensor obtained from $\nabla T$ by contracting the last upper index
with the covariant derivative index. In components,

$$(\nabla\cdot T)^{i_1\dots i_{r-1}}_{j_1\dots j_s} = T^{i_1\dots i_{r-1}k}_{j_1\dots j_s;k}.$$


Example 36.2.23 There is a useful relation between the covariant and the
Lie derivative that we derive now. First, let T be of type (2, 0) and write it in
some frame as $T = T^{ij}e_i\otimes e_j$. Apply the covariant derivative with respect
to u to both sides to obtain

$$\nabla_uT = u(T^{ij})\,e_i\otimes e_j + T^{ij}(\nabla_ue_i)\otimes e_j + T^{ij}e_i\otimes(\nabla_ue_j).$$

Similarly,

$$L_uT = u(T^{ij})\,e_i\otimes e_j + T^{ij}(L_ue_i)\otimes e_j + T^{ij}e_i\otimes(L_ue_j).$$

Now use $L_ue_i = [u, e_i] = \nabla_ue_i - \nabla_{e_i}u$, which by Theorem 36.2.16 holds
when the torsion vanishes, to get

$$L_uT = \nabla_uT - T^{ij}\bigl[(\nabla_{e_i}u)\otimes e_j + e_i\otimes(\nabla_{e_j}u)\bigr]. \tag{36.37}$$

On the other hand, if we apply $\nabla_u$ and $L_u$ to both sides of $\delta^i_j = \langle\epsilon^i, e_j\rangle$
and use $[u, e_j] = \nabla_ue_j - \nabla_{e_j}u$, we obtain

$$\nabla_u\epsilon^i = L_u\epsilon^i - (\nabla_{e_j}u)^i\,\epsilon^j.$$

It follows that for $T = T_{ij}\,\epsilon^i\otimes\epsilon^j$, we have

$$L_uT = \nabla_uT + T_{ij}\bigl[(\nabla_{e_k}u)^i\,\epsilon^k\otimes\epsilon^j + (\nabla_{e_k}u)^j\,\epsilon^i\otimes\epsilon^k\bigr]. \tag{36.38}$$

One can use Eqs. (36.37) and (36.38) to generalize to a tensor of type (r, s).


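Example 36.2.24 Let f be a function on M. Then $\nabla f$ is a 1-form. Call
it $\phi$. In a local coordinate system, it can be written as $\phi = \phi_i\,dx^i$, where

$$\phi_i = \phi(\partial_i) = \nabla f(\partial_i) = \nabla_{\partial_i}f = \partial_if = \frac{\partial f}{\partial x^i}.$$

Noting that $(\nabla f)_i = f_{;i}$, $\nabla f = f_{;i}\,dx^i$, and $(\nabla^2f)_{ij} = f_{;ij}$, let us first find
the covariant derivative of $\phi$. $\nabla\phi$ is a tensor of type (0, 2), whose components can be
found as follows:

$$(\nabla\phi)_{ij} = \nabla\phi(\partial_i, \partial_j) = \nabla_{\partial_j}\bigl(\phi(\partial_i)\bigr) - \phi\bigl(\nabla_{\partial_j}\partial_i\bigr)$$
$$= \partial_j(\phi_i) - \phi\bigl(\Gamma^k_{ji}\partial_k\bigr) = \partial_j(\partial_if) - \Gamma^k_{ji}\phi_k$$
$$= \frac{\partial^2f}{\partial x^j\,\partial x^i} - \Gamma^k_{ji}\frac{\partial f}{\partial x^k},$$

where we used Eq. (36.6). We rewrite this as

$$f_{;ij} = \frac{\partial^2f}{\partial x^j\,\partial x^i} - \Gamma^k_{ji}\frac{\partial f}{\partial x^k}. \tag{36.39}$$

Reversing the order of indices, we get

$$f_{;ji} = \frac{\partial^2f}{\partial x^i\,\partial x^j} - \Gamma^k_{ij}\frac{\partial f}{\partial x^k}. \tag{36.40}$$

Subtracting (36.39) from (36.40) and using (36.23), we obtain

$$f_{;ji} - f_{;ij} = \bigl(\Gamma^k_{ji} - \Gamma^k_{ij}\bigr)\partial_kf = T^k_{ji}\,\partial_kf. \tag{36.41}$$

Thus, only if the torsion tensor vanishes are the mixed "partial" covariant
derivatives equal.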

Now we want to find the difference between the mixed second "partial"
covariant derivatives of a vector field $Z = \xi^k\partial_k$. It is more instructive to use
general vectors and then specialize to coordinate vector fields. We are thus
interested in $\nabla^2Z(Y, X) - \nabla^2Z(X, Y)$. Let $\psi = \nabla Z$. From Eq. (36.6), we
have

$$\nabla\psi(X, Y) = \nabla_Y\bigl(\psi(X)\bigr) - \psi(\nabla_YX)$$
$$= \nabla_Y\bigl(\nabla Z(X)\bigr) - \nabla Z(\nabla_YX)$$
$$= \nabla_Y\nabla_XZ - \nabla_{\nabla_YX}Z.$$

Switching X and Y and subtracting, we get

$$\nabla^2Z(Y, X) - \nabla^2Z(X, Y) = \nabla_X\nabla_YZ - \nabla_Y\nabla_XZ + \nabla_{\nabla_YX}Z - \nabla_{\nabla_XY}Z$$
$$= [\nabla_X, \nabla_Y]Z - \nabla_{\nabla_XY - \nabla_YX}Z$$
$$= [\nabla_X, \nabla_Y]Z - \nabla_{T(X,Y)+[X,Y]}Z,$$

where we used Theorem 36.2.16. We thus have

$$\nabla^2Z(Y, X) - \nabla^2Z(X, Y) = [\nabla_X, \nabla_Y]Z - \nabla_{[X,Y]}Z - \nabla_{T(X,Y)}Z,$$

or, using Theorem 36.2.16 again,

$$\nabla^2Z(Y, X) - \nabla^2Z(X, Y) = R(X, Y)Z + \nabla_{T(Y,X)}Z. \tag{36.42}$$

Substituting $\partial_i$ and $\partial_j$ for Y and X in the equation above, we can get

$$\xi^k_{;ij} - \xi^k_{;ji} = R^k_{lji}\,\xi^l + T^l_{ij}\,\xi^k_{;l}. \tag{36.43}$$

We leave this as an exercise for the reader.


36.3 Geodesics 


Let $\gamma$ be a curve in M. Denote $\gamma(t)$ by $x_t$, so that $\gamma(0) = x_0 \in M$. Let $\tau_t^s$ be
the parallel displacement along the curve $\gamma$ in M from $T_{x_t}(M)$ to $T_{x_s}(M)$.
In particular, consider $\tau_t^0$, the parallel displacement from $T_{x_t}(M)$ to $T_{x_0}(M)$
along $\gamma$. It is natural to associate the zero vector in $T_{x_t}(M)$ with $x_t$.³ As t
varies, the zero vector also varies, and by the parallel displacement $\tau_t^0$ one
can monitor how the image of $x_t$ "develops" in $T_{x_0}(M)$.


Definition 36.3.1 The development of the curve $\gamma$ in M into $T_{x_0}(M)$ is the
curve $\tau_t^0(x_t)$ in $T_{x_0}(M)$.


The following theorem, whose proof can be found in [Koba 63, pp 131- 
132], relates the tangent to the development of a curve and the parallel dis- 
placement of its tangent. 


Theorem 36.3.2 Let $\gamma$ be a curve in M and $Y_t = \tau_t^0(\dot x_t)$. Let $C_t = \tau_t^0(x_t)$
be the development of $\gamma$ in M into $T_{x_0}(M)$. Then

$$\frac{dC_t}{dt} = Y_t.$$

This theorem states that the tangent to the development of a curve is the
same as the parallel displacement of the tangent to the curve. In other words,
$\tau_t^0$ "develops" not only the curve, but its tangent at every point of the curve.

An interesting consequence of this theorem is that if $\dot x_t$ is parallel
along $\gamma$, then $Y_t$ is independent of t, i.e., $Y_t$ is constant, say $Y_t = a$. Then,
$C_t = at + b$. Hence, we have


Corollary 36.3.3 The development of $\gamma$ in M into $T_{x_0}(M)$ is a
straight line iff $\dot x_t$ is parallel along $\gamma$.


Curves in manifolds with a given linear connection bend for two reasons: 
one is because the curve itself goes back and forth in the manifold; the other 
is the inherent bending of the manifold itself. The straightest possible lines 
in a manifold are those which bend only because of the inherent bending 
of the manifold. Given any curve, we can gauge its bending by parallel dis- 
placement of vector fields along that curve. If the vector field has a vanishing 
covariant derivative, it is said to be parallel along the curve. However, that 
says nothing about how “curvy” the curve itself is. 

To get further insight, let's look at the familiar flat space. In the flat space
of a large sheet of paper, construction of a straight line in a given direction
starting at a given point $P_0$ is done by laying down the end of a vector
(a straight edge) at $P_0$ pointing in the given direction, connecting $P_0$ to a
neighboring point $P_1$ along the vector, moving the vector parallel to itself
to $P_1$, connecting $P_1$ to a neighboring point $P_2$, and continuing the process.
In the language of the machinery of the covariant derivative, we might say
that a straight line is constructed by transporting the tangent vector parallel
to itself.

³This association becomes plausible if one specializes to two dimensions and notes that
the plane $T_{x_t}(M)$ touches M at $x_t$, the natural origin of the plane.


Definition 36.3.4 Let M be a manifold and $\gamma$ a curve in M. Then $\gamma$ is
called a geodesic of M if the tangent vector $\dot x_t$ at every point of $\gamma$ is parallel
displaced along the curve: $\nabla_{\dot x_t}\dot x_t = 0$.


Since the definition is in terms of the parameter t, the parametrization
of the curve becomes important. Such a parameter, if it exists, is called an
affine parameter.

It follows from Eq. (36.36), with $v^k = u^k = dx^k/dt$, that a geodesic
curve satisfies the following DE:

$$\frac{d^2x^k}{dt^2} + \Gamma^k_{ij}\,\frac{dx^i}{dt}\frac{dx^j}{dt} = 0. \tag{36.44}$$

This second-order DE, called the geodesic equation, will have a unique
solution if $x^i(0)$ and $\dot x^i(0)$, i.e., the initial point and the initial direction, are
given. Thus,


Theorem 36.3.5 Through a given point and in a given direction
passes only one geodesic curve.
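
Because (36.44) is an initial value problem, it can be integrated numerically once the connection coefficients are known. The sketch below does this with a fourth-order Runge-Kutta step. It is an illustration under assumptions that go beyond the text: the coefficients are the standard ones of the unit 2-sphere in coordinates $(\theta, \varphi)$, stored as `G[k][i][j]` for $\Gamma^k_{ij}$, and the helper `embed`, which places the sphere in $\mathbb{R}^3$, is introduced only to inspect the result. All names are hypothetical.

```python
import math

def gamma_sphere(x):
    """Assumed sample input: G[k][i][j] = Gamma^k_{ij} on the unit sphere, x = (theta, phi)."""
    theta = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G

def geodesic_rhs(state, gamma):
    """First-order form of (36.44): state = [x^1..x^n, v^1..v^n] with v = dx/dt."""
    n = len(state) // 2
    x, v = state[:n], state[n:]
    G = gamma(x)
    acc = [-sum(G[k][i][j] * v[i] * v[j] for i in range(n) for j in range(n))
           for k in range(n)]
    return v + acc

def rk4_step(state, h, gamma):
    def shifted(base, slope, s):
        return [b + s * m for b, m in zip(base, slope)]
    k1 = geodesic_rhs(state, gamma)
    k2 = geodesic_rhs(shifted(state, k1, h / 2), gamma)
    k3 = geodesic_rhs(shifted(state, k2, h / 2), gamma)
    k4 = geodesic_rhs(shifted(state, k3, h), gamma)
    return [s + h / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]

def integrate_geodesic(x0, v0, t_end, steps, gamma):
    """Return the list of states along the geodesic with initial point x0 and velocity v0."""
    state = list(x0) + list(v0)
    h = t_end / steps
    path = [state]
    for _ in range(steps):
        state = rk4_step(state, h, gamma)
        path.append(state)
    return path

def embed(theta, phi):
    """Illustrative helper: the point of the unit sphere in R^3 with coordinates (theta, phi)."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

path = integrate_geodesic([math.pi / 2, 0.0], [0.3, 1.0], 2.0, 2000, gamma_sphere)
print(embed(path[-1][0], path[-1][1]))
```

For this sample connection the integrated curve should stay in the plane through the origin spanned by the initial position and initial velocity, i.e., on a great circle; the initial data $(x^i(0), \dot x^i(0))$ fix the curve completely, in line with Theorem 36.3.5.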


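If s(t) is another parametrization of the geodesic curve, then a simple
calculation shows that

$$\frac{d^2x^k}{dt^2} + \Gamma^k_{ij}\,\frac{dx^i}{dt}\frac{dx^j}{dt} = \left(\frac{d^2x^k}{ds^2} + \Gamma^k_{ij}\,\frac{dx^i}{ds}\frac{dx^j}{ds}\right)\bigl(s'(t)\bigr)^2 + \frac{dx^k}{ds}\,s''(t).$$

If both t and s are affine parameters, the left-hand side and the expression in
parentheses vanish. This requires $s''(t)$ to be zero, or $s = \alpha t + \beta$, with $\alpha, \beta \in \mathbb{R}$.
Corollary 36.3.3 leads immediately to the following:


Proposition 36.3.6 A curve through $x \in M$ is a geodesic iff its development
into $T_x(M)$ is a straight line.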


36.3.1 Riemann Normal Coordinates 


Starting with a point P of an n-dimensional manifold M on which a covariant
derivative is defined, we can construct a unique geodesic in every
direction, i.e., for every vector in $T_P(M)$. By parallel transportation of the
tangent vectors at P, we can construct a vector field in a neighborhood of
P: The value of the vector field at Q (assumed to be close enough to P)
is the tangent at Q on the geodesic starting at P and passing through Q.⁴
The vector field so obtained makes it possible to define an exponential map
from the tangent space to the manifold. In fact, the integral curve exp(tX)
of any tangent vector X in $T_P(M)$ is simply the geodesic associated with
the vector.

The uniqueness of the geodesics establishes a bijection (in fact, a diffeomorphism)
between a neighborhood of the origin of $T_P(M)$ and a neighborhood
of P in M. This diffeomorphism can be used to assign coordinates to
all points in the vicinity of P. Recall that a coordinate system is a smooth bijection
from an open subset of M to an open subset of $\mathbb{R}^n$. Now choose a basis for $T_P(M)$
and associate the components of tX in this basis to the points on the geodesic exp(tX).
Specifically, if $\{a^i\}_{i=1}^n$ are the components of X in the chosen basis, then

$$x^i(t) = a^it, \qquad i = 1, 2, \dots, n,$$

are the so-called Riemann normal coordinates (RNCs) of points on the
geodesic of X. The geodesic equations in these coordinates become

$$\Gamma^k_{ij}\,a^ia^j = 0 \quad\Rightarrow\quad \Gamma^k_{ij} + \Gamma^k_{ji} = 0.$$

In particular, if the torsion vanishes, then $\Gamma^k_{ij}$ is symmetric in i and j.
Hence, we have the following:


Proposition 36.3.7 The connection coefficients at a point $P \in M$ vanish in
the Riemann normal coordinates at P if the torsion vanishes.


Using Eq. (36.33), we immediately obtain the following: 


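Corollary 36.3.8 Let T be a tensor field on M with components $T^{i_1\dots i_r}_{j_1\dots j_s}$ with
respect to a Riemann normal coordinate system $\{x^i\}$ at P. Then, at P,

$$T^{i_1\dots i_r}_{j_1\dots j_s;k} = \frac{\partial}{\partial x^k}\,T^{i_1\dots i_r}_{j_1\dots j_s}$$

if the torsion vanishes.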


Riemann normal coordinates are very useful in establishing tensor equations.
This is because two tensors are identical if and only if their components
are the same in any coordinate frame. Therefore, to show that two
tensor fields are equal, we pick an arbitrary point P in M, erect a set of RNCs at P,
and show that the components of the tensors are equal there. Since the connection
coefficients vanish at P in an RNC system, and covariant derivatives are the same
as ordinary derivatives there, tensor manipulations can be simplified considerably.
For example, the components of the curvature tensor in RNCs are

$$R^i_{jkl} = \frac{\partial\Gamma^i_{lj}}{\partial x^k} - \frac{\partial\Gamma^i_{kj}}{\partial x^l}. \tag{36.45}$$

This is not a tensor relation; the RHS is not a tensor in a general coordinate
system. However, if we establish a relation involving the components of the
curvature tensor alone, then that relation will hold in all coordinates, i.e., it
is a tensor relation. For instance, from the equation above and the symmetry
of the connection coefficients one immediately obtains

$$R^i_{jkl} + R^i_{klj} + R^i_{ljk} = 0.$$

Since this involves only a tensor, it must hold in all coordinate frames. This
is the coordinate expression of Bianchi's first identity of Eq. (36.14).⁵

⁴We are assuming that through any two neighboring points one can always draw a
geodesic. For a proof see [Koba 63, pp. 172-175].


Example 36.3.9 Differentiate the second equation in (36.23) with respect
to $x^m$ and evaluate the result in RNC to get

$$R^i_{jkl;m} = R^i_{jkl,m} = \Gamma^i_{lj,km} - \Gamma^i_{kj,lm}.$$

From this relation and the symmetry of mixed partial derivatives,
$\Gamma^i_{lj,km} = \Gamma^i_{lj,mk}$, we obtain the coordinate expression
of Bianchi's second identity of Theorem 36.2.17:

$$R^i_{jkl;m} + R^i_{jmk;l} + R^i_{jlm;k} = 0 \quad\text{and}\quad R^i_{j[kl;m]} = 0. \tag{36.46}$$

In Einstein's general relativity, this identity is the analogue of Maxwell's
pair of homogeneous equations: $F_{\alpha\beta,\gamma} + F_{\gamma\alpha,\beta} + F_{\beta\gamma,\alpha} = 0$.


Using Proposition 36.3.7, we establish the following tensor identity, 
which although derived in Riemann normal coordinates, is true in general. 


Corollary 36.3.10 Let $\omega$ be a differential form on M. If the torsion vanishes,
then

$$d\omega = \mathbb{A}(\nabla\omega),$$

where $\mathbb{A}$ is the antisymmetrizer introduced in Eq. (26.14).


36.4 Problems 


36.1 Use the fact that $R_{g*}X$ is a vector at $pg \in L(M)$ to show that the
canonical 1-form of L(M) is a tensorial 1-form of type $(GL(n, \mathbb{R}), \mathbb{R}^n)$.


36.2 Derive the two structure equations of (36.5). Hint: For the second 
equation, use (34.13) with structure constants coming from Example 29.2.7. 


⁵See also Problem 36.6 for both of Bianchi's identities.

36.3 Let Y be a constant vector in (d) of Proposition 36.2.11 to show that
$\nabla_Xf = Xf$.


36.4 Derive Eq. (36.6). 


36.5 In this problem, you are going to prove Bianchi's second identity in
terms of the curvature tensor.

(a) Show that
$$ D^\omega\Omega(X^*, Y^*, Z^*) = \mathrm{Cyc}\big(X^*(\Omega(Y^*, Z^*)) - \Omega([X^*, Y^*], Z^*)\big). $$
(b) Using arguments similar to the text, show that
$$ p\big(X^*(\Omega(Y^*, Z^*))\big) = \nabla_X R(Y, Z). $$
(c) Convince yourself that
$$ p\,\Omega([X^*, Y^*], Z^*) = R(\pi_*[X^*, Y^*], Z). $$
(d) Use $\pi_* = p \circ \theta$ and the fact that
$$ \theta[X^*, Y^*] = d\theta(X^*, Y^*) = \Theta(X^*, Y^*) $$
to show that $\pi_*[X^*, Y^*] = T(X, Y)$.
(e) Put everything together and show that $\mathrm{Cyc}[\nabla_X R(Y, Z)] = 0$ when the
torsion tensor vanishes.


36.6 Derive the coordinate expression for Bianchi’s first and second identi- 
ties of Theorem 36.2.18. 


36.7 Use bas xi = 5 to show that 


: 0 ; ; 
myi _ i ipym 
ay Y = axe y;=-YY! 


36.8 From Eq. (36.18) show that bi, a i. Ves x". Now use this result to 
rewrite Eq. (36.16) as (36.19). 


36.9 Derive Eq. (36.21). 
36.10 Prove the formulas in Eq. (36.23). 


36.11 Substituting (36.20) in (36.24) and noting the antisymmetry of the 
wedge product, derive Eq. (36.25). 


36.12 Let Z = é*d,. Show that 


v2), =e, = 5 ret 
J 5 ay jk 


and 


Ey — Eu = Rig! + Tig& 
The differential geometry of the last chapter covered most of what is needed
for many applications. However, it lacked the essential ingredient of a met- 
ric. Almost all spaces (manifolds) encountered in physics have a natural 
metric which is either known from the beginning, or is derived from some 
of its physical properties (general theory of relativity). In this chapter, we 
look at spaces whose connections are tied to their metrics. 


37.1 The Metric Connection 


Let $P(M, G)$ be a principal fiber bundle. Let $\rho$ be a representation of $G$
into $GL(m, \mathbb{R})$, and $E(M, \mathbb{R}^m, G, P)$ the vector bundle associated with $P$
with standard fiber $\mathbb{R}^m$ on which $G$ acts through $\rho$. A fiber metric is a map
$g : M \to T^0_2(E)$ such that $g_x = g(x)$ is an inner product in the fiber $\pi_E^{-1}(x)$
which is differentiable in $x$. For all physical applications, the structure of
the base manifold $M$ is such that a fiber metric always exists for any vector
bundle $E$ associated with $P(M, G)$.

Given a connection $\Gamma$ in $P$, we can define a parallelism on the associated
bundle $E$ based on which we construct an isomorphism of the fibers. If
this isomorphism preserves the inner product, i.e., if the isomorphism is an
isometry, we say that the connection is a metric connection. This property
can be restated by saying that $g$ is parallel. Hence, by Proposition 36.2.15,
we have


Theorem 37.1.1 A connection $\Gamma$ with covariant differential $\nabla$ is metric
with respect to the metric $g$ iff $\nabla g = 0$, i.e., $g_{ij;k} = 0$ for all $i, j, k$.


A statement equivalent to this theorem is that 


Box 37.1.2 The operation of raising and lowering of indices com- 
mutes with the operation of covariant differentiation. 
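Consider two vectors $v$ and $w$. If the covariant derivative of $g$ vanishes,
then

$$ \begin{aligned}
\nabla_u[g(v, w)] &= \nabla_u\langle g, v \otimes w\rangle \\
&= \langle \nabla_u g, v \otimes w\rangle + \langle g, (\nabla_u v) \otimes w\rangle + \langle g, v \otimes (\nabla_u w)\rangle \\
&= g(\nabla_u v, w) + g(v, \nabla_u w).
\end{aligned} \tag{37.1} $$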


In particular, if v and w are parallel displaced along u, their inner product 
will not change. 

When there is a fiber metric in $E$, the group representation by which $G$
acts on the standard fiber of $E$ can be made to take values in the group
$O(m - \nu, \nu)$ (see Sect. 29.2.1) so that $\rho : G \to O(m - \nu, \nu)$ becomes the
new representation. So, the group associated with the vector bundle $E$ is
$O(m - \nu, \nu)$ when there is a metric connection. In particular, when we deal
with a linear connection, $E = T(M)$ and the structure group becomes
$O(n - \nu, \nu)$.


Definition 37.1.3 A Riemannian manifold is a differentiable manifold $M$
with a metric $g \in T^0_2(M)$, such that at each point $x \in M$, $g|_x$ is a positive
definite inner product. A manifold with an indefinite inner product at each
point is called a pseudo/semi-Riemannian manifold.


Historical Notes 

No great mind of the past has exerted a deeper influence on the mathematics of the twen- 
tieth century than Georg Friedrich Bernhard Riemann (1826-1866), the son of a poor 
country minister in northern Germany. He studied the works of Euler and Legendre while 
he was still in secondary school, and it is said that he mastered Legendre’s treatise on the 
theory of numbers in less than a week. But he was shy and modest, with little awareness 
of his own extraordinary abilities, so at the age of 19 he went to the University of
Göttingen with the aim of pleasing his father by studying theology and becoming a minister
himself. Fortunately, this worthy purpose soon stuck in his throat, and with his father’s 
willing permission he switched to mathematics. 

The presence of the legendary Gauss automatically made Göttingen the center of the
mathematical world. But Gauss was remote and unapproachable—particularly to begin- 
ning students—and after only a year Riemann left this unsatisfying environment and went 
to the University of Berlin. There he attracted the friendly interest of Dirichlet and Jacobi, 
and learned a great deal from both men. Two years later he returned to Göttingen, where
he obtained his doctor’s degree in 1851. During the next 8 years, despite debilitating 
poverty, he created his greatest works. In 1854 he was appointed Privatdozent (unpaid 
lecturer), which at that time was the necessary first step on the academic ladder. Gauss 
died in 1855, and Dirichlet was called to Göttingen as his successor. Dirichlet helped
Riemann in every way he could, first with a small salary (about one-tenth of that paid to 
a full professor) and then with a promotion to an assistant professorship. In 1859 he also 
died, and Riemann was appointed as a full professor to replace him. Riemann’s years of 
poverty were over, but his health was broken. At the age of 39 he died of tuberculosis in 
Italy, on the last of several trips he undertook in order to escape the cold, wet climate of 
northern Germany. Riemann had a short life and published comparatively little, but his 
works permanently altered the course of mathematics in analysis, geometry, and number 
theory. 

It is said that the three greatest mathematicians of modern times are Euler, Gauss, and 
Riemann. It is a curiosity of nature that these three names are among the most frequently 
mentioned names in the physics literature as well. Aside from the indirect use of his name 
in the application of complex analysis in physics, Riemannian geometry has become the 
most essential building block of all theories of fundamental interactions, starting with 
gravity, which Einstein formulated in this language in 1916. As part of the requirement
to become a Privatdozent, Riemann had to write a probationary essay and to present a 
trial lecture to the faculty. It was the custom for the candidate to offer three titles, and the 
head of his department usually accepted the first. However, Riemann rashly listed as his 
third topic the foundations of geometry. Gauss, who had been turning this subject over in 
his mind for 60 years, was naturally curious to see how this particular candidate’s “glo- 
riously fertile originality” would cope with such a challenge, and to Riemann’s dismay 
he designated this as the subject of the lecture. Riemann quickly tore himself away from
his other interests at the time, "my investigations of the connection between electricity,
magnetism, light, and gravitation," and wrote his lecture in the next two months. The
result was one of the great classical masterpieces of mathematics, and probably the most 
important scientific lecture ever given. It is recorded that even Gauss was surprised and 
enthusiastic. 


Since a (semi)Riemannian manifold has a metric, at each point $x$ there is
a fiber metric, namely $g(x)$. As before, a connection $\Gamma$ is a metric
connection if $g$ is parallel with respect to $\Gamma$. There may be many connections on
a (semi)Riemannian manifold. In fact, any set of functions that transform
according to Eq. (36.21) defines a unique connection by Theorem 36.2.19.
Many of these connections may be metric. However, one connection stands 
out. (See [Koba 63, pp 158-160] for a proof of the following theorem.) 


Theorem 37.1.4 Every (semi-)Riemannian manifold admits a unique
metric connection, called the Levi-Civita connection, whose torsion is
zero.


Because of the uniqueness of the Levi-Civita connection, we can identify 
it with the manifold itself. And since an important property of the connection 
is whether it is flat or not, we have 


Definition 37.1.5 A (semi-)Riemannian manifold is called flat if its Levi- 
Civita connection is flat. 


Applying Theorem 34.3.11 and Eq. (36.8), we also have 


Proposition 37.1.6 A (semi-)Riemannian manifold is flat iff its curvature R 
vanishes identically. 


Given the metric $g$ of the (semi)Riemannian manifold, define a covariant
derivative as follows:

$$ \begin{aligned}
2g(\nabla_X Y, Z) = {} & X(g(Y, Z)) + Y(g(X, Z)) - Z(g(X, Y)) \\
& + g([X, Y], Z) + g([Z, X], Y) + g([Z, Y], X).
\end{aligned} \tag{37.2} $$

It is straightforward to show that this covariant derivative satisfies the four
conditions of Proposition 36.2.11. Therefore, by Theorem 36.2.12, it is the
covariant derivative with respect to some linear connection. That linear
connection turns out to be the Levi-Civita connection. Problem 37.2 shows that
if $\nabla g = 0$, then (37.2) holds.
Historical Notes

Tullio Levi-Civita (1873-1941), the son of Giacomo Levi-Civita, a lawyer who from 
1908 was a senator, was an outstanding student at the liceo in Padua. In 1890 he enrolled 
in the Faculty of Mathematics of the University of Padua. Giuseppe Veronese and Gre- 
gorio Ricci-Curbastro were among his teachers. He received his diploma in 1894 and in 
1895 became resident professor at the teachers’ college annexed to the Faculty of Science 
at Pavia. From 1897 to 1918 Levi-Civita taught rational mechanics at the University of 
Padua. His years in Padua (where in 1914 he married a pupil, Libera Trevisani) were sci- 
entifically the most fruitful of his career. In 1918 he became professor of higher analysis 
at Rome and, in 1920, of rational mechanics. In 1938, struck by the fascist racial laws 
against Jews, he was forced to give up teaching. 

The breadth of his scientific interests, his scruples regarding the fulfillment of his aca- 
demic responsibilities, and his affection for young people made Levi-Civita the leader of 
a flourishing school of mathematicians. 

Levi-Civita’s approximately 200 memoirs in pure and applied mathematics deal with ana- 
lytical mechanics, celestial mechanics, hydrodynamics, elasticity, electromagnetism, and 
atomic physics. His most important contribution to science was rooted in the memoir 
“Sulle trasformazioni delle equazioni dinamiche” (1896), which was characterized by the 
use of the methods of absolute differential calculus that Ricci had applied only to dif- 
ferential geometry. In the “Méthodes de calcul différentiel absolus et leurs applications”, 
written with Ricci and published in 1900 in Mathematische Annalen, there is a complete 
exposition of the new calculus, which consists of a particular algorithm designed to ex- 
press geometric and physical laws in Euclidean and non-Euclidean spaces, particularly 
in Riemannian curved spaces. The memoir concerns a very general but laborious type of 
calculus that made it possible to deal with many difficult problems, including, according 
to Einstein, the formulation of the general theory of relativity. 

Although Levi-Civita had expressed certain reservations concerning relativity in the first 
years after its formulation (1905), he gradually came to accept the new views. His own 
original research culminated in 1917 in the introduction of the concept of parallel trans- 
port in curved spaces. With this new concept, absolute differential calculus, having ab- 
sorbed other techniques, became tensor calculus, now the essential instrument of the uni- 
tary relativistic theories of gravitation and electromagnetism. 

In his memoirs of 1903-1916 Levi-Civita contributed to celestial mechanics in the study 
of the three-body problem: the determination of the motion of three bodies, considered 
as reduced to their centers of mass and subject to mutual Newtonian attraction. In
1914-1916 he succeeded in eliminating the singularities present at the points of possible colli-
sions, past or future. His research in relativity led Levi-Civita to mathematical problems 
suggested by atomic physics, which in the 1920s was developing outside the traditional 
framework: the general theory of adiabatic invariants, the motion of a body of variable 
mass, the extension of the Maxwellian distribution to a system of corpuscles, and the 
Schrödinger equation.


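The components of the Levi-Civita connection, the Christoffel symbols, can
be calculated by substituting the coordinate vector fields $\{\partial_i\}$ in Eq. (37.2).
It is then easy to show that

$$ \Gamma^k_{ij} = \frac{1}{2}\, g^{mk}\left(\frac{\partial g_{jm}}{\partial x^i} + \frac{\partial g_{im}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^m}\right). \tag{37.3} $$

This equation is sometimes written as

$$ \Gamma_{kij} = \frac{1}{2}\left(\frac{\partial g_{jk}}{\partial x^i} + \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^k}\right), \tag{37.4} $$

where $\Gamma_{kij} \equiv g_{km}\Gamma^m_{ij}$.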
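
As a rough numerical illustration of Eq. (37.3), the following Python sketch estimates the Christoffel symbols by central differences. It assumes a diagonal metric, so that the inverse metric is just $1/g_{kk}$ on the diagonal; the names `christoffel` and `polar_metric` and the step size `h` are illustrative choices, not notation from the text. The flat plane in polar coordinates, $ds^2 = dr^2 + r^2 d\theta^2$, is used as a test case, for which the familiar values are $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = 1/r$.

```python
def christoffel(metric, point, h=1e-5):
    """Estimate gamma[k][i][j] = Gamma^k_{ij} at `point` from Eq. (37.3).

    Assumption: `metric(point)` returns a DIAGONAL matrix (list of lists),
    so the inverse metric is simply 1/g_kk on the diagonal.
    """
    n = len(point)
    g = metric(point)

    def d_metric(m):
        # central-difference estimate of the matrix d g_ab / d x^m
        plus, minus = list(point), list(point)
        plus[m] += h
        minus[m] -= h
        gp, gm = metric(plus), metric(minus)
        return [[(gp[a][b] - gm[a][b]) / (2 * h) for b in range(n)]
                for a in range(n)]

    dg = [d_metric(m) for m in range(n)]      # dg[m][a][b] = d_m g_ab
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Gamma_{kij} of Eq. (37.4), then raise k with the diagonal inverse
                lowered = 0.5 * (dg[i][j][k] + dg[j][i][k] - dg[k][i][j])
                gamma[k][i][j] = lowered / g[k][k]
    return gamma


def polar_metric(point):
    """Flat plane in polar coordinates (r, theta): g = diag(1, r^2)."""
    r, _theta = point
    return [[1.0, 0.0], [0.0, r * r]]


gamma = christoffel(polar_metric, [2.0, 0.3])
print(gamma[0][1][1])   # Gamma^r_{theta theta}, should be close to -r = -2
print(gamma[1][0][1])   # Gamma^theta_{r theta}, should be close to 1/r = 0.5
```

For a non-diagonal metric the division by `g[k][k]` would have to be replaced by a multiplication with the full inverse matrix.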

Now we consider the connection between an infinitesimal displacement
and a metric. Let $P$ be a point of $M$. Let $\gamma$ be a curve through $P$ such that
$\gamma(c) = P$. For an infinitesimal number $\delta u$, let $P' = \gamma(c + \delta u)$ be a point
on $\gamma$ close to $P$. Since the $x^i$ are well-behaved functions, $x^i(P') - x^i(P)$
are infinitesimal real numbers. Let $\xi^i = x^i(P') - x^i(P)$, and construct the
vector $v = \xi^i\partial_i$, where $\{\partial_i\}$ consists of tangent vectors at $P$. We call $v$ the
infinitesimal displacement at $P$. The length (squared) of this vector, $g(v, v)$,
is shown to be $g_{ij}\xi^i\xi^j$. This is called the arc length from $P$ to $P'$, and
is naturally written as $ds^2 = g_{ij}\xi^i\xi^j$. It is customary to write $dx^i$ (not a
1-form!) in place of $\xi^i$:

$$ ds^2 = g_{ij}\,dx^i dx^j, \tag{37.5} $$

where the $dx^i$ are infinitesimal real numbers.
In applications, it is common to start with the metric tensor $g$ given in
terms of coordinate differential forms:

$$ g = g_{ij}\,dx^i \otimes dx^j, \quad\text{where}\quad g_{ij} = g_{ji} = g(\partial_i, \partial_j). \tag{37.6} $$

The equivalence of the arc length (37.5) and the metric (37.6) is the reason
why it is the arc length that is given in most practical problems. Once the arc
length is known, the metric $g_{ij}$ can be read off, and all the relevant geometric
quantities can be calculated from it.


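Example 37.1.7 Let us determine the geodesics of the space whose arc
length is given by $ds^2 = (dx^2 + dy^2)/y^2$ (see also Example 37.1.10). With
$x = x^1$ and $y = x^2$, we recognize the metric tensor as

$$ g_{11} = g_{22} = \frac{1}{y^2}, \quad g_{12} = g_{21} = 0, \qquad g^{11} = g^{22} = y^2, \quad g^{12} = g^{21} = 0. $$

Using Eq. (37.4), we can readily calculate the nonzero connection
coefficients:

$$ \Gamma_{112} = \Gamma_{121} = -\Gamma_{211} = \Gamma_{222} = -\frac{1}{y^3}. $$

The geodesic equation for the first coordinate is

$$ \frac{d^2x}{dt^2} + \Gamma^1_{ij}\,\frac{dx^i}{dt}\frac{dx^j}{dt} = 0, $$

or

$$ \frac{d^2x}{dt^2} + \Gamma^1_{11}\left(\frac{dx}{dt}\right)^2 + 2\Gamma^1_{12}\,\frac{dx}{dt}\frac{dy}{dt} + \Gamma^1_{22}\left(\frac{dy}{dt}\right)^2 = 0. $$

To find the connection coefficients with raised indices, we multiply those
with all indices lowered by the inverse of the metric tensor. For instance,

$$ \Gamma^1_{12} = g^{1i}\Gamma_{i12} = g^{11}\Gamma_{112} + g^{12}\Gamma_{212} = y^2\left(-\frac{1}{y^3}\right) = -\frac{1}{y}. $$

Similarly, $\Gamma^1_{11} = 0 = \Gamma^1_{22}$, and the geodesic equation for the first
coordinate becomes

$$ \frac{d^2x}{dt^2} - \frac{2}{y}\,\frac{dx}{dt}\frac{dy}{dt} = 0. \tag{37.7} $$

For the second coordinate, we need $\Gamma^2_{11}$, $\Gamma^2_{12}$, and $\Gamma^2_{22}$. These can be
readily evaluated as above, with the result

$$ \Gamma^2_{11} = -\Gamma^2_{22} = \frac{1}{y}, \qquad \Gamma^2_{12} = 0, $$

yielding the geodesic equation for the second coordinate

$$ \frac{d^2y}{dt^2} + \frac{1}{y}\left(\frac{dx}{dt}\right)^2 - \frac{1}{y}\left(\frac{dy}{dt}\right)^2 = 0. \tag{37.8} $$

With $\dot{x} \equiv dx/dt$, Eq. (37.7) can be written as

$$ \frac{d\dot{x}/dt}{\dot{x}} = 2\,\frac{dy/dt}{y} \quad\Rightarrow\quad \dot{x} = Cy^2. $$

Using the chain rule and the notation $y' \equiv dy/dx$, we obtain

$$ \begin{aligned}
\frac{dy}{dt} &= y'\dot{x} = Cy^2y', \\
\frac{d^2y}{dt^2} &= C\left(2yy'\,\frac{dy}{dt} + y^2\,\frac{dy'}{dt}\right) = C^2\left(2y^3y'^2 + y^4y''\right).
\end{aligned} $$

Substituting in Eq. (37.8) yields

$$ y^3y'^2 + y^4y'' + y^3 = 0 \;\Rightarrow\; (y')^2 + yy'' + 1 = 0 \;\Rightarrow\; \frac{d}{dx}(yy') + 1 = 0. $$

It follows that $yy' = -x + A$ and $x^2 + y^2 = 2Ax + B$. Thus, the geodesics
are circles with arbitrary radii whose centers lie on the x-axis.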
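
The conclusion of this example can be checked numerically. The sketch below integrates the geodesic equations (37.7) and (37.8) with a classical Runge-Kutta step and monitors the constants $A$ and $B$ of the circle $x^2 + y^2 = 2Ax + B$, using $yy' = -x + A$ with $y' = \dot{y}/\dot{x}$. The initial data, step size, and function names are arbitrary illustrative choices, not part of the text.

```python
def geodesic_rhs(state):
    """First-order form of Eqs. (37.7) and (37.8); state = (x, y, dx/dt, dy/dt)."""
    x, y, vx, vy = state
    ax = 2.0 * vx * vy / y            # Eq. (37.7) solved for d^2x/dt^2
    ay = (vy * vy - vx * vx) / y      # Eq. (37.8) solved for d^2y/dt^2
    return (vx, vy, ax, ay)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(s + h * ki for s, ki in zip(state, k))
    k1 = geodesic_rhs(state)
    k2 = geodesic_rhs(shifted(k1, dt / 2))
    k3 = geodesic_rhs(shifted(k2, dt / 2))
    k4 = geodesic_rhs(shifted(k3, dt))
    return tuple(s + dt / 6 * (a + 2 * b + 2 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def circle_constants(state):
    """Constants A, B of x^2 + y^2 = 2Ax + B, from y y' = -x + A with y' = vy/vx."""
    x, y, vx, vy = state
    A = x + y * vy / vx
    B = x * x + y * y - 2.0 * A * x
    return A, B


def integrate(state, dt=1e-3, steps=2000):
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state


start = (0.0, 1.0, 1.0, 0.5)          # hypothetical initial point and velocity
end = integrate(start)
A0, B0 = circle_constants(start)
A1, B1 = circle_constants(end)
print(A0, B0)                         # circle constants at the start
print(A1, B1)                         # should agree with the line above
```

The check requires $\dot{x} \neq 0$; the case $C = 0$ (vertical lines $x = \text{const}$) is not covered by this sketch.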


37.1.1 Orthogonal Bases 


The presence of a metric allows for orthonormal bases, which are sometimes
more convenient than the coordinate bases. If the structure group reduces
to $O(n)$, then $\mathfrak{gl}(n, \mathbb{R})$ reduces to $\mathfrak{o}(n)$, which is the set of antisymmetric
matrices.¹ Therefore, the $E^j_i$'s in Eq. (36.4) will be antisymmetric in $i$ and $j$,
making $\omega^i{}_j$ and $\Omega^i{}_j$ also antisymmetric. This simplifies the calculation of the
curvature form as given in Box 36.2.9. For this, we need the orthonormal
bases $\{e_i\}$ and $\{\epsilon^j\}$, which can be constructed in terms of $\{\partial_i\}$ and $\{dx^i\}$,
respectively. The following examples illustrate the procedure.


¹ Although we are restricting the discussion to $O(n)$, it also applies to the more general
group $O(n - \nu, \nu)$.
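Example 37.1.8 Let us look at a few examples of arc lengths, the
corresponding metrics, and the orthogonal bases derived from them.

(a) For $ds^2 = dx^2 + dy^2 + dz^2$, $g$ is the Euclidean metric of $\mathbb{R}^3$, with
$g_{ij} = \delta_{ij}$.

(b) For $ds^2 = -dx^2 - dy^2 - dz^2 + dt^2$, $g$ is the Minkowski (or Lorentz)
metric of $\mathbb{R}^4$, with $g_{ij} = \eta_{ij}$, where $\eta_{xx} = \eta_{yy} = \eta_{zz} = -\eta_{tt} = -1$ and
$\eta_{ij} = 0$ for $i \neq j$.

(c) For $ds^2 = dr^2 + r^2(d\theta^2 + \sin^2\theta\, d\varphi^2)$, the metric is the Euclidean
metric given in spherical coordinates in $\mathbb{R}^3$ with $g_{rr} = 1$, $g_{\theta\theta} = r^2$,
$g_{\varphi\varphi} = r^2\sin^2\theta$, and all other components zero.

(d) For $ds^2 = a^2 d\theta^2 + a^2\sin^2\theta\, d\varphi^2$, the metric is that of a two-dimensional
spherical surface, with $g_{\theta\theta} = a^2$, $g_{\varphi\varphi} = a^2\sin^2\theta$, and all other
components zero.

(e) For
$$ ds^2 = dt^2 - a^2(t)\left[d\chi^2 + \sin^2\chi\,(d\theta^2 + \sin^2\theta\, d\varphi^2)\right], $$
the metric is the Friedmann metric used in cosmology. Here
$$ g_{tt} = 1, \quad g_{\chi\chi} = -[a(t)]^2, \quad g_{\theta\theta} = -[a(t)]^2\sin^2\chi, \quad g_{\varphi\varphi} = -[a(t)]^2\sin^2\chi\,\sin^2\theta. $$

(f) For
$$ ds^2 = \left(1 - \frac{2M}{r}\right)dt^2 - \left(1 - \frac{2M}{r}\right)^{-1}dr^2 - r^2(d\theta^2 + \sin^2\theta\, d\varphi^2), $$
the metric is the Schwarzschild metric with
$$ g_{tt} = 1 - 2M/r, \quad g_{rr} = -(1 - 2M/r)^{-1}, \quad g_{\theta\theta} = -r^2, \quad g_{\varphi\varphi} = -r^2\sin^2\theta. $$

For each of the arc lengths above, we have an orthonormal basis of
one-forms:

(a′) $g = \epsilon^1\otimes\epsilon^1 + \epsilon^2\otimes\epsilon^2 + \epsilon^3\otimes\epsilon^3$, with
$\epsilon^1 = dx$, $\epsilon^2 = dy$, $\epsilon^3 = dz$;
(b′) $g = -\epsilon^1\otimes\epsilon^1 - \epsilon^2\otimes\epsilon^2 - \epsilon^3\otimes\epsilon^3 + \epsilon^0\otimes\epsilon^0$, with
$\epsilon^1 = dx$, $\epsilon^2 = dy$, $\epsilon^3 = dz$, $\epsilon^0 = dt$;
(c′) $g = \epsilon^r\otimes\epsilon^r + \epsilon^\theta\otimes\epsilon^\theta + \epsilon^\varphi\otimes\epsilon^\varphi$, with
$\epsilon^r = dr$, $\epsilon^\theta = r\,d\theta$, $\epsilon^\varphi = r\sin\theta\, d\varphi$;
(d′) $g = \epsilon^\theta\otimes\epsilon^\theta + \epsilon^\varphi\otimes\epsilon^\varphi$, with $\epsilon^\theta = a\,d\theta$, $\epsilon^\varphi = a\sin\theta\, d\varphi$;
(e′) $g = \epsilon^t\otimes\epsilon^t - \epsilon^\chi\otimes\epsilon^\chi - \epsilon^\theta\otimes\epsilon^\theta - \epsilon^\varphi\otimes\epsilon^\varphi$, with
$\epsilon^t = dt$, $\epsilon^\chi = a(t)\,d\chi$, $\epsilon^\theta = a(t)\sin\chi\, d\theta$, $\epsilon^\varphi = a(t)\sin\chi\sin\theta\, d\varphi$;
(f′) $g = \epsilon^t\otimes\epsilon^t - \epsilon^r\otimes\epsilon^r - \epsilon^\theta\otimes\epsilon^\theta - \epsilon^\varphi\otimes\epsilon^\varphi$, with
$\epsilon^t = (1 - 2M/r)^{1/2}\,dt$, $\epsilon^r = (1 - 2M/r)^{-1/2}\,dr$, $\epsilon^\theta = r\,d\theta$, $\epsilon^\varphi = r\sin\theta\, d\varphi$.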


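Example 37.1.9 In this example, we examine the curvilinear coordinates
used in vector analysis. Recall that in terms of these coordinates the
displacement is given by $ds^2 = h_1^2(dq_1)^2 + h_2^2(dq_2)^2 + h_3^2(dq_3)^2$. Therefore,
the orthonormal one-forms are $\epsilon^1 = h_1 dq_1$, $\epsilon^2 = h_2 dq_2$, $\epsilon^3 = h_3 dq_3$. We
also note (see Problem 28.23) that

$$ d{*}df = \left(\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}\right)dx\wedge dy\wedge dz = \nabla^2 f\, dx\wedge dy\wedge dz. \tag{37.9} $$

We use this equation to find the Laplacian in terms of $q_1$, $q_2$, and $q_3$:

$$ \begin{aligned}
df &= \frac{\partial f}{\partial q_1}\,dq_1 + \frac{\partial f}{\partial q_2}\,dq_2 + \frac{\partial f}{\partial q_3}\,dq_3 \\
&= \left(\frac{1}{h_1}\frac{\partial f}{\partial q_1}\right)\epsilon^1 + \left(\frac{1}{h_2}\frac{\partial f}{\partial q_2}\right)\epsilon^2 + \left(\frac{1}{h_3}\frac{\partial f}{\partial q_3}\right)\epsilon^3,
\end{aligned} $$

where we substituted orthonormal 1-forms so we could apply the Hodge star
operator. It follows now that

$$ \begin{aligned}
{*}df &= \left(\frac{1}{h_1}\frac{\partial f}{\partial q_1}\right){*}\epsilon^1 + \left(\frac{1}{h_2}\frac{\partial f}{\partial q_2}\right){*}\epsilon^2 + \left(\frac{1}{h_3}\frac{\partial f}{\partial q_3}\right){*}\epsilon^3 \\
&= \left(\frac{1}{h_1}\frac{\partial f}{\partial q_1}\right)\epsilon^2\wedge\epsilon^3 + \left(\frac{1}{h_2}\frac{\partial f}{\partial q_2}\right)\epsilon^3\wedge\epsilon^1 + \left(\frac{1}{h_3}\frac{\partial f}{\partial q_3}\right)\epsilon^1\wedge\epsilon^2 \\
&= \left(\frac{h_2h_3}{h_1}\frac{\partial f}{\partial q_1}\right)dq_2\wedge dq_3 + \left(\frac{h_1h_3}{h_2}\frac{\partial f}{\partial q_2}\right)dq_3\wedge dq_1 + \left(\frac{h_1h_2}{h_3}\frac{\partial f}{\partial q_3}\right)dq_1\wedge dq_2.
\end{aligned} $$

Differentiating once more, we get

$$ \begin{aligned}
d{*}df &= \frac{\partial}{\partial q_1}\left(\frac{h_2h_3}{h_1}\frac{\partial f}{\partial q_1}\right)dq_1\wedge dq_2\wedge dq_3 + \frac{\partial}{\partial q_2}\left(\frac{h_1h_3}{h_2}\frac{\partial f}{\partial q_2}\right)dq_2\wedge dq_3\wedge dq_1 \\
&\quad + \frac{\partial}{\partial q_3}\left(\frac{h_1h_2}{h_3}\frac{\partial f}{\partial q_3}\right)dq_3\wedge dq_1\wedge dq_2 \\
&= \frac{1}{h_1h_2h_3}\left[\frac{\partial}{\partial q_1}\left(\frac{h_2h_3}{h_1}\frac{\partial f}{\partial q_1}\right) + \frac{\partial}{\partial q_2}\left(\frac{h_1h_3}{h_2}\frac{\partial f}{\partial q_2}\right) + \frac{\partial}{\partial q_3}\left(\frac{h_1h_2}{h_3}\frac{\partial f}{\partial q_3}\right)\right]\epsilon^1\wedge\epsilon^2\wedge\epsilon^3.
\end{aligned} \tag{37.10} $$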
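Since $\{\epsilon^1, \epsilon^2, \epsilon^3\}$ are orthonormal one-forms (as are $\{dx, dy, dz\}$), the
volume elements $\epsilon^1\wedge\epsilon^2\wedge\epsilon^3$ and $dx\wedge dy\wedge dz$ are equal. Thus, we substitute
the latter for the former in (37.10), compare with (37.9), and conclude that

$$ \nabla^2 f = \frac{1}{h_1h_2h_3}\left[\frac{\partial}{\partial q_1}\left(\frac{h_2h_3}{h_1}\frac{\partial f}{\partial q_1}\right) + \frac{\partial}{\partial q_2}\left(\frac{h_1h_3}{h_2}\frac{\partial f}{\partial q_2}\right) + \frac{\partial}{\partial q_3}\left(\frac{h_1h_2}{h_3}\frac{\partial f}{\partial q_3}\right)\right], $$

which is the result obtained in curvilinear vector analysis.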
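
As a quick illustration of the general formula, consider spherical coordinates $(q_1, q_2, q_3) = (r, \theta, \varphi)$, whose scale factors can be read off from the arc length in part (c) of Example 37.1.8. The specialization below is only a worked instance of the result above, not an additional derivation from the text.

```latex
\begin{align*}
h_1 = 1,\quad h_2 = r,\quad h_3 = r\sin\theta
\;\Longrightarrow\;
\nabla^2 f &= \frac{1}{r^2\sin\theta}\left[
  \frac{\partial}{\partial r}\!\left(r^2\sin\theta\,\frac{\partial f}{\partial r}\right)
+ \frac{\partial}{\partial\theta}\!\left(\sin\theta\,\frac{\partial f}{\partial\theta}\right)
+ \frac{\partial}{\partial\varphi}\!\left(\frac{1}{\sin\theta}\,\frac{\partial f}{\partial\varphi}\right)\right] \\
&= \frac{1}{r^2}\frac{\partial}{\partial r}\!\left(r^2\frac{\partial f}{\partial r}\right)
+ \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\!\left(\sin\theta\,\frac{\partial f}{\partial\theta}\right)
+ \frac{1}{r^2\sin^2\theta}\frac{\partial^2 f}{\partial\varphi^2}.
\end{align*}
```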


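In an orthonormal frame, Eq. (36.20) becomes

$$ \omega^i{}_j = \Gamma^i_{kj}\,\epsilon^k, \tag{37.11} $$

where as usual, we have omitted the subscript $U$. Furthermore, using
Eq. (36.6) with $T = \epsilon^i$, one can easily show that

$$ \nabla\epsilon^i = -\Gamma^i_{jk}\,\epsilon^k\otimes\epsilon^j = -\Gamma^i_{kj}\,\epsilon^k\otimes\epsilon^j = -\omega^i{}_j\otimes\epsilon^j $$

for a Levi-Civita connection. Now use Corollary 36.3.10 to obtain

$$ d\epsilon^i = \mathbb{A}(\nabla\epsilon^i) = \mathbb{A}(-\omega^i{}_j\otimes\epsilon^j) = -\omega^i{}_j\wedge\epsilon^j. \tag{37.12} $$

This can also be written in matrix form if we use Box 36.2.9 and define the
column vector $\boldsymbol{\epsilon}$ with elements $\epsilon^i$. We then have

$$ d\boldsymbol{\epsilon} = -\hat{\boldsymbol{\omega}}\wedge\boldsymbol{\epsilon}. $$

Taking the exterior derivative of this equation gives

$$ 0 = d^2\boldsymbol{\epsilon} = -d\hat{\boldsymbol{\omega}}\wedge\boldsymbol{\epsilon} + \hat{\boldsymbol{\omega}}\wedge d\boldsymbol{\epsilon} = -d\hat{\boldsymbol{\omega}}\wedge\boldsymbol{\epsilon} - \hat{\boldsymbol{\omega}}\wedge\hat{\boldsymbol{\omega}}\wedge\boldsymbol{\epsilon} = -\hat{\boldsymbol{\Omega}}\wedge\boldsymbol{\epsilon}. $$

Combining the two equations, we get

$$ d\boldsymbol{\epsilon} = -\hat{\boldsymbol{\omega}}\wedge\boldsymbol{\epsilon}, \qquad \hat{\boldsymbol{\Omega}}\wedge\boldsymbol{\epsilon} = 0. \tag{37.13} $$

The antisymmetry of $\hat{\boldsymbol{\Omega}}$ in $i$ and $j$ and Eq. (36.25) give

$$ R^i{}_{jkl} + R^j{}_{ikl} = 0, \qquad R^i{}_{jkl} + R^i{}_{jlk} = 0. \tag{37.14} $$

Similarly, the second relation of Eq. (37.13) can be shown to be equivalent
to

$$ R^i{}_{jkl} + R^i{}_{klj} + R^i{}_{ljk} = 0, \qquad R^i{}_{[jkl]} = 0, \tag{37.15} $$

where the enclosure of indices in square brackets means complete
antisymmetrization of those indices. The first relation of (37.15) can also be
obtained from Bianchi's 1st identity given in Theorem 36.2.17.

It is also common to lower the upper index by the metric and define
$R_{ijkl} = g_{im}R^m{}_{jkl}$. Then the new tensor has the additional properties

$$ R_{ijkl} = R_{klij} \quad\text{and}\quad R_{[ijkl]} = 0. \tag{37.16} $$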
Equation (37.12) gives us a recipe for finding the curvature from the arc
length. Given the arc length, construct the orthogonal 1-forms as in
Examples 37.1.8 and 37.1.9. Then take the exterior derivative of a typical one and
read off $\omega^i{}_j$ from the right-hand side of the equation. Form the matrix $\hat{\boldsymbol{\omega}}$ and
use Box 36.2.9 to calculate $\hat{\boldsymbol{\Omega}}$. The coefficients of the entries of $\hat{\boldsymbol{\Omega}}$ are the
components of the Riemann curvature tensor according to Eq. (36.25).


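Example 37.1.10 Let $M = \mathbb{R}^2$, and suppose that the arc length is given by
$ds^2 = (dx^2 + dy^2)/y^2$ (see Example 37.1.7). We can write the metric as
$g = \epsilon^1\otimes\epsilon^1 + \epsilon^2\otimes\epsilon^2$ if we define

$$ \epsilon^1 = \frac{dx}{y} \quad\text{and}\quad \epsilon^2 = \frac{dy}{y}. $$

To find the curvature tensor, we take the exterior derivative of the $\epsilon^i$'s:

$$ d\epsilon^1 = d\left(\frac{1}{y}\,dx\right) = -\frac{1}{y^2}\,dy\wedge dx = \epsilon^1\wedge\epsilon^2, \qquad d\epsilon^2 = 0. \tag{37.17} $$

From these equations, the antisymmetry of the $\omega$'s, and Eq. (37.12), we can
read off $\omega^i{}_j$. They are $\omega^1{}_1 = \omega^2{}_2 = 0$ and $\omega^1{}_2 = -\omega^2{}_1 = -\epsilon^1$. Thus, the matrix $\hat{\boldsymbol{\omega}}$
is

$$ \hat{\boldsymbol{\omega}} = \omega^1{}_2 E^2_1 + \omega^2{}_1 E^1_2 = \omega^1{}_2\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} + \omega^2{}_1\begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & \omega^1{}_2 \\ \omega^2{}_1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -\epsilon^1 \\ \epsilon^1 & 0 \end{pmatrix}, $$

which gives

$$ d\hat{\boldsymbol{\omega}} = \begin{pmatrix} 0 & -d\epsilon^1 \\ d\epsilon^1 & 0 \end{pmatrix} = \begin{pmatrix} 0 & -\epsilon^1\wedge\epsilon^2 \\ \epsilon^1\wedge\epsilon^2 & 0 \end{pmatrix} $$

and

$$ \hat{\boldsymbol{\omega}}\wedge\hat{\boldsymbol{\omega}} = \begin{pmatrix} 0 & -\epsilon^1 \\ \epsilon^1 & 0 \end{pmatrix}\wedge\begin{pmatrix} 0 & -\epsilon^1 \\ \epsilon^1 & 0 \end{pmatrix} = 0. $$

Therefore, the curvature matrix is

$$ \hat{\boldsymbol{\Omega}} = d\hat{\boldsymbol{\omega}} + \hat{\boldsymbol{\omega}}\wedge\hat{\boldsymbol{\omega}} = \begin{pmatrix} 0 & -\epsilon^1\wedge\epsilon^2 \\ \epsilon^1\wedge\epsilon^2 & 0 \end{pmatrix} = -\epsilon^1\wedge\epsilon^2\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. $$

This shows that the only nonzero independent component of the Riemann
curvature tensor is $R_{1212} = -1$.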


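Example 37.1.11 For a spherical surface of radius $a$, the element of length
is

$$ ds^2 = a^2 d\theta^2 + a^2\sin^2\theta\, d\varphi^2. $$

The orthonormal forms are $\epsilon^\theta = a\,d\theta$ and $\epsilon^\varphi = a\sin\theta\, d\varphi$, and we have

$$ d\epsilon^\theta = 0, \qquad d\epsilon^\varphi = a\cos\theta\, d\theta\wedge d\varphi = \frac{1}{a}\cot\theta\,\epsilon^\theta\wedge\epsilon^\varphi. $$

The matrix $\hat{\boldsymbol{\omega}}$ can now be read off:

$$ \hat{\boldsymbol{\omega}} = \frac{1}{a}\begin{pmatrix} 0 & -\cot\theta\,\epsilon^\varphi \\ \cot\theta\,\epsilon^\varphi & 0 \end{pmatrix}. $$

A straightforward exterior differentiation yields

$$ d\hat{\boldsymbol{\omega}} = \begin{pmatrix} 0 & \frac{1}{a^2}\,\epsilon^\theta\wedge\epsilon^\varphi \\ -\frac{1}{a^2}\,\epsilon^\theta\wedge\epsilon^\varphi & 0 \end{pmatrix} = \frac{1}{a^2}\begin{pmatrix} 0 & \epsilon^\theta\wedge\epsilon^\varphi \\ -\epsilon^\theta\wedge\epsilon^\varphi & 0 \end{pmatrix}. $$

Similarly, $\hat{\boldsymbol{\omega}}\wedge\hat{\boldsymbol{\omega}} = 0$. Therefore, the curvature matrix is

$$ \hat{\boldsymbol{\Omega}} = d\hat{\boldsymbol{\omega}} = \frac{1}{a^2}\begin{pmatrix} 0 & \epsilon^\theta\wedge\epsilon^\varphi \\ -\epsilon^\theta\wedge\epsilon^\varphi & 0 \end{pmatrix}. $$

The only independent component of the Riemann curvature tensor is
$R_{\theta\varphi\theta\varphi} = 1/a^2$, which is constant, as expected for a spherical surface.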
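
The values found in the last two examples can be cross-checked numerically. The sketch below does not use the method of the text; it relies instead on the classical formula for the Gaussian curvature of a surface with orthogonal line element $ds^2 = E\,du^2 + G\,dv^2$, namely $K = -\frac{1}{2\sqrt{EG}}\left[\partial_u\!\left(\frac{\partial_u G}{\sqrt{EG}}\right) + \partial_v\!\left(\frac{\partial_v E}{\sqrt{EG}}\right)\right]$, evaluated with central differences. For these two surfaces $K$ agrees with the single independent curvature component obtained above. The function name, sample points, and step size are illustrative assumptions.

```python
import math


def gaussian_curvature(E, G, u, v, h=1e-4):
    """K for ds^2 = E du^2 + G dv^2 (orthogonal coordinates), by central differences."""
    def root(a, b):
        return math.sqrt(E(a, b) * G(a, b))

    def term_u(a, b):      # (dG/du) / sqrt(EG)
        return (G(a + h, b) - G(a - h, b)) / (2 * h) / root(a, b)

    def term_v(a, b):      # (dE/dv) / sqrt(EG)
        return (E(a, b + h) - E(a, b - h)) / (2 * h) / root(a, b)

    d_term_u = (term_u(u + h, v) - term_u(u - h, v)) / (2 * h)
    d_term_v = (term_v(u, v + h) - term_v(u, v - h)) / (2 * h)
    return -(d_term_u + d_term_v) / (2 * root(u, v))


def half_plane(x, y):
    """E = G = 1/y^2 for the arc length of Example 37.1.10."""
    return 1.0 / (y * y)


a = 2.0                                     # hypothetical sphere radius
K_half = gaussian_curvature(half_plane, half_plane, 0.3, 1.7)
K_sphere = gaussian_curvature(lambda th, ph: a * a,
                              lambda th, ph: (a * math.sin(th)) ** 2,
                              1.0, 0.4)
print(K_half)      # should be close to -1
print(K_sphere)    # should be close to 1/a^2 = 0.25
```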


It is clear that when the $g_{ij}$ in the expression for the line element are all
constants for all points in the manifold, then $\epsilon^i$ will be proportional to $dx^i$
and $d\epsilon^i = 0$, for all $i$. This immediately tells us that $\hat{\boldsymbol{\omega}} = 0$, and therefore
$\hat{\boldsymbol{\Omega}} = 0$; that is, the manifold is flat. Thus, for $ds^2 = dx^2 + dy^2 + dz^2$, the
space is flat. However, arc lengths of a flat space come in various guises
with nontrivial coefficients. Does the curvature matrix $\hat{\boldsymbol{\Omega}}$ recognize the flat
arc length, or is it possible to fool it into believing that it is privileged with a
curvature when in reality the curvature is still zero? The following example
shows that the curvature matrix can detect flatness no matter how disguised
the line element is!


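Example 37.1.12 In spherical coordinates, the line element (arc length) of
the flat Euclidean space $\mathbb{R}^3$ is $ds^2 = dr^2 + r^2 d\theta^2 + r^2\sin^2\theta\, d\varphi^2$. To
calculate the curvature matrix, we first need an orthonormal set of one-forms.
These are immediately obtained from the expression above:

$$ \epsilon^r = dr, \qquad \epsilon^\theta = r\,d\theta, \qquad \epsilon^\varphi = r\sin\theta\, d\varphi. $$

Taking the exterior derivatives of these one-forms, we obtain

$$ \begin{aligned}
d\epsilon^r &= d^2r = 0, \\
d\epsilon^\theta &= d(r\,d\theta) = dr\wedge d\theta + r\underbrace{d^2\theta}_{=0} = \epsilon^r\wedge\left(\frac{\epsilon^\theta}{r}\right) = \frac{1}{r}\,\epsilon^r\wedge\epsilon^\theta, \\
d\epsilon^\varphi &= d(r\sin\theta)\wedge d\varphi = \sin\theta\, dr\wedge d\varphi + r\cos\theta\, d\theta\wedge d\varphi \\
&= \sin\theta\,\epsilon^r\wedge\frac{\epsilon^\varphi}{r\sin\theta} + r\cos\theta\,\frac{\epsilon^\theta}{r}\wedge\frac{\epsilon^\varphi}{r\sin\theta} \\
&= \frac{1}{r}\,\epsilon^r\wedge\epsilon^\varphi + \frac{\cot\theta}{r}\,\epsilon^\theta\wedge\epsilon^\varphi.
\end{aligned} $$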
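We can now find the matrix of one-forms $\hat{\boldsymbol{\omega}}$. In calculating the elements
of $\hat{\boldsymbol{\omega}}$, we remember that it is a skew-symmetric matrix, so all diagonal
elements are zero. We also note that $d\epsilon^r = 0$ does not imply that $\omega^r{}_j = 0$.
Keeping these facts in mind, we can easily obtain $\hat{\boldsymbol{\omega}}$ (the calculation is left
as a problem for the reader):

$$ \hat{\boldsymbol{\omega}} = \begin{pmatrix}
0 & -\frac{1}{r}\,\epsilon^\theta & -\frac{1}{r}\,\epsilon^\varphi \\
\frac{1}{r}\,\epsilon^\theta & 0 & -\frac{\cot\theta}{r}\,\epsilon^\varphi \\
\frac{1}{r}\,\epsilon^\varphi & \frac{\cot\theta}{r}\,\epsilon^\varphi & 0
\end{pmatrix}. $$

The exterior derivative of this matrix is found to be

$$ d\hat{\boldsymbol{\omega}} = \begin{pmatrix}
0 & 0 & -\frac{\cot\theta}{r^2}\,\epsilon^\theta\wedge\epsilon^\varphi \\
0 & 0 & \frac{1}{r^2}\,\epsilon^\theta\wedge\epsilon^\varphi \\
\frac{\cot\theta}{r^2}\,\epsilon^\theta\wedge\epsilon^\varphi & -\frac{1}{r^2}\,\epsilon^\theta\wedge\epsilon^\varphi & 0
\end{pmatrix}, $$

which is precisely (the negative of) the exterior product $\hat{\boldsymbol{\omega}}\wedge\hat{\boldsymbol{\omega}}$, as the reader
may wish to verify. Thus, $\hat{\boldsymbol{\Omega}} = d\hat{\boldsymbol{\omega}} + \hat{\boldsymbol{\omega}}\wedge\hat{\boldsymbol{\omega}} = 0$, and the space is indeed flat!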
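
For readers who want to see the cancellation explicitly, the following lines sketch the check entry by entry. They assume the connection forms $\omega^r{}_\theta = -\epsilon^\theta/r$, $\omega^r{}_\varphi = -\epsilon^\varphi/r$, $\omega^\theta{}_\varphi = -(\cot\theta/r)\,\epsilon^\varphi$ (with the remaining entries fixed by antisymmetry), which reproduce $d\epsilon^i = -\omega^i{}_j\wedge\epsilon^j$ for the three one-forms of this example, and they use $\epsilon^\theta/r = d\theta$ and $\epsilon^\varphi/r = \sin\theta\, d\varphi$.

```latex
\begin{align*}
\Omega^r{}_\theta &= d\omega^r{}_\theta + \omega^r{}_\varphi\wedge\omega^\varphi{}_\theta
  = -d(d\theta) - \frac{\cot\theta}{r^2}\,\epsilon^\varphi\wedge\epsilon^\varphi = 0, \\
\Omega^r{}_\varphi &= d\omega^r{}_\varphi + \omega^r{}_\theta\wedge\omega^\theta{}_\varphi
  = -d(\sin\theta\,d\varphi) + \frac{\cot\theta}{r^2}\,\epsilon^\theta\wedge\epsilon^\varphi
  = -\frac{\cot\theta}{r^2}\,\epsilon^\theta\wedge\epsilon^\varphi
    + \frac{\cot\theta}{r^2}\,\epsilon^\theta\wedge\epsilon^\varphi = 0, \\
\Omega^\theta{}_\varphi &= d\omega^\theta{}_\varphi + \omega^\theta{}_r\wedge\omega^r{}_\varphi
  = -d(\cos\theta\,d\varphi) - \frac{1}{r^2}\,\epsilon^\theta\wedge\epsilon^\varphi
  = \frac{1}{r^2}\,\epsilon^\theta\wedge\epsilon^\varphi
    - \frac{1}{r^2}\,\epsilon^\theta\wedge\epsilon^\varphi = 0.
\end{align*}
```

The remaining entries of the curvature matrix vanish by antisymmetry.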


In all the foregoing examples, the curvature was calculated intrinsically.
We never had to leave the space and go to a higher dimension to "see"
the curvature. For example, in the case of the sphere, the only information
we had was the line element in terms of the coordinates on the sphere. We
never had to resort to any three-dimensional analysis to discover a globe
embedded in the Euclidean $\mathbb{R}^3$. As mentioned earlier, if a space has line
elements with constant $g_{ij}$, then the Riemann curvature vanishes trivially.
We have also seen an example in which the components of a metric tensor
were by no means trivial, but $\hat{\boldsymbol{\Omega}}$ was smart enough to detect the flatness in
disguise. Under what conditions can we choose coordinate systems in terms
of which the line elements have $g_{ij} = \delta_{ij}$? To answer this question we need
the following lemma (proved in [Flan 89, pp. 135-136]):


Lemma 37.1.13 If $\hat{\boldsymbol{\omega}}$ is a matrix of 1-forms such that $d\hat{\boldsymbol{\omega}} + \hat{\boldsymbol{\omega}}\wedge\hat{\boldsymbol{\omega}} = 0$, then
there exists an orthogonal matrix $A$ such that $dA = A\hat{\boldsymbol{\omega}}$.


The question raised above is intimately related to the connection between 
coordinate and orthonormal frames. We have seen the usefulness of both. 
Coordinate frames, due to the existence of the related coordinate functions, 
are useful for many analytical calculations, for example in Hamiltonian dy- 
namics. Orthonormal frames are useful because of the simplicity of expres- 
sions inherent in all orthonormal vectors. Furthermore, we saw how cur- 
vature was easily calculated once we constructed orthonormal dual frames. 
Naturally, we would like to have both. Is it possible to construct frames that 
are both coordinate and orthonormal? The following theorem answers this 
question: 


Theorem 37.1.14 Let $M$ be a Riemannian manifold. Then $M$ is flat, i.e.,
$\hat{\boldsymbol{\Omega}} = 0$, if and only if there exists a local coordinate system $\{x^i\}$ for which
$\{\partial_i\}$ is an orthonormal basis.


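Proof The existence of orthonormal coordinate frames implies that $\{dx^i\}$
are orthonormal. Thus, we can use them to find the curvature. But since
$d(dx^i) = 0$ for all $i$, it follows from Eq. (37.12) that $\omega^i{}_j = 0$ and $\hat{\boldsymbol{\omega}} = 0$. So
the curvature must vanish.

Conversely, suppose that $\hat{\boldsymbol{\Omega}} = 0$. Then by Lemma 37.1.13, there exists
an orthogonal matrix $A$ such that $dA = A\hat{\boldsymbol{\omega}}$. Now we define the one-form
column matrix $\boldsymbol{\tau}$ by $\boldsymbol{\tau} = A\boldsymbol{\epsilon}$, where $\boldsymbol{\epsilon}$ is the column matrix of 1-forms $\epsilon^i$.
Then, using Eq. (37.12) in matrix form, we have

$$ d\boldsymbol{\tau} = d(A\boldsymbol{\epsilon}) = dA\wedge\boldsymbol{\epsilon} + A\,d\boldsymbol{\epsilon} = (A\hat{\boldsymbol{\omega}})\wedge\boldsymbol{\epsilon} - A(\hat{\boldsymbol{\omega}}\wedge\boldsymbol{\epsilon}) = 0. $$

Thus, $d\tau^i = 0$ for all $i$. By Theorem 28.5.15 there must exist zero-forms
(functions) $x^i$ such that $\tau^i = dx^i$. These $x^i$ are the coordinates we are after.
The basis $\{\partial_i\}$ is obtained using the inverse of $A$ (see the discussion
following Proposition 26.1.1). Since $A$ is orthogonal, both $\{dx^i\}$ and $\{\partial_i\}$ are
orthonormal bases.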


37.2 Isometries and Killing Vector Fields 


We have already defined isometries for inner product spaces. The natural 
vector space structure on tangent spaces, plus the introduction of metrics on 
manifolds, leads in a natural way to the isometric mappings of manifolds. 


Definition 37.2.1 Let $M$ and $N$ be Riemannian manifolds with metrics $g_M$
and $g_N$, respectively. The smooth map $\psi : M \to N$ is called isometric at
$P \in M$ if $g_M(X, Y) = g_N(\psi_*X, \psi_*Y)$ for all $X, Y \in T_P(M)$. An isometry
of $M$ to $N$ is a bijective smooth map that is isometric at every point of $M$,
in which case we have $g_M = g_N \circ \psi_*$.


Of special interest are isometries $\psi : M \to M$ of a single manifold. These
happen to be a subgroup of Diff($M$), the group of diffeomorphisms of $M$.
In the subgroup of isometries, we concentrate on the one-parameter groups
of transformations. These define (and are defined by) certain vector fields:


Definition 37.2.2 Let $X \in \mathfrak{X}(M)$ be a vector field with flow $F_t$.
Then $X$ is called a Killing vector field if $F_t$ is an isometry of $M$.


The following proposition follows immediately from the definition of the
Lie derivative [Eq. (28.29)].


Proposition 37.2.3 A vector field $X \in \mathfrak{X}(M)$ is a Killing vector field
if and only if $L_X g = 0$.
Choosing a coordinate system $\{x^i\}$, we write $g = g_{ij}\,dx^i\otimes dx^j$ and
conclude that $X = X^j\partial_j$ is a Killing vector field if and only if

$$ \begin{aligned}
0 &= L_X(g_{ij}\,dx^i\otimes dx^j) \\
&= X(g_{ij})\,dx^i\otimes dx^j + g_{ij}(L_X dx^i)\otimes dx^j + g_{ij}\,dx^i\otimes(L_X dx^j).
\end{aligned} $$

Using Eq. (28.33) for the 1-form $dx^k$, we obtain the Killing equation

$$ X^k\partial_k g_{ij} + \partial_i X^k g_{kj} + \partial_j X^k g_{ki} = 0. \tag{37.18} $$

If in Eq. (36.38) we replace $T$ with $g$ and $u$ with $X$, where $X$ is a Killing
vector field, we obtain

$$ 0 = 0 + g_{ij}\left[(\nabla_{e_k}X)^i\,\epsilon^k\otimes\epsilon^j + (\nabla_{e_k}X)^j\,\epsilon^i\otimes\epsilon^k\right], \tag{37.19} $$

where we have assumed that the covariant derivative is compatible with the
metric tensor, i.e., that it is the Levi-Civita covariant derivative. The reader
may check that Eq. (37.19) leads to

$$ X_{j;k} + X_{k;j} = 0. \tag{37.20} $$

This is another form of the Killing equation.


Proposition 37.2.4 If $X$ is a Killing vector field, then its inner product with
the tangent to any geodesic is constant along that geodesic, i.e., if $u$ is such
a tangent, then $\nabla_u[g(u, X)] = 0$.


Proof We write the desired covariant derivative in component form,

$$ \begin{aligned}
u^k(g_{ij}u^iX^j)_{;k} &= \underbrace{u^k g_{ij;k}\,u^iX^j}_{=0} + \underbrace{u^k g_{ij}\,u^i{}_{;k}X^j}_{=0} + u^k g_{ij}\,u^iX^j{}_{;k} \\
&= \underbrace{\tfrac{1}{2}(X_{i;k} + X_{k;i})}_{=0 \text{ by (37.20)}}\,u^iu^k,
\end{aligned} $$

where the first term vanishes by assumption of the compatibility of the
metric and the covariant derivative, and the second term by the geodesic
equation.
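
As a small worked instance of this proposition, consider again the arc length $ds^2 = (dx^2 + dy^2)/y^2$ of Example 37.1.7. The following lines are a sketch under the assumption that $X = \partial_x$ is taken as the Killing field; this is justified by Eq. (37.18), since the components $X^k$ are constant and the $g_{ij}$ do not depend on $x$.

```latex
\begin{align*}
X = \partial_x:\quad
  & X^k\partial_k g_{ij} + \partial_i X^k g_{kj} + \partial_j X^k g_{ki}
    = \partial_x g_{ij} = 0
    \quad\text{(so $X$ is a Killing vector field)}, \\
u = \dot{x}\,\partial_x + \dot{y}\,\partial_y:\quad
  & g(u, X) = g_{11}\,\dot{x} = \frac{\dot{x}}{y^2} = \text{const}
    \;\Longrightarrow\; \dot{x} = C y^2 .
\end{align*}
```

This is the first integral that Example 37.1.7 obtains by integrating Eq. (37.7) directly.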


Example 37.2.5 In a flat m-dimensional manifold we can choose an or- 
thonormal coordinate frame (Theorem 37.1.14), so that the Killing equation 
becomes 


0; Xj +0;X; =0. (37.21) 
Setting i = j, we see that 0;X ; =0 (no sum). Differentiating Eq. (37.21) 
with respect to x’, we obtain 
a?X; +9; 9;X; =0 > g?x; =0 Vi, j. 
=0 
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Therefore, X ; is linear in x! ice., Xj= ajux* +b; with aj, and b; arbitrary. 
Inserting this in (37.21), we get aj; + aj; = 0. The Killing vector is then 


X= (a' jx! + b') a; = x) (x/ 9; _ x'd;) + b!d;. 


The first term is clearly the generator of a rotation and the second term that 
of translation. Altogether, there are m(m — 1)/2 rotations and m translations. 
So the total number of independent Killing vectors is m(m + 1)/2. Mani- 
folds that have this many Killing vectors are called maximally symmetric 
spaces. 
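
The result of Example 37.2.5 can be illustrated with a short numerical check. This is a sketch under stated assumptions: the space is flat with Euclidean metric δ_ij (so upper and lower indices coincide), derivatives are approximated by central differences, and the helper names `killing_field`, `killing_residual`, and `count_killing` are hypothetical.

```python
def killing_field(a, b):
    """Return the field X_j(x) = a_jk x^k + b_j for a matrix a and a vector b."""
    m = len(b)
    def X(x):
        return [sum(a[j][k] * x[k] for k in range(m)) + b[j] for j in range(m)]
    return X

def killing_residual(X, x, h=1e-5):
    """Largest |d_i X_j + d_j X_i| at the point x, using central differences."""
    m = len(x)
    def d(i, j):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        return (X(xp)[j] - X(xm)[j]) / (2 * h)
    return max(abs(d(i, j) + d(j, i)) for i in range(m) for j in range(m))

def count_killing(m):
    """m(m-1)/2 rotations plus m translations."""
    return m * (m - 1) // 2 + m

if __name__ == "__main__":
    antisymmetric = [[0.0, 1.5, -0.3], [-1.5, 0.0, 2.0], [0.3, -2.0, 0.0]]
    X = killing_field(antisymmetric, [0.1, -0.2, 0.7])
    print(killing_residual(X, [0.4, -1.1, 2.3]))  # close to zero
    print(count_killing(3), count_killing(4))
```

An antisymmetric matrix gives a residual at roundoff level, while a matrix with a symmetric part does not, in line with the condition $a_{ij} + a_{ji} = 0$.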


Using Eqs. (28.34) and (28.35) one can show that the set of Killing vector 
fields on a manifold form a vector subspace of X(M), and that if X and Y 
are Killing vector fields, then so is [X, Y]. Thus, 


Box 37.2.6 The set of Killing vector fields forms a Lie algebra. 


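Example 37.2.7 From $g = d\theta \otimes d\theta + \sin^2\theta\, d\varphi \otimes d\varphi$, the metric of the unit
sphere $S^2$, one writes the Killing equations

$$\partial_\theta X_\theta + \partial_\theta X_\theta = 0,$$
$$\partial_\varphi X_\varphi + \partial_\varphi X_\varphi + 2\sin\theta\cos\theta\, X_\theta = 0, \qquad (37.22)$$
$$\partial_\theta X_\varphi + \partial_\varphi X_\theta - 2\cot\theta\, X_\varphi = 0.$$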


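The first equation implies that $X_\theta = f(\varphi)$, a function of $\varphi$ only. Substitution
in the second equation yields

$$X_\varphi = -\tfrac{1}{2}F(\varphi)\sin 2\theta + g(\theta), \quad \text{where } f(\varphi) = \frac{dF}{d\varphi}.$$

Inserting this in the third equation of (37.22), we obtain

$$-F(\varphi)\cos 2\theta + \frac{dg}{d\theta} + \frac{df}{d\varphi} + 2\cot\theta\left[\tfrac{1}{2}F(\varphi)\sin 2\theta - g(\theta)\right] = 0,$$

or

$$\left[\frac{dg}{d\theta} - 2\cot\theta\, g(\theta)\right] + \left[\frac{df}{d\varphi} + F(\varphi)\right] = 0.$$

For this equation to hold for all $\theta$ and $\varphi$, we must have

$$\frac{dg}{d\theta} - 2\cot\theta\, g(\theta) = C = -\frac{df}{d\varphi} - F(\varphi),$$

where $C$ is a constant. This gives

$$g(\theta) = (C_1 - C\cot\theta)\sin^2\theta \quad \text{and} \quad f(\varphi) = X_\theta = A\sin\varphi + B\cos\varphi$$

with

$$X_\varphi = (A\cos\varphi - B\sin\varphi)\sin\theta\cos\theta + C_1\sin^2\theta.$$

A general Killing vector field is thus given by

$$X = X^\theta\partial_\theta + X^\varphi\partial_\varphi = AL_y - BL_x + C_1L_z,$$

where

$$L_x = -\cos\varphi\,\partial_\theta + \cot\theta\sin\varphi\,\partial_\varphi,$$
$$L_y = \sin\varphi\,\partial_\theta + \cot\theta\cos\varphi\,\partial_\varphi,$$
$$L_z = \partial_\varphi$$

are the generators of SO(3).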
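
As a cross-check of Example 37.2.7, one can substitute the three generators into the Killing equations (37.22) numerically. The sketch below assumes the unit-sphere metric of the example, lowers indices with $X_\theta = X^\theta$ and $X_\varphi = \sin^2\theta\, X^\varphi$, and approximates derivatives by central differences. The function names and the sample point are illustrative assumptions.

```python
import math

def L_x(theta, phi):
    """Contravariant components (X^theta, X^phi) of L_x."""
    return (-math.cos(phi), math.sin(phi) / math.tan(theta))

def L_y(theta, phi):
    """Contravariant components (X^theta, X^phi) of L_y."""
    return (math.sin(phi), math.cos(phi) / math.tan(theta))

def L_z(theta, phi):
    """Contravariant components (X^theta, X^phi) of L_z."""
    return (0.0, 1.0)

def lower(field):
    """Covariant components (X_theta, X_phi) for the unit-sphere metric."""
    def covariant(theta, phi):
        up_theta, up_phi = field(theta, phi)
        return (up_theta, math.sin(theta) ** 2 * up_phi)
    return covariant

def killing_residuals(field, theta, phi, h=1e-5):
    """Left-hand sides of the three equations in (37.22) at (theta, phi)."""
    X = lower(field)
    def d_theta(c):
        return (X(theta + h, phi)[c] - X(theta - h, phi)[c]) / (2 * h)
    def d_phi(c):
        return (X(theta, phi + h)[c] - X(theta, phi - h)[c]) / (2 * h)
    X_theta, X_phi = X(theta, phi)
    r1 = 2 * d_theta(0)
    r2 = 2 * d_phi(1) + 2 * math.sin(theta) * math.cos(theta) * X_theta
    r3 = d_theta(1) + d_phi(0) - 2 * X_phi / math.tan(theta)
    return (r1, r2, r3)

if __name__ == "__main__":
    for name, field in (("L_x", L_x), ("L_y", L_y), ("L_z", L_z)):
        print(name, killing_residuals(field, 0.9, 0.4))
```

All three residuals should vanish up to finite-difference error for each generator, and by linearity for any combination $AL_y - BL_x + C_1L_z$.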


Sometimes it is useful to relax the complete invariance of the metric ten- 
sor under the diffeomorphism of a manifold induced by a vector field and 
allow a change of scale in the metric. More precisely, we consider vector 
fields $X$ whose flow $F_t$ changes the metric of $M$ only by a scale factor, so that $F_{t*}g = e^{\phi}g$, where
$\phi$ is a real-valued function on $M$ that is also dependent on the parameter $t$.
Such a transformation keeps angles unchanged but rescales all lengths. In
analogy with those of the complex plane with the same property, we call
such transformations conformal transformations. A vector field that
generates a conformal transformation will satisfy

$$X^k\partial_k g_{ij} + \partial_i X^k g_{kj} + \partial_j X^k g_{ki} = -\psi g_{ij}, \qquad \psi = \left.\frac{\partial\phi}{\partial t}\right|_{t=0}, \qquad (37.23)$$


and is called a conformal Killing vector field. 
We now specialize to a flat m-dimensional manifold and choose an or- 
thonormal coordinate frame (Theorem 37.1.14). Then Eq. (37.23) becomes 


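$$\partial_i X_j + \partial_j X_i = -\psi g_{ij}. \qquad (37.24)$$

Multiply both sides by $g^{ij}$ and sum over $i$ and $j$ to obtain

$$2\partial^i X_i = -m\psi \;\Rightarrow\; \partial^i X_i = -\tfrac{1}{2}m\psi, \qquad m = \dim M.$$

Apply $\partial^i$ to both sides of Eq. (37.24) and sum over $i$. This yields

$$\partial^i\partial_i X_j + \partial_j\underbrace{\partial^i X_i}_{-\frac{1}{2}m\psi} = -\partial_j\psi \;\Rightarrow\; \partial^i\partial_i X_j = \tfrac{1}{2}(m - 2)\partial_j\psi. \qquad (37.25)$$

Differentiate both sides of the second equation in (37.25) with respect to $x^k$
and symmetrize the result in $j$ and $k$ to obtain

$$(m - 2)\partial_j\partial_k\psi = \partial^i\partial_i(\partial_k X_j + \partial_j X_k) = -g_{jk}\partial^i\partial_i\psi.$$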




Raising the index $j$ and contracting it with $k$ gives $\partial^i\partial_i\psi = 0$ if $m \neq 1$. It
follows that

$$(m - 2)\partial_j\partial_k\psi = 0. \qquad (37.26)$$

Equations (37.24), (37.25), and (37.26) determine both $\psi$ and $X_i$ if $m \neq 2$.
It follows from Eq. (37.26) that $\psi$ is linear in $x$, and, consequently [from
(37.24)], that $X_i$ is at most quadratic in $x$. The most general solution to
Eq. (37.24) satisfying (37.25) and (37.26) is

$$X^j(x) = b^j + \epsilon x^j + a^j{}_k x^k + c^j x^k x_k - 2c_k x^k x^j, \qquad (37.27)$$

where $a_{ij} = -a_{ji}$ and indices of the constants are raised and lowered as
usual.
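
The quadratic term of (37.27) can be tested against Eq. (37.24) directly. The following sketch assumes flat space with Euclidean metric $\delta_{ij}$, for which a short calculation gives $\partial_i X_j + \partial_j X_i = -4(c\cdot x)\delta_{ij}$ for the field $X^j = c^j x^k x_k - 2c_k x^k x^j$, i.e., $\psi = 4\,c\cdot x$, which is linear in $x$ as required by (37.26). The helper names are hypothetical, and `parameter_count` simply adds up the constants appearing in (37.27).

```python
def special_conformal(c):
    """Field X^j = c^j (x.x) - 2 (c.x) x^j in flat Euclidean space."""
    def X(x):
        xx = sum(v * v for v in x)
        cx = sum(ci * xi for ci, xi in zip(c, x))
        return [c[j] * xx - 2 * cx * x[j] for j in range(len(x))]
    return X

def conformal_residual(X, psi, x, h=1e-5):
    """Largest |d_i X_j + d_j X_i + psi * delta_ij| at x (central differences)."""
    m = len(x)
    def d(i, j):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        return (X(xp)[j] - X(xm)[j]) / (2 * h)
    return max(
        abs(d(i, j) + d(j, i) + (psi(x) if i == j else 0.0))
        for i in range(m) for j in range(m)
    )

def parameter_count(m):
    """Translations b^j, rotations a_ij, one dilatation, and the vector c^j."""
    return m + m * (m - 1) // 2 + 1 + m

if __name__ == "__main__":
    c = [0.3, -0.5, 0.2]
    psi = lambda x: 4 * sum(ci * xi for ci, xi in zip(c, x))
    print(conformal_residual(special_conformal(c), psi, [1.0, 0.7, -0.4]))
    print(parameter_count(3), (3 + 1) * (3 + 2) // 2)
```

The count returned by `parameter_count(m)` agrees with $(m + 1)(m + 2)/2$ for every $m$.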

Equation (37.27) gives the generators of an $[(m + 1)(m + 2)/2]$-parameter
group of transformations on $\mathbb{R}^m$, $m \neq 2$, called the conformal
group. The reader should note that translations (represented by the
parameters $b^j$) and rotations (represented by the parameters $a_{ij}$) are included in this
group. The other finite (as opposed to infinitesimal) transformations of
coordinates can be obtained by using Eq. (29.30). For example, the finite
transformation generated by the parameter $\epsilon$ is given by the solution to the DE
$dx'^j/dt = x'^j$, which is $x'^j = e^t x^j$, or $x'^j = e^{\epsilon}x^j$, and is called a dilatation
of coordinates. Similarly, the finite transformation generated by the
parameter $c^j$ is given by the solution to the DE $dx'^i/dc_j = \delta^{ij}x'^k x'_k - 2x'^i x'^j$,
or

$$x'^i = \frac{x^i + c^i x^2}{1 + 2c\cdot x + c^2x^2}, \quad \text{where } c\cdot x = c^k x_k,\ c^2 = c\cdot c,\ x^2 = x\cdot x,$$

which is called inversion, or the special conformal transformation.
Equations (37.25) and (37.26) place no restriction on $\psi$ (and therefore on $X_i$)
when $m = 2$. This means that


Box 37.2.8 The conformal group is infinite-dimensional for $\mathbb{R}^2$.


In fact, we encountered the conformal transformations of $\mathbb{R}^2$ in the
context of complex analysis, where we showed that any (therefore, infinitely
many) analytic function is a conformal transformation of $\mathbb{C} = \mathbb{R}^2$. The
conformal group of $\mathbb{R}^2$ has important applications in string theory and statistical
mechanics, but we shall not pursue them here.


37.3 Geodesic Deviation and Curvature 


Geodesics are the straight lines of general manifolds on which, for example,
free particles move. If $u$ represents the tangent to a given geodesic, one can
say that $\nabla_u u = 0$ is the equation of motion of a free particle. In flat spaces,
the relative velocity of any pair of free particles will not change, so that their
relative acceleration is always zero. In general, however, due to the effects
of curvature, we expect a nonzero acceleration. Let us elaborate on this.

Fig. 37.1 A region of the manifold and the two-dimensional surface defined by s and t

Consider some region of the manifold through whose points geodesics
can be drawn in various directions. Concentrate on one geodesic and its
neighboring geodesics. Let $t$ designate the parameter that locates points of
the geodesic. Let $s$ be a continuous parameter that labels different geodesics
(see Fig. 37.1). One can connect the points on some neighboring geodesics
corresponding to the same value of $t$ and obtain a curve parametrized by $s$.
The collection of all geodesics that pass through all points of this curve
forms a two-dimensional submanifold with coordinates $t$ and $s$. Each such
geodesic is thus described by the geodesic equation $\nabla_u u = 0$ with $u = \partial/\partial t$
(because $t$ is a coordinate). Furthermore, as we hop from one geodesic to its
neighbor, the geodesic equation does not change; i.e., the geodesic equation
is independent of $s$. Translated into the language of calculus, this means
that differentiation of the geodesic equation with respect to $s$ will give zero.
Translated into the higher-level language of tensor analysis, it means that
covariant differentiation of the geodesic equation will yield zero. We write
this final translation as $\nabla_n(\nabla_u u) = 0$ where $n = \partial/\partial s$. This can also be
written as

$$0 = \nabla_n\nabla_u u = \nabla_u\nabla_n u + [\nabla_n, \nabla_u]u = \nabla_u(\nabla_u n + [n, u]) + [\nabla_n, \nabla_u]u,$$

where we have used the first property of Proposition 36.1.6. Using the fact
that $n$ and $u$ are coordinate frames, we conclude that $[n, u] = 0$, which in
conjunction with the second equation of Theorem 36.2.16, yields

$$\nabla_u\nabla_u n + R(n, u)u = 0. \qquad (37.28)$$


The first term can be interpreted as the relative acceleration of two geodesic
curves (or free particles), because $\nabla_u$ is the generalization of the derivative
with respect to $t$, and $\nabla_u n$ is interpreted as relative velocity. In a flat
manifold, the relative acceleration for any pair of free particles is zero. When
curvature is present, it produces a nonzero relative acceleration.




By writing $u = u^i\partial_i$ and $n = n^k\partial_k$ and substituting in Eq. (37.28), we
arrive at the equation of geodesic deviation in coordinate form:

$$u^i u^j n^k R^m{}_{ijk} = \frac{d^2n^m}{dt^2} + \frac{d}{dt}\bigl(u^j n^k\Gamma^m_{jk}\bigr) + u^i\Gamma^m_{ij}\frac{dn^j}{dt} + u^i u^j n^k\Gamma^l_{jk}\Gamma^m_{il}, \qquad (37.29)$$

where we have used the fact that $u^i\partial_i f = df/dt$ for any function $f$ defined on
the manifold.

The chain that connects relative acceleration to curvature has another link 
that connects the latter two to gravity. From a Newtonian standpoint, grav- 
ity is the only force that accelerates all objects at the same rate (equivalence 
principle). From a geometric standpoint, this property allows one to include 
gravity in the structure of space-time: An object in free fall is considered 
“free”, and its path, a geodesic. Locally, this is in fact a better picture of re- 
ality, because inside a laboratory in free fall (such as a space shuttle in orbit 
around earth) one actually verifies the first law of motion on all objects float- 
ing in midair. One need not include an external phenomenon called gravity. 
Gravity becomes part of the fabric of space-time. 

But how does gravity manifest itself? Is there any observable effect that 
can indicate the presence of gravity, or by a mere transfer to a freely falling 
frame have we been able to completely eliminate gravity? The second alter- 
native would be strange indeed, because the source of gravity is matter, and 
if we eliminate gravity completely, we have to eliminate matter as well! If 
the gravitational field were homogeneous, one could eliminate it—and the 
matter that produces it as well, but no such gravitational field exists. The 
inhomogeneity of gravitational fields has indeed an observable effect. Con- 
sider two test particles that are falling freely toward the source of gravity 
on two different radii. As they get closer and closer to the center, their rel- 
ative distance—in fact, their relative velocity—changes: They experience a 
relative acceleration. Since as we saw in Eq. (37.28), relative acceleration is 
related to curvature, we conclude that 


Box 37.3.1 Gravity manifests itself by giving space-time a curvature. 


This is Einstein’s interpretation of gravity. From a Newtonian standpoint, 
the relative acceleration is caused by the inhomogeneity of the gravitational 
field. Such inhomogeneity (in the field of the Moon and the Sun) is respon- 
sible for tidal waves. That is why the curvature term in Eq. (37.28) is also 
called the tide-producing gravitational force. 


37.3.1 Newtonian Gravity 


The equivalence principle, relating gravity with the curvature of space-time,
is not unique to Einstein. What is unique to him is combining that principle
with the assumption of local Lorentz geometry, i.e., local validity of special
relativity. Cartan also used the equivalence principle to reformulate
Newtonian gravity in the language of geometry. Rewrite Newton's second law of
motion as

$$\mathbf{F} = m\mathbf{a} = -m\nabla\Phi \;\Rightarrow\; \frac{\partial^2x^j}{\partial t^2} + \frac{\partial\Phi}{\partial x^j} = 0,$$

where $\Phi$ is the gravitational potential (potential energy per unit mass). The
Newtonian universal time is a parameter that has two degrees of freedom:
Its origin and its unit of measurement are arbitrary. Thus, one can change $t$
to $\tau = at + b$ without changing the physics of gravity. Taking this freedom
into account, one can write

$$\frac{d^2t}{d\tau^2} = 0, \qquad \frac{d^2x^j}{d\tau^2} + \frac{\partial\Phi}{\partial x^j}\left(\frac{dt}{d\tau}\right)^2 = 0. \qquad (37.30)$$

Comparing this with the geodesic equation, we can read off the nonzero
connection coefficients:

$$\Gamma^j_{00} = \frac{\partial\Phi}{\partial x^j}, \qquad j = 1, 2, 3. \qquad (37.31)$$

Inserting these in the second formula of Eq. (36.23), we find the nonzero
components of Riemann curvature tensor:

$$R^j{}_{0k0} = -R^j{}_{00k} = \frac{\partial^2\Phi}{\partial x^j\partial x^k}. \qquad (37.32)$$

Contraction of the two nonzero indices leads to the Laplacian of
gravitational potential

$$R^j{}_{0j0} = \frac{\partial^2\Phi}{\partial x^2} + \frac{\partial^2\Phi}{\partial y^2} + \frac{\partial^2\Phi}{\partial z^2} = \nabla^2\Phi.$$

Therefore, the Poisson equation for gravitational potential can be written in
terms of the curvature tensor:

$$R_{00} \equiv R^j{}_{0j0} = 4\pi G\rho, \qquad (37.33)$$

where we have introduced the Ricci tensor, defined as

$$R_{ik} \equiv R^j{}_{ijk}. \qquad (37.34)$$

Equations (37.31), (37.32), and (37.33) plus the law of geodesic motion
describe the full content of Newtonian gravitational theory in the geometric
language of tensors.³


³The classic and comprehensive book Gravitation, by Misner, Thorne, and Wheeler, has
a thorough discussion of Newtonian gravity in the language of geometry in Chap. 13 and
is highly recommended.




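It is instructive to discover the relation between curvature and gravity
directly from the equation of geodesic deviation as applied to Newtonian
gravity. The geodesic equation is the equation of motion:

$$\frac{\partial^2x^j}{\partial t^2} + \frac{\partial\Phi}{\partial x^j} = 0.$$

Differentiate this equation with respect to the parameter $s$, noting that
$\partial/\partial s = n^i\partial/\partial x^i$:

$$\frac{\partial}{\partial s}\left(\frac{\partial^2x^j}{\partial t^2}\right) + \frac{\partial}{\partial s}\left(\frac{\partial\Phi}{\partial x^j}\right) = 0 \;\Rightarrow\; \frac{\partial^2}{\partial t^2}\left(\frac{\partial x^j}{\partial s}\right) + n^i\frac{\partial^2\Phi}{\partial x^i\partial x^j} = 0.$$

Now note that $\partial x^j/\partial s = n^i\partial x^j/\partial x^i = n^j$. So, we obtain

$$\frac{\partial^2n^j}{\partial t^2} + n^i\frac{\partial^2\Phi}{\partial x^i\partial x^j} = 0.$$

This is equivalent to Eq. (37.28), and one recognizes the second term as the
tide-producing (or the curvature) term.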
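
The tide-producing term can be made concrete for a point mass. The following is a minimal sketch, assuming the potential $\Phi = -GM/r$ in units with $GM = 1$ (an illustrative choice, not from the text). It compares the actual difference in the Newtonian accelerations of two nearby particles with the prediction $-n^i\,\partial^2\Phi/\partial x^i\partial x^j$ of the deviation equation. The function names are hypothetical.

```python
import math

GM = 1.0  # assumed units in which G*M = 1

def acceleration(x):
    """Newtonian acceleration -grad(Phi) for Phi = -GM/r."""
    r = math.sqrt(sum(v * v for v in x))
    return [-GM * v / r ** 3 for v in x]

def hessian_phi(x):
    """Second derivatives of Phi: GM * (delta_ij / r^3 - 3 x_i x_j / r^5)."""
    r = math.sqrt(sum(v * v for v in x))
    return [
        [GM * ((1.0 if i == j else 0.0) / r ** 3 - 3 * x[i] * x[j] / r ** 5)
         for j in range(3)]
        for i in range(3)
    ]

def tidal_acceleration(x, n):
    """Relative acceleration predicted by the deviation equation: -n^i d_i d_j Phi."""
    H = hessian_phi(x)
    return [-sum(n[i] * H[i][j] for i in range(3)) for j in range(3)]

def relative_acceleration(x, n):
    """Actual difference of accelerations of particles at x + n and x."""
    a0 = acceleration(x)
    a1 = acceleration([xi + ni for xi, ni in zip(x, n)])
    return [p - q for p, q in zip(a1, a0)]

if __name__ == "__main__":
    x = [2.0, 0.0, 0.0]
    for n in ([1e-4, 0.0, 0.0], [0.0, 1e-4, 0.0]):
        print(tidal_acceleration(x, n), relative_acceleration(x, n))
```

A radial separation is stretched and a transverse separation is squeezed, which is the familiar tidal pattern. The trace of `hessian_phi` vanishes away from the source, in agreement with $R_{00} = \nabla^2\Phi = 4\pi G\rho = 0$ in vacuum.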


37.4 General Theory of Relativity 


No treatment of (semi)Riemannian geometry is complete without a discus- 
sion of the general theory of relativity. That is why we shall devote this last 
section of the current chapter to a brief exposition of this theory. 

We have seen that Newtonian gravity can be translated into the language 
of differential geometry by identifying the gravitational tidal effects with 
the curvature of space-time. This straightforward interpretation of New- 
tonian gravity, in particular the retention of the Euclidean metric and the 
universality of time, leads to no new physical effect. Furthermore, it is in- 
consistent with the special theory of relativity, which mixes space and time 
coordinates via Lorentz transformations. Einstein’s general theory of rela- 
tivity (GTR) combines the equivalence principle (that freely falling objects 
move on geodesics) with the local validity of the special theory of relativity 
(that the metric of space-time reduces to the Lorentz-Minkowski metric of 
the special theory of relativity). 


37.4.1 Einstein’s Equation 


An important tensor that can be constructed out of Riemann curvature tensor 
and the metric tensor is the Einstein tensor G, which is related to the Ricci 
tensor defined in Eq. (37.34). To derive the Einstein tensor, first note that the 
Ricci tensor is symmetric in its indices (see Problem 37.21): 


Rij = Rji. (37.35) 


Next, define the curvature scalar as 
$$R \equiv g^{ij}R_{ij}. \qquad (37.36)$$

Now, contract $i$ with $m$ in Eq. (36.46) and use the antisymmetry of the
Riemann tensor in its last two indices to obtain

$$R^i{}_{jkl;i} + R_{jk;l} - R_{jl;k} = 0.$$

Finally, contract $j$ and $l$ and use the antisymmetry of the Riemann tensor
in its first as well as its last two indices to get $2R^i{}_{k;i} - R_{;k} = 0$, or
$R_{jk}{}^{;j} - \tfrac{1}{2}g_{jk}R^{;j} = 0$. Summarizing the foregoing discussion, we write

$$\nabla\cdot\mathbf{G} = 0, \quad \text{where } G_{ij} \equiv R_{ij} - \tfrac{1}{2}g_{ij}R. \qquad (37.37)$$


Historical Notes 

Karl Schwarzschild (1873-1916) was the eldest of five sons and one daughter born to 
Moses Martin Schwarzschild and his wife, Henrietta Sabel. His father was a prosperous 
member of the business community in Frankfurt, with Jewish forbears in that city traced 
back to the sixteenth century. 

From his mother, a vivacious, warm person, Karl undoubtedly inherited his happy, out- 
going personality, and from his father, a capacity for sustained hard work. His childhood 
was spent in comfortable circumstances among a large circle of relatives, whose interests 
included art and music; he was the first to become a scientist. 

After attending a Jewish primary school, Schwarzschild entered the municipal gymna- 
sium in Frankfurt at the age of eleven. His curiosity about the heavens was first mani- 
fested then: He saved his allowance and bought lenses to make a telescope. Indulging 
this interest, his father introduced him to a friend, J. Epstein, a mathematician who had 
a private observatory. With Epstein’s son (later professor of mathematics at the Univer- 
sity of Strasbourg), Schwarzschild learned to use a telescope and studied mathematics of 
a more advanced type than he was getting in school. His precocious mastery of celes- 
tial mechanics resulted in two papers on double star orbits, written when he was barely 
sixteen. 

In 1891 Schwarzschild began two years of study at the University of Strasbourg, where 
Ernst Becker, director of the observatory, guided the development of his skills in practical 
astronomy—skills that later were to form a solid underpinning for his masterful mathe- 
matical abilities. 

At age twenty Schwarzschild went to the University of Munich. Three years later, in 
1896, he obtained his Ph.D., summa cum laude. His dissertation was an application of 
Poincaré’s theory of stable configurations in rotating bodies to several astronomical prob- 
lems, including tidal deformation in satellites and the validity of Laplace’s suggestion as 
to how the solar system had originated. Before graduating, Schwarzschild also found time 
to do some practical work with Michelson’s interferometer. 

At a meeting of the German Astronomical Society in Heidelberg in 1900 he discussed the 
possibility that space was non-Euclidean. In the same year he published a paper giving a 
lower limit for the radius of curvature of space as 2500 light years. From 1901 until 1909 
he was professor at Göttingen, where he collaborated with Klein, Hilbert, and Minkowski,
publishing on electrodynamics and geometrical optics. In 1906, he studied the transport 
of energy through a star by radiation. 

From Göttingen he went to Potsdam, but in 1914 he volunteered for military service.
He served in Belgium, France, and Russia. While in Russia he wrote two papers on Ein- 
stein’s relativity theory and one on Planck’s quantum theory. The quantum theory paper 
explained that the Stark effect could be proved from the postulates of quantum theory. 
Schwarzschild’s relativity papers give the first exact solution of Einstein’s general grav- 
itational equations, giving an understanding of the geometry of space near a point mass. 
He also made the first study of black holes, showing that bodies of sufficiently large mass 
would have an escape velocity exceeding the speed of light and so could not be seen. 
However, he contracted an illness while in Russia and died soon after returning home. 




The automatic vanishing of the divergence of the symmetric Einstein
tensor has an important consequence in the field equation of GTR. It is
reminiscent of a similar situation in electromagnetism, in which the vanishing of
the divergence of the fields leads to the conservation of the electric charge,
the source of electromagnetic fields.⁴

Just as Maxwell's equations are a generalization of the static electricity
of Coulomb to a dynamical theory, Einstein's GTR is the generalization
of Newtonian static gravity to a dynamical theory. As this generalization
ought to agree with the successes of the Newtonian gravity, Eq. (37.37) must
agree with (37.33). The bold step taken by Einstein was to generalize this
relation involving only a single component of the Ricci tensor to a full tensor
equation. The natural tensor to be used as the source of gravitation is the
stress energy tensor⁵

$$T^{\mu\nu} = (\rho + p)u^\mu u^\nu + pg^{\mu\nu}, \quad \text{or} \quad \mathbf{T} = (\rho + p)\mathbf{u}\otimes\mathbf{u} + p\mathbf{g},$$

where the source is treated as a fluid with density $\rho$, four-velocity $\mathbf{u}$, and
pressure $p$. So, Einstein suggested the equation $\mathbf{G} = \kappa\mathbf{T}$ as the
generalization of Newton's universal law of gravitation. Note that $\nabla\cdot\mathbf{G} = 0$
automatically guarantees mass-energy conservation as in Maxwell's theory of
electromagnetism. Problem 37.23 calculates $\kappa$ to be $8\pi$ in units in which
the universal gravitational constant and the speed of light are set equal to
unity. We therefore have

$$\mathbf{G} = 8\pi\mathbf{T}, \quad \text{or} \quad \mathbf{R} - \tfrac{1}{2}R\mathbf{g} = 8\pi[(\rho + p)\mathbf{u}\otimes\mathbf{u} + p\mathbf{g}]. \qquad (37.38)$$

This is Einstein's equation of the general theory of relativity.

The Einstein tensor $\mathbf{G}$ is nearly the only symmetric second-rank tensor
made out of the Riemann and metric tensors that is divergence free. The
only other tensor with the same properties is $\mathbf{G} + \Lambda\mathbf{g}$, where $\Lambda$ is the
so-called cosmological constant (see Problem 37.24). When in 1917, Einstein
applied his GTR to the universe itself, he found that the universe ought to
be expanding. Being a firm believer in Nature, he changed his equation to
$\mathbf{G} + \Lambda\mathbf{g} = 8\pi\mathbf{T}$ to suppress the unobserved prediction of the expansion of
the universe. Later, when the expansion was observed by Hubble, Einstein
referred to this mutilation of his GTR as “the biggest blunder of my life”.

With the discovery of an accelerating universe, there has been a revival
of interest in the cosmological constant. However, the fact that it is so small,
and a lack of fundamental explanation of this fact, has become a challenge in
cosmology. Perhaps a unification of GTR with quantum theory (a quantum theory
of gravity) is a solution. But this unification has resisted an indisputable
solution (of the type that Dirac found in unifying STR with quantum
theory) despite the effort of some of the most brilliant minds in contemporary
physics.


⁴It was Maxwell's discovery of the inconsistency of the pre-Maxwellian equations of
electromagnetism with charge conservation that prompted him to change not only the
fourth equation (to make the entire set of equations consistent with the charge
conservation), but also the course of human history.


⁵In GTR, it is customary to use the convention that Greek indices run from 0 to 3, i.e., they
include both space and time, while Latin indices encompass only the space components.


Historical Notes 

Aleksandr Aleksandrovich Friedmann (1888-1925) was born into a musical family— 
his father, Aleksandr Friedmann, being a composer and his mother, Ludmila Vojacka, the 
daughter of the Czech composer Hynek Vojáček.

In 1906 Friedmann graduated from the gymnasium with the gold medal and immediately 
enrolled in the mathematics section of the department of physics and mathematics of St. 
Petersburg University. While still a student, he wrote a number of unpublished scientific 
papers, one of which was awarded a gold medal by the department. After graduation 
from the university in 1910, Friedmann was retained in the department to prepare for the 
teaching profession. 

In the fall of 1914, Friedmann volunteered for service in an aviation detachment, in which 
he worked, first on the northern front and later on other fronts, to organize aerologic and 
aeronavigational services. While at the front, Friedmann often participated in military 
flights as an aircraft observer. In the summer of 1917 he was appointed a section chief 
in Russia’s first factory for the manufacture of measuring instruments used in aviation; 
he later became director of the factory. Friedmann had to relinquish this post because of 
the onset of heart disease. From 1918 until 1920, he was professor in the department of 
theoretical mechanics of Perm University. 

In 1920 he returned to Petrograd and worked at the main physics observatory of the 
Academy of Sciences, first as head of the mathematics department and later, shortly be- 
fore his death, as director of the observatory. Friedmann’s scientific activity was con- 
centrated in the areas of theoretical meteorology and hydromechanics, where he demon- 
strated his mathematical talent and his unwavering strife for, and ability to attain, the 
concrete, practical application of solutions to theoretical problems. 

Friedmann made a valuable contribution to Einstein’s general theory of relativity. As 
always, his interest was not limited simply to familiarizing himself with this new field 
of science but led to his own remarkable investigations. Friedmann’s work on the theory 
of relativity dealt with the cosmological problem. In his paper “Über die Krümmung des
Raumes” (1922), he outlined the fundamental ideas of his cosmology: the supposition 
concerning the homogeneity of the distribution of matter in space and the consequent 
homogeneity and isotropy of space-time. This theory is especially important because it 
leads to a sufficiently correct explanation of the fundamental phenomenon known as the 
“red shift”. Einstein himself thought that the cosmological solution to the equations of 
a field had to be static and had to lead to a closed model of the universe. Friedmann 
discarded both conditions and arrived at an independent solution. 

Friedmann’s interest in the theory of relativity was by no means a passing fancy. In the 
last years of his life, together with V.K. Frederiks, he began work on a multivolume text 
on modern physics. The first book, The World as Space and Time, is devoted to the theory 
of relativity, knowledge of which Friedmann considered one of the cornerstones of an 
education in physics. 

In addition to his scientific work, Friedmann taught courses in higher mathematics and 
theoretical mechanics at various colleges in Petrograd. He found time to create new and 
original courses, brilliant in their form and exceedingly varied in their content. Fried- 
mann’s unique course in theoretical mechanics combined mathematical precision and 
logical continuity with original procedural and physical trends. 

Friedmann died of typhoid fever at the age of thirty-seven. In 1931, he was posthumously 
awarded the Lenin Prize for his outstanding scientific work. 


37.4.2 Static Spherically Symmetric Solutions 


The general theory of relativity as given in Eq. (37.38) has been strikingly 
successful in predicting the spacetime⁶ structure of our universe. It predicts
the expansion of the universe, and by time-reversed extrapolation, the big 
bang cosmology; it predicts the existence of black holes and other final prod- 
ucts of stellar collapse; and on a less grandiose scale, it explains the small 
precession of Mercury, the bending of light in the gravitational field of the 
Sun, and the gravitational redshift. We shall not discuss the solution of Ein- 
stein’s equation in any detail. However, due to its simplicity and its use of 
geometric arguments, we shall consider the solution to Einstein’s equation 
exterior to a static spherically symmetric distribution of mass. 

Let us first translate the two adjectives used in the last sentence into a 
geometric language. Take static first. We call a phenomenon “static” if at 
different instants it “looks the same”. Thus, a static solution of Einstein’s
equation is a spacetime manifold that “looks the same” for all time. In the 
language of geometry “looks the same” means isometric, because metric is 
the essence of the geometry of space-time. In Euclidean physics, time can 
be thought of as an axis at each point (moment) of which one can assign a 
three-dimensional space corresponding to the “spatial universe” at that mo- 
ment. In the general theory of relativity, space and time can be mixed, but the 
character of time as a parameter remains unaltered. Therefore, instead of an 
axis—a straight line—we pick a curve, a parametric map from the real line 
to the manifold of space-time. This curve must be timelike, so that locally, 
when curvature is ignored and special relativity becomes a good approxima- 
tion, we do not violate causality. The curve must also have the property that 
at each point of it, the space-time manifold has the same metric. Moreover, 
we need to demand that at each point of this curve, the spatial part of the 
space-time is orthogonal to the curve. 


Definition 37.4.1 A spacetime is stationary if there exists a one-parameter
group of isometries $F_t$, called time translation isometries, whose Killing
vector fields $\xi$ are timelike for all $t$: $g(\xi, \xi) > 0$. If in addition, there
exists a spacelike hypersurface $\Sigma$ that is orthogonal to orbits (curves) of the
isometries, we say that the spacetime is static.


We can simplify the solution to Einstein's equation by invoking the
symmetry of spacetime discussed above in our choice of coordinates. Let $P$ be
a point of the spacetime manifold located in a neighborhood of some
spacelike hypersurface $\Sigma$ as shown in Fig. 37.2. Through $P$ passes a single orbit
of the isometry, which starts at a point $Q$ of $\Sigma$. Let $t$, the so-called Killing
parameter, stand for the parameter corresponding to the point $P$ with $t = 0$
being the parameter of $Q$. On the spacelike hypersurface $\Sigma$, choose arbitrary
coordinates $\{x^i\}$ for $Q$. Assign the coordinates $(t \equiv x^0, x^1, x^2, x^3)$ to $P$.
Since $F_t$ does not change the metric, $\Sigma_t \equiv F_t\Sigma$, the translation of $\Sigma$ by $F_t$,
is also orthogonal to the orbit of the isometry. Moreover, the components
of the metric in this coordinate system cannot be dependent on the Killing
parameter $t$. Thus, in this coordinate system, the spacetime metric takes the
form

$$g = g_{00}\,dt \otimes dt - \sum_{i,j=1}^{3} g_{ij}\,dx^i \otimes dx^j. \qquad (37.39)$$

Fig. 37.2 The coordinates appropriate for a stationary spacetime


⁶The reader may be surprised to see the two words “space” and “time” juxtaposed with
no hyphen; but this is common practice in relativity.


Definition 37.4.2 A spacetime is spherically symmetric if its isometry 
group contains a subgroup isomorphic to SO(3) and the orbits of this group 
are two-dimensional spheres. 


In other words, if we think of isometries as the action of some abstract 
group, then this group must contain SO(3) as a subgroup. Since SO(3) is 
isomorphic to the group of rotations, we conclude that the metric should 
be rotationally invariant. The time-translation Killing vector field $\xi$ must
be orthogonal to the orbits of SO(3), because otherwise the generators of
SO(3) can change the projection of $\xi$ on the spheres and destroy the
rotational invariance. Therefore, the 2-dimensional spheres must lie entirely
in the hypersurfaces $\Sigma_t$. Now, we can write down a static spherically
symmetric metric in terms of appropriate coordinates as follows. Choose the
spherical coordinates $(\theta, \varphi)$ for the 2-spheres, and write the metric of this
sphere as

$$g_2 = r^2\,d\theta \otimes d\theta + r^2\sin^2\theta\,d\varphi \otimes d\varphi,$$

where $r$ is the “radius” of the 2-sphere. Choose the third spatial coordinate
to be orthogonal to this sphere, i.e., $r$. Rotational symmetry now implies
that the components of the metric must be independent of $\theta$ and $\varphi$. The final
form of the metric based entirely on the assumed symmetries is

$$g = f(r)\,dt \otimes dt - h(r)\,dr \otimes dr - r^2(d\theta \otimes d\theta + \sin^2\theta\,d\varphi \otimes d\varphi). \qquad (37.40)$$


We have reduced the problem of finding ten unknown functions $\{g_{\mu\nu}\}_{\mu,\nu=0}^{3}$
of four variables $x^\mu$ to that of two functions $f$ and $h$ of one
variable. The remaining task is to calculate the Ricci tensor corresponding to
Eq. (37.40), substitute it in Einstein's equation (with the RHS equal to zero),
and solve the resulting differential equation for $f$ and $h$. We shall not pursue
this further here, and we refer the reader to textbooks on general relativity
(see, for example, [Wald 84, pp. 121-124]). The final result is the so-called
Schwarzschild solution, which is

$$g = \left(1 - \frac{2M}{r}\right)dt \otimes dt - \left(1 - \frac{2M}{r}\right)^{-1}dr \otimes dr - r^2(d\theta \otimes d\theta + \sin^2\theta\,d\varphi \otimes d\varphi), \qquad (37.41)$$

where M is the total mass of the gravitating body, and the natural units of 
GTR, in which G = 1 =c, have been used. 

A remarkable feature of the Schwarzschild solution is that the metric
components have singularities at $r = 2M$ and at $r = 0$. It turns out that the
first singularity is due to the choice of coordinates (analogous to the
singularity at $r = 0$, $\theta = 0, \pi$, and $\varphi = 0$ in spherical coordinates of $\mathbb{R}^3$), while
the second is a true singularity of spacetime. The first singularity occurs at
the so-called Schwarzschild radius whose numerical value is given by

$$r_S = \frac{2GM}{c^2} \approx 3\,\frac{M}{M_\odot}\ \text{km},$$

where $M_\odot = 2 \times 10^{30}$ kg is the mass of the Sun. Therefore, for an ordinary
body such as the Earth, planets, and a typical star, the Schwarzschild radius
is well inside the body where the Schwarzschild solution is not applicable.
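
The numerical value quoted above is easy to reproduce. The sketch below evaluates $r_S = 2GM/c^2$ in SI units; the rounded constants and the Earth's mass are standard values supplied here as assumptions, and the function name is hypothetical.

```python
G = 6.674e-11      # gravitational constant in m^3 kg^-1 s^-2 (rounded)
C = 2.998e8        # speed of light in m/s (rounded)
M_SUN = 1.989e30   # mass of the Sun in kg (rounded)
M_EARTH = 5.972e24 # mass of the Earth in kg (rounded)

def schwarzschild_radius_m(mass_kg):
    """Schwarzschild radius r_S = 2GM/c^2 in meters."""
    return 2 * G * mass_kg / C ** 2

if __name__ == "__main__":
    print(schwarzschild_radius_m(M_SUN) / 1e3, "km for the Sun")
    print(schwarzschild_radius_m(M_EARTH) * 1e3, "mm for the Earth")
```

For the Sun this gives roughly 3 km, against a solar radius of about 7 × 10⁵ km, and for the Earth a little under a centimeter, so in both cases the radius lies deep inside the body.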

If we relax the assumption of staticity, we get the following theorem (for
a proof, see [Misn 73, p. 843]):


Theorem 37.4.3 (Birkhoff’s theorem) The Schwarzschild solution is 
the only spherically symmetric solution of Einstein’s equation in vac- 
uum. 


A corollary to this theorem is that all spherically symmetric solutions of 
Einstein’s equation in vacuum are static. This is analogous to the fact that the 
Coulomb solution is the only spherically symmetric solution to Maxwell’s 
equations in vacuum. It can be interpreted as the statement that in gravity, 
as in electromagnetism, there is no monopole “radiation”. 


37.4.3 Schwarzschild Geodesics 


With the metric available to us, we can, in principle, solve the geodesic 
equations [Eq. (36.44)] and obtain the trajectories of freely falling particles. 
However, a more elegant way is to make further use of the symmetries to
eliminate variables. In particular, Proposition 37.2.4 is extremely useful in
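this endeavor. Consider first $g(\partial_\theta, u)$, where u is the 4-velocity (tangent to
the geodesic). In the coordinates we are using, this yields


$$ g(\partial_\theta, u) = g(\partial_\theta, \dot{x}^\mu\partial_\mu) = \dot{x}^\mu \underbrace{g(\partial_\theta, \partial_\mu)}_{g_{\theta\mu}} = r^2\dot\theta. $$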


This quantity (because of Proposition 37.2.4 and the fact that $\partial_\theta$ is a Killing
vector field) is a constant of the motion, and its initial value will be an attribute
of the particle during its entire motion. We assign zero to this constant,
i.e., we assume that initially $\dot\theta = 0$. This is possible, because by rotating
our spacetime (an allowed operation due to rotational symmetry) we
take the equatorial plane $\theta = \pi/2$ to be the initial plane of the motion. Then
the motion will be confined to this plane, because $\dot\theta = 0$ for all time.

For the parameter of the geodesic equation, choose proper time $\tau$ if the
geodesic is timelike (massive particles), and any (affine) parameter if the
geodesic is null (massless particles such as photons). Then $g_{\mu\nu}\dot{x}^\mu\dot{x}^\nu = \kappa$,
where

$$ \kappa = \begin{cases} 1 & \text{for timelike geodesics,} \\ 0 & \text{for null geodesics.} \end{cases} \tag{37.42} $$


In terms of our chosen coordinates (with $\theta = \pi/2$), we have
$$ \kappa = g_{\mu\nu}\dot{x}^\mu\dot{x}^\nu = (1 - 2M/r)\,\dot{t}^2 - (1 - 2M/r)^{-1}\dot{r}^2 - r^2\dot\varphi^2. \tag{37.43} $$


Next, we apply Proposition 37.2.4 to the time translation Killing vector 
and write 


$$ E = g_{\mu\nu}\dot{x}^\mu\xi^\nu = (1 - 2M/r)\,\dot{t}, \tag{37.44} $$


where E is a constant of the motion and $\xi = \partial_t$. In the case of massive
particles, as $r \to \infty$, i.e., as we approach special relativity, E becomes 1,
which is the rest energy of a particle of unit mass.⁷ Therefore, it is natural to
interpret E for finite r as the total energy (including gravitational potential 
energy) per unit mass of a particle on a geodesic. 

Finally, the other rotational Killing vector field $\partial_\varphi$ gives another constant
of motion,


$$ L = g(\partial_\varphi, u) = r^2\dot\varphi, \tag{37.45} $$


which can be interpreted as the angular momentum of the particle. This 
reduces to Kepler’s second law: Equal areas are swept out in equal times, in 
the limit of Newtonian (or weak) gravity. However, in strong gravitational 
fields, spacetime is not Euclidean, and Eq. (37.45) cannot be interpreted as 
“areas swept out”. Nevertheless, it is interesting that the “form” of Kepler’s 
second law does not change even in strong fields. 


⁷ Recall that the 4-momentum of special relativity is $p^\mu = m\dot{x}^\mu$.

Solving for $\dot{t}$ and $\dot\varphi$ from (37.44) and (37.45) and inserting the result in
(37.43), we obtain


$$ \frac12\dot{r}^2 + \frac12\left(1 - \frac{2M}{r}\right)\left(\frac{L^2}{r^2} + \kappa\right) = \frac12 E^2. \tag{37.46} $$


It follows from this equation that the radial motion of a particle on a geodesic
is the same as that of a unit mass particle of energy $E^2/2$ in ordinary
one-dimensional nonrelativistic mechanics moving in an effective potential


$$ V(r) = \frac12\left(1 - \frac{2M}{r}\right)\left(\frac{L^2}{r^2} + \kappa\right) = \frac12\kappa - \kappa\frac{M}{r} + \frac{L^2}{2r^2} - \frac{ML^2}{r^3}. \tag{37.47} $$


Once we solve Eq. (37.46) for the radial motion in this effective potential, 
we can find the angular motion and the time coordinate change from (37.44) 
and (37.45). The new feature provided by GTR is that in the radial equation 
of motion, in addition to the “Newtonian term” $-\kappa M/r$ and the “centrifugal
barrier” $L^2/2r^2$, we have the new term $-ML^2/r^3$, which, although a small
correction for large r, will dominate over the centrifugal barrier term for
small r. 
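
As a numerical illustration of this remark, the sketch below (in units G = 1 = c, with illustrative values of M and L that are our own assumptions) evaluates the four terms of the expanded form of (37.47), checks that they add up to the factored form, and prints the ratio of the new term to the centrifugal barrier, which is simply 2M/r.

```python
def v_eff(r, M, L, kappa):
    """Factored form of the effective potential (37.47), units G = c = 1."""
    return 0.5 * (1.0 - 2.0 * M / r) * (L**2 / r**2 + kappa)


def v_terms(r, M, L, kappa):
    """Constant, Newtonian, centrifugal and GTR terms of the expanded form."""
    return (0.5 * kappa, -kappa * M / r, L**2 / (2.0 * r**2), -M * L**2 / r**3)


M, L, kappa = 1.0, 4.0, 1.0   # illustrative values (assumed)
for r in (2.5, 10.0, 100.0):
    const, newton, centrifugal, gtr = v_terms(r, M, L, kappa)
    # The expanded form must reproduce the factored form.
    assert abs(v_eff(r, M, L, kappa) - (const + newton + centrifugal + gtr)) < 1e-12
    ratio = abs(gtr) / centrifugal   # equals 2M/r
    print(f"r = {r:6.1f}: |GTR term| / centrifugal term = {ratio:.3f}")
```

The ratio shows how quickly the correction fades with distance: it is of order unity near r = 2M and only a few percent at r = 100M.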


Massive Particle 
Let us consider first the massive particle case, $\kappa = 1$. The extrema of the
effective potential are given by


$$ Mr^2 - L^2 r + 3ML^2 = 0 \;\Rightarrow\; R_\pm = \frac{L^2 \pm \sqrt{L^4 - 12L^2M^2}}{2M}. \tag{37.48} $$


Thus, if $L^2 < 12M^2$, no extrema exist, and a particle heading toward the
center of attraction will fall directly to the Schwarzschild radius r = 2M,
the zero of the effective potential, and finally into the spacetime singularity
r = 0.

For $L^2 > 12M^2$, the reader may check that $R_+$ is a minimum of V(r),
while $R_-$ is a maximum. It follows that stable (unstable) circular orbits exist
at the radius $r = R_+$ ($r = R_-$). In the Newtonian limit of $M \ll L$, we get
$R_+ \approx L^2/M$, which agrees with the calculation in Newtonian gravity
(Problem 37.26). Furthermore, Eq. (37.48) puts a restriction of $R_+ > 6M$ on $R_+$
and $3M < R_- < 6M$ on $R_-$. This places the planets of the Sun safely in the
region of stable circular orbits.
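
These statements are easy to check numerically. The following sketch, in units G = 1 = c and with M = 1 and a few values of L chosen purely for illustration, computes the roots of $Mr^2 - L^2 r + 3ML^2 = 0$ and uses the sign of the second derivative of V(r) to confirm which root is a minimum and which a maximum. The helper names are hypothetical.

```python
import math


def circular_orbit_radii(M, L):
    """Roots (R_minus, R_plus) of M r^2 - L^2 r + 3 M L^2 = 0, or None if L^2 < 12 M^2."""
    disc = L**4 - 12.0 * L**2 * M**2
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    return (L**2 - root) / (2.0 * M), (L**2 + root) / (2.0 * M)


def d2v(r, M, L):
    """Second derivative of V(r) = 1/2 - M/r + L^2/(2 r^2) - M L^2/r^3."""
    return -2.0 * M / r**3 + 3.0 * L**2 / r**4 - 12.0 * M * L**2 / r**5


M = 1.0
for L in (3.0, 4.0, 10.0, 100.0):
    radii = circular_orbit_radii(M, L)
    if radii is None:
        print(f"L = {L}: no circular orbits")
        continue
    r_minus, r_plus = radii
    # R_plus is a minimum (stable), R_minus a maximum (unstable).
    assert d2v(r_plus, M, L) > 0.0 > d2v(r_minus, M, L)
    print(f"L = {L}: R_- = {r_minus:.4f}, R_+ = {r_plus:.4f}, L^2/M = {L**2 / M:.1f}")
```

For large L the outer root approaches the Newtonian value $L^2/M$ while the inner root approaches 3M from above.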

If a massive particle is displaced slightly from its stable equilibrium radius
$R_+$, it will oscillate radially with a frequency $\omega_r$ given by⁸


$$ \omega_r^2 = \left.\frac{d^2V}{dr^2}\right|_{r=R_+} = \frac{M(R_+ - 6M)}{R_+^4 - 3MR_+^3}. $$


⁸ In the Taylor expansion of any potential V(r) about the equilibrium position $r_0$ of a
particle (of unit mass), it is the second derivative term that resembles Hooke’s potential,
$\frac12 kx^2$ with $k = (d^2V/dr^2)_{r_0}$.

On the other hand, the orbital frequency is given by Eq. (37.45),


$$ \omega_\varphi^2 = \frac{L^2}{R_+^4} = \frac{M}{R_+^3 - 3MR_+^2}, $$


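where $L^2$ has been calculated from (37.48) and inserted in this equation. In
the Newtonian limit of $M \ll R_+$, we have $\omega_\varphi^2 \approx \omega_r^2 \approx M/R_+^3$. If $\omega_\varphi = \omega_r$,
the particle will return to a given value of r in exactly one orbital period and
the orbit will close. The difference between $\omega_\varphi$ and $\omega_r$ in GTR means that
the orbit will precess at a rate of


$$ \omega_p = \omega_\varphi - \omega_r = (1 - \omega_r/\omega_\varphi)\,\omega_\varphi = -\left[(1 - 6M/R_+)^{1/2} - 1\right]\omega_\varphi, $$
which in the limit of $M \ll R_+$ reduces to


$$ \omega_p \approx \frac{3M^{3/2}}{R_+^{5/2}} = \frac{3(GM)^{3/2}}{c^2 R_+^{5/2}}, $$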

where in the last equality, we restored the factors of G and c. If we include
the eccentricity e and denote the semimajor axis of the elliptical orbit by a,
then the formula above becomes (see [Misn 73, p. 1110])


$$ \omega_p \approx \frac{3(GM)^{3/2}}{c^2(1 - e^2)\,a^{5/2}}. \tag{37.49} $$
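
The two frequencies of a nearly circular orbit and the weak-field limit of their difference can be compared numerically. The sketch below (units G = 1 = c; the function names and sample radii are our own choices) evaluates $\omega_p = \omega_\varphi - \omega_r$ from the closed forms for a circular orbit of radius R > 6M and compares it with $3M^{3/2}/R^{5/2}$.

```python
import math


def omega_r_sq(R, M):
    """Square of the radial oscillation frequency about a circular orbit of radius R."""
    return M * (R - 6.0 * M) / (R**4 - 3.0 * M * R**3)


def omega_phi_sq(R, M):
    """Square of the orbital frequency of a circular orbit of radius R."""
    return M / (R**3 - 3.0 * M * R**2)


def precession_rate(R, M):
    """omega_p = omega_phi - omega_r, valid for R > 6M."""
    return math.sqrt(omega_phi_sq(R, M)) - math.sqrt(omega_r_sq(R, M))


M = 1.0
for R in (10.0, 1.0e2, 1.0e4):
    exact = precession_rate(R, M)
    weak = 3.0 * M**1.5 / R**2.5
    print(f"R = {R:8.0f}: omega_p = {exact:.4e}, weak-field limit = {weak:.4e}")
```

The two columns agree closely at large R and differ noticeably only for orbits within a few tens of M.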

Due to its proximity to the Sun, Mercury shows the largest precession 
frequency, which, after subtracting all other effects such as perturbations due 
to other planets, is 43 seconds of arc per century. This residual precession 
rate had been observed prior to the formulation of GTR and had been an 
unexplained mystery. Its explanation was one of the most dramatic early 
successes of GTR. 
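
As a rough check of the quoted number, one can evaluate the precession rate $3(GM)^{3/2}/[c^2(1 - e^2)a^{5/2}]$ for Mercury. The orbital elements and constants below are rounded reference values that we assume for this sketch; they are not taken from the text.

```python
import math

GM_SUN = 1.327e20        # m^3/s^2, rounded value of G times the solar mass (assumed)
C = 2.998e8              # m/s
A_MERCURY = 5.791e10     # semimajor axis in m (assumed)
E_MERCURY = 0.2056       # eccentricity (assumed)
ARCSEC_PER_RAD = 180.0 * 3600.0 / math.pi
SECONDS_PER_CENTURY = 100.0 * 365.25 * 86400.0


def precession_rate(gm, a, e):
    """Precession rate 3 (GM)^{3/2} / [c^2 (1 - e^2) a^{5/2}] in rad/s."""
    return 3.0 * gm**1.5 / (C**2 * (1.0 - e**2) * a**2.5)


omega_p = precession_rate(GM_SUN, A_MERCURY, E_MERCURY)
arcsec_per_century = omega_p * SECONDS_PER_CENTURY * ARCSEC_PER_RAD
print(f"{arcsec_per_century:.1f} arcsec per century")
```

With these inputs the result is close to 43 seconds of arc per century.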


Massless Particle 
We now consider the null geodesics. With $\kappa = 0$ in Eq. (37.47), the effective
potential becomes

$$ V(r) = \frac{L^2}{2r^2} - \frac{ML^2}{r^3}, $$
which has only a maximum at $r = R_{\max} = 3M$. It follows that in GTR,
unstable circular orbits of photons exist at r = 3M, and that strong gravity has
a significant effect on the propagation of light.

The minimum energy required to overcome the potential barrier (and 
avoid falling into the infinitely deep potential well) is given by 


$$ \frac12 E^2 = V(R_{\max}) = \frac{L^2}{54M^2} \;\Rightarrow\; \frac{L^2}{E^2} = 27M^2. \tag{37.50} $$
In flat spacetime, L/E is precisely the impact parameter b of a photon,
i.e., the distance of closest approach to the origin. Thus the Schwarzschild
geometry will capture any photon sent toward it if the impact parameter is
less than $b_c = \sqrt{27}\,M$. Hence, the cross section for photon capture is
$\sigma = \pi b_c^2 = 27\pi M^2$.
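
The location of the maximum and the critical impact parameter can be confirmed with a few lines of Python. This is only a sketch in units G = 1 = c with M = L = 1 (values assumed for illustration); it scans a grid of radii for the maximum of the null effective potential and evaluates $b_c$ from $E^2/2 = V(3M)$.

```python
import math


def v_null(r, M, L):
    """Effective potential for null geodesics: L^2/(2 r^2) - M L^2/r^3."""
    return L**2 / (2.0 * r**2) - M * L**2 / r**3


M, L = 1.0, 1.0                                     # illustrative values (assumed)
radii = [2.0 + 0.001 * k for k in range(1, 8001)]   # grid on (2M, 10M]
r_max = max(radii, key=lambda r: v_null(r, M, L))

# Critical impact parameter b_c = L/E with E^2/2 equal to the barrier height.
b_crit = L / math.sqrt(2.0 * v_null(3.0 * M, M, L))
sigma = math.pi * b_crit**2
print(f"maximum of V near r = {r_max:.3f} M")
print(f"b_c = {b_crit:.4f} M, sigma = {sigma:.3f} M^2")
```

The grid search is deliberately crude; it is meant as a sanity check on the analytic extremum, not as a root finder.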

To analyze the bending of light, we write Eq. (37.46) as
$$ \frac12\left(\frac{dr}{d\varphi}\right)^2\dot\varphi^2 + V(r) = \frac12 E^2. $$


Substituting for $\dot\varphi$ from Eq. (37.45) and writing the resulting DE in terms of
a new variable u = M/r, we obtain


$$ \left(\frac{du}{d\varphi}\right)^2 + u^2(1 - 2u) = \frac{M^2}{b^2}, \qquad b \equiv \frac{L}{E}, $$


where we used Eq. (37.50) for the effective potential. Differentiating this
with respect to $\varphi$, we finally get the second-order DE,

$$ \frac{d^2u}{d\varphi^2} + u = 3u^2. \tag{37.51} $$
In the large-impact-parameter or small-u approximation, we can ignore the 
second-order term on the RHS and solve for u. This will give the equa- 
tion of a line in polar coordinates. Substituting this solution on the RHS of 
Eq. (37.51) and solving the resulting equation yields the deviation from a 
straight line with a deflection angle of 


$$ \delta\varphi \approx \frac{4M}{b} = \frac{4GM}{bc^2}, \tag{37.52} $$


where we have restored the G’s and the c’s in the last step. 
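
The deflection can also be estimated by direct numerical integration. As a sketch (not the perturbative calculation described above), the code below integrates $u'' + u = 3u^2$ with a hand-written fourth-order Runge-Kutta step, starting at the point of closest approach where $du/d\varphi = 0$, and follows the ray until u = 0, i.e., $r \to \infty$. The starting value $u_m$ is assumed to satisfy $u_m^2(1 - 2u_m) = (M/b)^2$, the first-order equation for a ray of impact parameter b = L/E; the step size and the value $M/b = 10^{-3}$ are arbitrary choices.

```python
import math


def deflection_angle(eps, h=1.0e-4):
    """Deflection of a light ray with M/b = eps, from u'' + u = 3 u^2 (u = M/r)."""
    # Closest approach: u' = 0, so u_m^2 (1 - 2 u_m) = eps^2 (fixed-point iteration).
    u_m = eps
    for _ in range(50):
        u_m = eps / math.sqrt(1.0 - 2.0 * u_m)

    def rhs(u, w):
        # First-order system: u' = w, w' = -u + 3 u^2.
        return w, -u + 3.0 * u * u

    phi, u, w = 0.0, u_m, 0.0
    while True:
        # One classical fourth-order Runge-Kutta step of size h.
        k1u, k1w = rhs(u, w)
        k2u, k2w = rhs(u + 0.5 * h * k1u, w + 0.5 * h * k1w)
        k3u, k3w = rhs(u + 0.5 * h * k2u, w + 0.5 * h * k2w)
        k4u, k4w = rhs(u + h * k3u, w + h * k3w)
        u_new = u + h * (k1u + 2.0 * k2u + 2.0 * k3u + k4u) / 6.0
        w_new = w + h * (k1w + 2.0 * k2w + 2.0 * k3w + k4w) / 6.0
        if u_new <= 0.0:
            # Linear interpolation for the angle at which u = 0 (r -> infinity).
            phi_inf = phi + h * u / (u - u_new)
            break
        phi, u, w = phi + h, u_new, w_new
    # The ray sweeps 2 * phi_inf in total; a straight line would sweep pi.
    return 2.0 * phi_inf - math.pi


eps = 1.0e-3   # M/b, illustrative value (assumed)
print(deflection_angle(eps), 4.0 * eps)
```

For small M/b the integrated angle should lie close to the first-order value 4M/b, with the difference reflecting higher-order terms in M/b.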
For a light ray grazing the Sun, $b = R_\odot = 7 \times 10^8$ m and $M = M_\odot =
2 \times 10^{30}$ kg, so that Eq. (37.52) predicts a deflection of 1.747 seconds of
arc. This bending of starlight passing near the Sun has been observed many
times, beginning with the 1919 expedition led by Eddington. Because of the
intrinsic difficulty of such measurements, these observations confirm GTR
only to within 10 % accuracy. However, the bending of radio waves emitted
by quasars has been measured to an accuracy of 1 %, and the result has been
shown to be in agreement with Eq. (37.52) to within this accuracy.
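
The number quoted above can be reproduced in a few lines, assuming the standard weak-field deflection $4GM/(bc^2)$ and rounded values of G and c (both assumptions of this sketch; the solar mass and radius are those given in the text).

```python
import math

G = 6.67e-11     # m^3 kg^-1 s^-2 (rounded, assumed)
C = 3.0e8        # m/s (rounded, assumed)
M_SUN = 2.0e30   # kg, as in the text
R_SUN = 7.0e8    # m, as in the text


def deflection_arcsec(mass, b):
    """Weak-field deflection 4GM/(b c^2), converted from radians to arcseconds."""
    return 4.0 * G * mass / (b * C**2) * (180.0 * 3600.0 / math.pi)


print(f"{deflection_arcsec(M_SUN, R_SUN):.2f} arcsec")
```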
The last topic we want to discuss, a beautiful illustration of Proposition 37.2.4,
is the gravitational redshift. Let $O_1$ and $O_2$ be two static observers
(by which we mean that they each move on an integral curve of
the Killing vector field $\xi$). It follows that the 4-velocities $u_1$ and $u_2$ of the
observers are proportional to $\xi$. Since $u_1$ and $u_2$ have unit lengths, we have


$$ u_i = \frac{\xi_i}{\sqrt{g(\xi_i, \xi_i)}}, \qquad i = 1, 2. $$


Suppose $O_1$ emits a light beam and $O_2$ receives it. Since light travels on a
geodesic, Proposition 37.2.4 gives


$$ g(u, \xi_1) = g(u, \xi_2), \tag{37.53} $$
where u is tangent to the light trajectory (or the light signal’s 4-velocity).
The frequency of light for any observer is the time component of its
4-velocity, and because the 4-velocity of an observer has the form (1, 0, 0, 0)
in the frame of that observer, we can write this invariantly as
$$ \omega_i = g(u, u_i) = \frac{g(u, \xi_i)}{\sqrt{g(\xi_i, \xi_i)}}, \qquad i = 1, 2. $$


In particular, using Eq. (37.53), we obtain


$$ \frac{\omega_1}{\omega_2} = \frac{g(u, u_1)}{g(u, u_2)} = \frac{g(u, \xi_1)/\sqrt{g(\xi_1, \xi_1)}}{g(u, \xi_2)/\sqrt{g(\xi_2, \xi_2)}} = \sqrt{\frac{g(\xi_2, \xi_2)}{g(\xi_1, \xi_1)}} = \sqrt{\frac{1 - 2M/r_2}{1 - 2M/r_1}}, $$
where we used $g(\xi, \xi) = g_{00} = (1 - 2M/r)$ for the Schwarzschild spacetime,
and $r_1$ and $r_2$ are the radial coordinates of the observers $O_1$ and $O_2$,
respectively. In terms of wavelengths, we have


$$ \frac{\lambda_1}{\lambda_2} = \sqrt{\frac{1 - 2M/r_1}{1 - 2M/r_2}}. \tag{37.54} $$


It follows from Eq. (37.54) that as light moves toward regions of weak
gravity ($r_2 > r_1$), the wavelength increases ($\lambda_2 > \lambda_1$), i.e., it will be
“red-shifted”. This makes sense, because an increase in distance from the center
implies an increase in the gravitational potential energy, and, therefore, a
decrease in a photon’s energy $\hbar\omega$. Pound and Rebka used the Mössbauer
effect in 1960 to measure the change in the wavelength of a beam of light as it
falls down a tower on the surface of the Earth. They found that, to within the
1 % experimental accuracy, the GTR prediction of the gravitational redshift
was in agreement with their measurement.
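
For an order-of-magnitude feel for such a measurement, the sketch below evaluates $\lambda_2/\lambda_1 - 1$ from Eq. (37.54), with G and c restored, for light climbing a tower near the surface of the Earth. The Earth's mass and radius and the tower height of 22.5 m are reference values assumed here, not data from the text. The functions `log1p` and `expm1` are used because the shift is far below the resolution of a naive floating-point evaluation of the square root.

```python
import math

G = 6.674e-11       # m^3 kg^-1 s^-2 (assumed reference value)
C = 2.998e8         # m/s
M_EARTH = 5.972e24  # kg (assumed)
R_EARTH = 6.371e6   # m (assumed)
HEIGHT = 22.5       # m, approximate tower height (assumed)


def fractional_redshift(mass, r1, r2):
    """Return lambda_2/lambda_1 - 1 for light emitted at r1 and received at r2."""
    a1 = 2.0 * G * mass / (C**2 * r1)
    a2 = 2.0 * G * mass / (C**2 * r2)
    # sqrt((1 - a2)/(1 - a1)) - 1, written so that tiny differences survive round-off.
    return math.expm1(0.5 * (math.log1p(-a2) - math.log1p(-a1)))


shift = fractional_redshift(M_EARTH, R_EARTH, R_EARTH + HEIGHT)
g_surface = G * M_EARTH / R_EARTH**2
print(f"fractional shift = {shift:.3e}, g h / c^2 = {g_surface * HEIGHT / C**2:.3e}")
```

The two printed numbers agree because, for a height small compared with the radius, Eq. (37.54) reduces to the familiar weak-field estimate $gh/c^2$.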


37.5 Problems 


37.1 Show that a derivative $\nabla_X$ defined by Eq. (37.2) satisfies the four
conditions of Proposition 36.2.11.


37.2 Using Eq. (36.6) show that the Levi-Civita connection satisfies
$$ Z(g(X, Y)) = g(\nabla_Z X, Y) + g(X, \nabla_Z Y). $$

Write down similar relations for X(g(Y, Z)) and Y(g(X, Z)). Now compute
$$ X(g(Y, Z)) + Y(g(X, Z)) - Z(g(X, Y)) $$

and use the fact that the Levi-Civita connection is torsion-free to arrive at (37.2).

37.3 Derive Eq. (37.3) by letting $X = \partial_i$, $Y = \partial_j$, and $Z = \partial_k$ in Eq. (37.2).

37.4 Let A and B be matrices whose elements are one-forms. Show that 


$$ (A \wedge B)^t = -B^t \wedge A^t. $$


37.5 Write Eq. (37.13) in component form and derive Eq. (37.15).


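37.6 Find $d\hat\omega$ if


$$ \hat\omega = \begin{pmatrix} 0 & -\cot\theta\,\epsilon^2 \\ \cot\theta\,\epsilon^2 & 0 \end{pmatrix}, $$


where $(\theta, \varphi)$ are coordinates on the unit sphere $S^2$.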


37.7 Find the curvature of the two-dimensional space whose arc length is
given by $ds^2 = dx^2 + x^2 dy^2$.


37.8 Find the curvature of the three-dimensional space whose arc length is
given by $ds^2 = dx^2 + x^2 dy^2 + dz^2$.


37.9 Find the curvature tensors of the Friedmann and Schwarzschild spaces 
given in Example 37.1.8. 


37.10 Consider the Euclidean space $\mathbb{R}^3$.


(a) Show that in this space, the composite operator $d \circ *$ gives the curl of
a vector when the vector is written as components of a two-form.

(b) Similarly, show that $* \circ d$ is the divergence operator for one-forms.

(c) Use these results and the procedure of Example 37.1.9 to find expres- 
sions for the curl and the divergence of a vector in curvilinear coordi- 
nates. 


37.11 Prove the statement in Box 37.1.2. 


37.12 Start with $d^2x^i/dt^2 = 0$, the geodesic equations in Cartesian coordinates.
Transform these equations to spherical coordinates $(r, \theta, \varphi)$ using
$x = r\sin\theta\cos\varphi$, $y = r\sin\theta\sin\varphi$, and $z = r\cos\theta$, and the chain rule. From
these equations read off the connection coefficients in spherical coordinates 
[refer to Eq. (36.44)]. Now use Eq. (36.33) and Definition 36.2.22 to evalu- 
ate the divergence of a vector. 


37.13 Find the geodesics of a manifold whose arc element is $ds^2 = dx^2 +
dy^2 + dz^2$.


37.14 Find the geodesics of the metric $ds^2 = dx^2 + x^2 dy^2$.


37.15 Find the differential equation for the geodesics of the surface of a
sphere of radius a having the line element $ds^2 = a^2 d\theta^2 + a^2\sin^2\theta\, d\varphi^2$.
Verify that


$$ A\cos\varphi + B\sin\varphi + \cot\theta = 0 $$


is the intersection of a plane passing through the origin and the sphere. Also, 
show that it is a solution of the differential equation of the geodesics. Hence, 
the geodesics of a sphere are great circles.


37.16 The Riemann normal coordinates are given by $x^i = a^i t$. For each set
of $a^i$, one obtains a different set of geodesics. Thus, we can think of $a^i$ as
the parameters that distinguish among the geodesics.


(a) By keeping all $a^i$ (and t) fixed except the jth one and using the definition
of tangent to a curve, show that $n_j = t\partial_j$, where $n_j$ is (one of)
the n(’s) appearing in the equation of geodesic deviation.

(b) Substitute (a) plus $u^i = \dot{x}^i = a^i$ in Eq. (37.29) to show that

Re sik + R” ix => caer + Pei) 


Substitute for one of the $\Gamma$’s on the RHS using Eq. (36.45).
(c) Now use the cyclic property of the lower indices of the curvature tensor
to show that


$$ \Gamma^m{}_{ij,k} = -\frac13\left(R^m{}_{ijk} + R^m{}_{jik}\right). $$


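37.17 Let $\omega$ be a 1-form and v a vector.


(a) Show that the covariant and Lie derivatives, when applied to a 1-form,
are related by

$$ \langle L_u\omega, v\rangle = \langle\nabla_u\omega, v\rangle + \langle\omega, \nabla_v u\rangle. $$
(b) Use this to derive the identity

$$ (L_u\omega)_i = (\nabla_u\omega)_i + \omega_j(\nabla_{e_i}u)^j. $$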
37.18 Show that Eq. (37.19) leads to Eq. (37.20). 
37.19 Show that a vector field that generates a conformal transformation 
satisfies 

$$ X^k\partial_k g_{ij} + \partial_i X^k g_{kj} + \partial_j X^k g_{ki} = -\psi g_{ij}. $$


37.20 Use the symmetries of $R_{ijkl}$ [Eqs. (37.14) and (37.15)] to show that
$R_{ijkl} = R_{klij}$ and $R_{[ijkl]} = 0$.


37.21 Use the symmetry properties of Riemann curvature tensor to show 
that 


(2) Rig =Ryij’ =0, and 
(b) Rij = Rj. 
(c) Show that R igi + R jx1 — R ji. = 90, and conclude that V -G = 0, or, 


in component form, G * p= 0, 


37.22 Show that in an n-dimensional manifold without metric the number
of independent components of the Riemann curvature tensor is
$$ \frac{n^3(n-1)}{2} - \frac{n^2(n-1)(n-2)}{6} = \frac{n^2(n^2-1)}{3}. $$
If the manifold has a metric, the number of components reduces to


$$ \left[\frac{n(n-1)}{2}\right]^2 - \frac{n^2(n-1)(n-2)}{6} = \frac{n^2(n^2-1)}{12}. $$


37.23 Consider Einstein’s equation $\mathbf{R} - \frac12 R\,\mathbf{g} = \kappa\mathbf{T}$.


(a) Take the trace of both sides of the equation to obtain $R = -\kappa T^\mu{}_\mu \equiv
-\kappa T$.

(b) Use (a) to obtain Roo = 3« (Too + T/). 

(c) Now use the fact that in the Newtonian limit $T_{ij} \ll T_{00} \approx \rho$ to conclude
that agreement of Einstein’s and Newton’s gravity demands that
$\kappa = 8\pi$ in units in which the universal gravitational constant is unity.


37.24 Let $E_{ij}$ be the most general second-rank symmetric tensor constructed
from the metric and Riemann curvature tensors that is linear in the
curvature tensor.


(a) Show that
$$ E_{ij} = aR_{ij} + b\,g_{ij}R + \Lambda g_{ij}, $$


where a, b, and $\Lambda$ are constants.

(b) Show that $E_{ij}$ has a vanishing divergence if and only if $b = -\frac12 a$.

(c) Show that in addition, $E_{ij}$ vanishes in flat space-time if and only if
$\Lambda = 0$.


37.25 Show that $R_+$ and $R_-$, as given by Eq. (37.48), are, respectively, a
minimum and a maximum of V(r).


37.26 Use Newton’s second law of motion to show that in a circular orbit of
radius R, we have $L^2 = GMR$.


37.27 Show that $R_+ > 6M$ and $3M < R_- < 6M$, where $R_\pm$ are given by
Eq. (37.48).


37.28 Calculate the energy of a circular orbit using Eq. (37.46), and show
that
$$ E(R) = \frac{R - 2M}{\sqrt{R^2 - 3MR}}, $$
where $R = R_\pm$.


37.29 Show that the radial frequency of oscillation of a massive particle in
a stable orbit of radius $R_+$ is given by


$$ \omega_r^2 = \frac{M(R_+ - 6M)}{R_+^4 - 3MR_+^3}. $$


37.30 Derive Eq. (37.52) from Eq. (37.51). 
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energy momentum, 1071 
formula for, 1111 
Curvature, 1125-1132 
abelian case, 1095 
and gravity, 1161 
as relative acceleration, 1160 
matrix structure group, 1096, 1097 
Curvature form, 1093 
principal fiber bundle, 1091-1097 
structure equation, 1093 
Curvature scalar, 1163 
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Curvature tensor field, 1126 
Curvature transformation, 1126 
Curve 
coordinate, 870 
development of, 1137 
differentiable, 866 
Curvilinear coordinates, 1150 
Cyclic permutation, 717 
Cyclic subgroup, 707 


D 
D’ Alembert 
biography, 397 
D’ Alembert, 1057 
Damping factor, 447 
Darboux, 799, 1015 
Darboux inequality, 313 
Darboux theorem, 902 
De Broglie, 907 
Decomposition 
algebra, 83-95 
Clebsch-Gordan, 753-756 
Dedekind, 11, 764, 792, 1130 
Degeneracy 
energy, 656 
lifting of, 748 
Degenerate eigenvectors, 402 
Degenerate kernel, 556-559 
Delta function, 229, 512, 624 
derivative of, 233 
expansion 
Fourier, 273 
general, 257 
Fourier transform, 279 
Green’s function, 644, 685 
integral representation of, 281 
Legendre polynomials, 256 
limit of sequence, 229, 231 
potential, 653 
spherical harmonics, 692 
step function, 231, 232 
Sturm-Liouville eigenfunctions, 691 
variational problem, 1054 
Dense subset, 520 
Density function, 936 
Density of states, 584 
Derivation, 81, 82, 99, 106, 887, 891, 944, 
1124 
of an algebra, 80-83 
tangent vector, 868 
Derivation algebra, 82, 944 
Derivative 
complex function, 315-319 
covariant, 1117 
function of operator, 108 
functional, 1050-1053 
Hilbert spaces, 1047-1050 
of operators, 107-112 
total, 1027 
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Derivative operator, 40 
unboundedness of, 515 
Derived algebra, 66, 98 
Descartes, 791 
Determinant, 7, 55, 56, 118, 153-155, 
158, 160-162, 173, 175, 201, 
205, 557, 558, 567, 610, 641, 
644, 661, 706, 719, 788, 800, 
801, 806, 816-818, 839, 897, 
898, 916, 924 
analytic definition of, 205 
connection with trace, 161 
derivative of, 161 
exponential of trace, 162 
minor, 153 
relation to trace, 160 
Determinant function, 54—56, 152, 158, 
159, 162, 167, 799, 805, $15, 
838, 848 
dual, 158-160 
normed, 815 
Determinant of a matrix, 151-160 
Development, 1137 
Diagonalization 
simultaneous, 185-188 
Diffeomorphism, 865 
Differentiable curve, 866 
tangent vector, 868 
Differentiable manifold, 859-866 
dimension of a, 860 
Differentiable map, 864 
coordinate expression of, 864 
Differential 
of a constant map, 873 
of a map, 872 
real-valued maps, 874 
Differential equation 
analytic, 460 
analytic properties, 460-463 
associated Legendre, 411 
Bessel, 466 
completely homogeneous problem, 
612 
Euler, 471 
Fuchsian, 469-473 
definition, 470 
homogeneous, 418 
hypergeometric, 466 
definition, 473 
inhomogeneous, 418 
Legendre, 411 
linear, 418 
superposition principle, 423 
multiparameter symmetry group, 
1040-1043 
Riemann, 471 
second order linear 
behavior at infinity, 469 
second-order linear 
Frobenius method, 440 
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regular, 422 
symmetry group, 1014-1024 
Differential form, 888 
closed, 894 
exact, 894 
Lorentz force law, 892 
Maxwell’s equations, 890 
pullback of, 888 
Differential geometry, 1117-1140 
Differential one-form, 882 
Differential operator, 418, 970 
adjoint, 433-436 
linear, 605 
Diffusion equation, 643, 673 
one-dimensional 
parabolic, 642 
time-dependent, 581, 582 
Dilation, 306 
Dilitation, 1159 
Dimension theorem, 42, 44, 45, 61, 99, 
193, 518, 803, 810, 857, 876 
Dirac, 957 
biography, 235 
Dirac delta function, 229, 512, 624 
derivative of, 233 
expansion 
Fourier, 273 
general, 257 
Fourier transform, 279 
Green’s function, 644, 685 
integral representation of, 281 
Legendre polynomials, 256 
limit of sequence, 229 
spherical harmonics, 692 
step function, 231 
Sturm-Liouville eigenfunctions, 691 
variational problem, 1054 
Dirac equation, 832-834, 997 
Dirac gamma matrices, 834 
Majorana representation, 855, 1003 
Direct product 
group, 712, 713 
Direct sum, 25—28, 75, 92, 119, 169, 201, 
528, 558, 567, 712, 731, 797, 
803, 810, 834-836, 840, 852, 
947, 959, 999, 1001, 1087 
algebra, 67 
definition, 25 
inner product, 32 
Directional covariant derivative, 1119 
Directional derivative, 884 
Dirichlet, 246, 791, 1130, 1144 
biography, 666 
Dirichlet boundary condition, 642 
Dirichlet BVP, 642, 665-671 
in two dimensions, 690 
Discrete Fourier transform, 286, 287 
Dispersion relation, 376-378 
with one subtraction, 377 
Distribution, 234, 418, 686, 688 
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Distribution (cont.) 
density, 234 
derivative of, 236 
Fourier transform, 287, 288 
Fourier transform of a, 288 
Green’s function as, 606 
limit of functions, 235 
Divergence 
null Lagrangians, 1060, 1061 
of tensors, 1135 
total, 1060 
Divergence theorem, 648 
Division algebra, 69 
of a Clifford algebra, 999 
DOLDE 
hypergeometric 
Kummer’s solutions, 477 
Domain, 5 
Dot product, 7, 29 
Dual 
basis, 51, 782 
of an operator, 51 
space, 48 
Dual determinant function, 158-160 
Dual space, 49 


E 
Effective action, 713 
Eigenfunction expansion technique 
2D Laplacian, 689 
Eigenspace, 173, 180 
compact operator, 527 
compact resolvent operator, 565 
involution, 836 
normal operator, 179 
perturbation theory, 655 
Weyl operator, 955 
Eigenvalue, 172-175 
angular momentum, 401-405, 970 
Casimir operator, 969 
characteristic polynomial, 173 
circuit matrix, 462 
compact operators, 527 
definition, 172 
discrete, 630 
extrema of functions, 197 
Green’s functions, 630, 688 
harmonic oscillator, 444 
hermitian operator, 178 
integral equation, 544 
invertible operator, 611 
involution, 836 
largest, 181 
orthogonal operator, 201 
perturbation theory, 656 
positive operator, 181 
projection operator, 174 
simple, 173 
smallest, 181 
Sturm-Liouville, 691 
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Sturm-Liouville system, 568, 578 
unitary operator, 178 
upper-triangular matrix, 175 
Weyl operator, 959 
Eigenvector, 172-175, 178, 199 
angular momentum, 402, 406-413 
Casimir operator, 969 
compact normal operator, 532 
compact operators, 527 
definition, 172 
harmonic oscillator, 444 
hermitian operator, 433 
infinite dimensions, 518 
integral equation, 554 
normalized, 190 
perturbation theory, 656 
simultaneous, 185 
SOLDE, 463 
Sturm-Liouville system, 567 
Weyl operator, 959 
Einstein, 897, 956, 1070, 1131, 1145, 
1146, 1164, 1166 
Einstein tensor, 1163 
Einstein’s equation, 1163-1166 
Schwarzschild solution, 1169 
spherically symmetric solutions, 
1167-1169 
Einstein’s summation convention, 781 
Electromagnetic field tensor, 145, 826, 
889, 892, 893, 895 
Elementary column operation, 156 
Elementary row operation, 156 
Elliptic PDE, 641, 665-673 
Elsewhere, 941 
Empty set, 2 
Endomorphism, 39, 40, 42, 80, 81, 101, 
102, 125, 164, 705, 806, 807, 
1126 
involution, 72 
Energy function, 905 
Energy levels, 11 
Energy quantum number, 727 
Entire function, 301, 343 
Bessel functions, 572 
bounded, 317 
confluent HGF, 479 
inverse of gamma function, 379 
with simple zeros, 364 
Epimorphism 
algebra, 70 
Equivalence class, 3 
representative, 3 
Equivalence relation, 3, 4, 24 
Equivalent representations, 726 
Error function, 436 
as solution of a DE, 437 
Essential singularity, 342 
Essentially idempotent, 741, 772 
n-orthogonal matrices, 940 
Euclid, 220, 907 
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Euclidean metric, 1149 
Euler, 301, 474, 482, 570, 1057, 1144 
biography, 1055 
Euler angles, 146, 172, 934, 972 
Euler equation, 457 
Euler kernel, 494 
Euler operator, 1054 
Euler theorem, 973 
Euler transform, 493 
Euler-Lagrange equation, 1055, 1069 
classical, 1053 
field, 1053 
Euler-Mascheroni constant, 380 
Evaluation function, 1051 
Event, 941 
Evolution operator, 109, 678 
Exact form, 894 
Expectation value, 115 
Exponential function 
complex, 302 
Exponential map, 925 
Exterior algebra, 794-801 
Exterior calculus, 888-897 
Exterior covariant derivative, 1093 
Exterior derivative, 889 
covariant, 1093 
Exterior product, 794 
inner product, 819, 820 


F 
F-related vector fields, 877 
Factor algebra, 77, 78, 92, 709 
Factor group, 710 
Factor map, 6 
Factor set, 4, 24 
Factor space, 24, 25, 77 
Factorial function, 378 

Stirling approximation of, 386 
Faithful representation, 126, 726 
Fast Fourier transform, 287 
Fermi energy, 584 
Feynman diagram, 654 
Feynman propagator, 688 
Fiber, 1080 
Fiber bundle, 1079-1097 

abelian case, 1095 

principal, 1079-1086 
Fiber metric, 1143 


Field, 20 
gauge, 1099-1105 
magnetic, 3 
particle, 1101 
tensor 
manifold, 876-888 
vector 


manifold, 877-882 
Fine-structure constant, 481, 655 
Finite-rank operator, 524 
First integral, 1066 
First variation, 1057 
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Flat connection, 1095, 1096 
Flat manifold, 1153, 1154 
Flat map, 801, 902 
Flow, 881 
FODE 
existence, 419-421 
existence and uniqueness 
local, 421 
linear, 420 
normal form, 420 
Peano existence theorem, 420 
uniqueness, 419-421 
uniqueness theorem, 420 
FOLDE, 433 
complex, 460-462 
irregular singular point, 461 
regular singular point, 461 
removable singularity, 461 
Form 
invariant 
Lie group, 927, 928 
pseudotensorial, 1092 
tensorial, 1092 
torsion, 1122 
Form factor, 284 
Formal adjoint, 612, 633, 648 
Four-potential, 895 
Four-vector, 808 
energy momentum, 1071 
Fourier, 666, 703, 1056 
biography, 267 
Fourier integral transforms, 278 
Fourier series, 265-276, 563 
angular variable, 266 
fundamental cell, 267 
general variable, 268 
group theory, 960 
higher dimensions, 275, 276 
main theorem, 272 
Peter-Weyl, 960 
sawtooth, 270 
square wave, 269 
to Fourier transform, 276-278 
two-dimensional, 581 
Fourier transform, 276-288, 493 
Coulomb potential 
charge distribution, 283 
point charge, 282 
definition, 278 
derivatives, 284, 285 
discrete, 286, 287 
distribution, 287, 288 
Gaussian, 280 
Green’s functions, 680-688 
higher dimensions, 281 
quark model, 284 
scattering experiments, 282 
Fourier-Bessel series, 587 
Fredholm, 220 
Fredholm, biography, 551 
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Fredholm alternative, 551 
Fredholm equation, 543, 652 
second kind 
characteristic values, 544 
Fredholm integral equation, 549-559 
Free action, 713 
Friedmann, biography, 1166 
Friedmann metric, 1149 
Frobenius, 734, 957, 981 
biography, 764 
Frobenius method, 439-444 
Frobenius Theorem, 93 
Fuchsian DE, 469-473 
definition, 470 
Function, 5 
analytic, 297-304 
complex, 295, 296 
derivatives as integrals, 315-319 
integration, 309-315 
determinant, 54 
generalized, 233-237 
inner product, 32 
meromorphic, 363-365 
multivalued, 365-371 
of operators, 104-106 
operator, 188-191 
p-linear, 53 
piecewise continuous, 266 
square-integrable, 221-227 
Function algebra, 67 
Function of operator 
derivative, 108 
Functional, 1054 
linear, 48-53 
Functional derivative, 1050-1053 
Fundamental theorem of algebra, 318 
Fundamental vector field, 1086 
Future light cone, 941 


G 
G-invariance, 1009 
G-invariant Lagrangian, 1106 
g-orthogonal, 808 
g-orthonormal, 813 
g-transpose, 806 
Galois, 154, 764, 946, 1015 
biography, 702 
Gamma function, 250, 378-381 
definition, 378 
Gamma matrices, 834 
Majorana representation, 855 
Gauge 
choice of, 1099 
Gauge field, 1099-1105 
Gauge invariance, 895 
Gauge Lagrangian, 1105 
Gauge Lagrangian density, 1109 
Gauge potential, 1099-1105 
Gauge theories, 1099-1114 
Gauge theory 
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local equation, 1112-1114 
Gauge transformation, 1102 
Gauss, 154, 251, 301, 482, 523, 533, 666, 
791, 895, 1055, 1130, 1144 
biography, 474 
Gay-Lussac, 581 
Gegenbauer function, 478 
Gegenbauer polynomials, 253 
General linear group, 705 
representation, 963-966 
General relativity, 1163-1174 
Generalized Fourier coefficients, 220 
Generalized function, 233-237, 418, 606, 
688 
Generalized Green’s identity, 613, 626, 
648 
Generating function, 257 
Generator 
Clifford algebra, 997, 1000 
conformal group, 1036, 1159 
coordinate transformation, 934 
cyclic group, 710 
group, 933, 1113 
group action, 962 
infinitesimal, 929, 932, 970, 976, 
1010-1013, 1023, 1030, 1037, 
1040, 1041, 1062, 1064, 1068, 
1069 
Lorentz, 1073 
of an algebra, 70 
rotation, 106, 750, 933, 998, 1157 
translation, 112, 948 
Geodesic, 1137-1140 
relative acceleration, 1160 
Geodesic deviation, 1159-1163 
equation of, 1161 
Geodesic equation, 1138 
massive particles, 1170, 1171 
massless particles, 1170, 1172 
Geometric multiplicity, 173 
Geometry 
Riemannian, 1143-1174 
symplectic, 51, 901-909 
Gibbs, 523, 907 
Gibbs phenomenon, 273-275 
GL(n, R) as a Lie group, 916 
GL(V) 
as a Lie group, 915 
representation of, 963 
Gédel, 897 
Gordan, 1070 
biography, 755 
Gradient 
for Hilbert spaces, 1050 
Gradient operator, 1133 
Gram, 34 
Gram-Schmidt process, 33-35, 164, 210, 
241, 532 
Graph, 5 
Grassmann, 799, 1070 
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Grassmann product, 794 
Gravitational red-shift, 1173 
Gravity 
and curvature, 1161 
Newtonian, 1161-1163 
Green, biography, 613 
Green’s function, 358 
adjoint, 618 
advanced, 686 
as a distribution, 606 
Dirichlet BC 
circle, 670 
eigenfunction expansion, 630-632 
for d/dx, 606 
for d?/dx*, 607 
formal considerations, 610-617 
Helmholtz operator 
in 2D, 694 
in one dimension, 605 
indefinite, 606-610 
multidimensional 
delta function, 643-648 
diffusion operator, 684, 685 
Dirichlet BVP, 665-671 
eigenfunction expansion, 688-693 
Fourier transform, 680-688 
fundamental solution, 649-651 
general properties, 648, 649 
Helmholtz operator, 682-684 
integral equations, 652-655 
Laplacian, 647, 648, 681, 682 
Neumann BVP, 671-673 
perturbation theory, 655-661 
wave equation, 685-688 
Neumann BVP, 673 
exterior, 673 
interior, 673 
physical interpretation, 629 
properties, 619 
regular part of, 651 
resolvent, 630 
retarded, 686 
second order DO, 614-616 
self-adjoint SOLDOs, 616, 617 
singular part of, 651 
SOLDO, 617-629 
construction, 621-626 
inhomogeneous BCs, 626-629 
properties, 619-621 
uniqueness, 621-626 
symmetry, 619 
Green’s identity, 619, 648, 675, 679 
generalized, 648 
Group, 8, 702-705 
Ist isomorphism theorem, 710 
abelian, 704 
affine, 948 
algebra 
symmetric group, 771 
automorphism, 705 
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center of, 708 
commutative, 704 
commutator of, 707 
direct product, 712, 713 
external, 712 
internal, 712 
external direct product, 712 
finite 
Lagrange’s theorem, 721 
homomorphism, 705 
kernel of, 708 
internal direct product, 712 
isomorphism, 705 
left action, 713 
Lie, 915-936 
multiplication, 702 
multiplication table, 705 
of affine motions, 917 
order of, 703 
orthogonal, 706 
realization, 715 
representation, 725-732 
character table, 743 
criterion for irreducibility, 738 
crystallography, 727 
irreducible, projection operator, 
749 
irreducible basis function, 746-750 
matrix, 727 
particles and fields, 751 
quantum state parity, 727 
tensor product, 750-758 
right action, 713 
rigid rotations, 706 
simply reducible, 753 
special orthogonal, 706 
special unitary, 706 
subset 
left invariant, 713 
right invariant, 713 
word on, 720 
symmetric, 715-720 
symmetry of Hamiltonian, 725 
symplectic, 707, 803 
unitary, 706 
effective, 713, 918 
free, 713, 918 
infinitesimal, 928-935 
infinitesimal generator, 929 
Lie groups, 917-920 
orbit, 713 
stabilizer, 713 
transitive, 713, 918 
Group algebra, 740 
representations, 740-743 
TE, 585 
TEM, 585 
TM, 585 
H 
Haar measure, 935 
Halley, 481 
Hamilton, 246, 545, 1070 
biography, 906 
Hamiltonian 
group of symmetry of, 725 
Hamiltonian mechanics, 801, 904 
Hamiltonian system, 905 
Hamiltonian vector field, 905 
energy function, 905 
Hankel function, 484 
first kind 
asymptotic expansion of, 386 
second kind, 391 
Hankel transform, 494 
Harmonic functions, 304 
Harmonic oscillator, 443, 444-446 
critically damped, 447 
ground state, 444 
Hamiltonian, 444 
overdamped, 447 
underdamped, 447 
Heat equation, 395, 643, 673 
symmetry group, 1030-1034 
Heat transfer 
time-dependent, 581, 582 
Heat-conducting plate, 597 
Hegel, 791 
Heisenberg, 115, 236 
Helicity, 982 
Helmholtz, 246, 639, 957 
Helmholtz equation, 593 
Hermite, 251, 896 
biography, 115 
Hermite polynomials, 245, 248, 249, 442, 
573 
Hermitian, 31, 48, 116, 117, 120, 144, 
147, 172, 177, 178, 181, 186, 
189, 205, 402, 525, 533, 555, 
558, 564, 613, 924, 945, 955, 
968, 982 
Hermitian conjugate, 113-116, 144, 146, 
162, 171, 202, 404, 513, 661 
Hermitian inner product, 31 
Hermitian kernel, 552-556 
Hermitian operator, 114-119 
Hilbert, 11, 34, 268, 523, 755, 897, 956, 
1070, 1164 
biography, 220 
Hilbert space, 215-227, 435 
basis of, 219 
bounded operators in, 513 
compact hermitian operator in, 530 
compact normal operator in, 532 
compact operator in, 524 
compact resolvent, 564 
convex subset, 528 
countable basis, 228 
definition, 218 
derivative, 1047-1050 
differential of functions, 1049 
directional derivative, 1050 
functions on, 1052 
derivative of, 1049 
invertible operator in, 611 
operator norm, 513 
perturbation theory, 658 
representation theory, 726, 953 
square-integrable functions, 222 
Hilbert transform, 377 
Hilbert-Schmidt kernel, 525, 549 
Hilbert-Schmidt operator, 525, 955 
Hilbert-Schmidt theorem, 552 
HNOLDE, 446, 448 
characteristic polynomial, 446 
Hodge star operator, 820-823, 893 
Hölder, 523 
Homographic transformations, 307 
Homomorphism 
algebra, 70, 71, 77, 82, 98, 125 
Clifford algebra, 837 
Clifford group, 991 
group, 705, 710, 726, 731, 732, 987, 
902 
Lie algebra, 922, 944, 953, 1101 
Lie group, 915, 922, 928, 953, 967 
PFB, 1081 
symmetric, 705 
trivial, 705 
Horizontal lift, 1089 
Horizontal vector field, 1087 
HSOLDE 
basis of solutions, 425 
comparison theorem, 431 
exact, 433 
integrating factor, 433 
second solution, 426-428 
separation theorem, 430 
Hydrogen, 11 
Hydrogen-like atoms, 480-482 
Hyperbolic PDE, 641, 678-680 
Hypergeometric DE, 466 
Hypergeometric function, 473-478 
confluent, 478-485 
integral representation of, 497, 498 
contiguous functions, 476 
Euler formula, 496 
integral representation of, 494-498 
Hypergeometric series, 473 
Hypersurface, 635 
Ideal, 73-78 
Idempotent, 83, 86-89, 119-125, 175, 
741, 844, 852, 999, 1002 
essentially, 741, 772 
primitive, 88, 94, 999, 1001, 1002 
principal, 87-89 
rank, 94 
Identity 
additive, 20 
multiplicative, 20 
Identity map, 5 
Identity operator, 101 
Identity representation, 726 
Ignorable coordinate, 645 
Image 
map, 5 
Image of a subset, 5 
Implicit function theorem, 419 
Index 
continuous, 227-233 
Indicial equation, 465 
SOLDE, 465 
Indicial polynomial, 465 
Induced representations, 978 
Induction principle, 12 
Inductive definition, 14 
Inequality 
Bessel, 219 
Cauchy, 336 
Darboux, 313 
Parseval, 219 
Schwarz, 35 
triangle, 36 
Infinitesimal action 
adjoint, 929 
Infinitesimal generator, 929, 932 
Initial conditions, 418 
Initial value problem, 611, 635 
Injective map, 5 
Inner automorphism, 926 
Inner product, 29-38, 804-820 
bra and ket notation, 31 
complex bilinear, 46 
definition of, 30 
direct sum, 32 
Euclidean, 31 
exterior product, 819, 820 
G-orthogonal, 1107 
hermitian, 31 
indefinite 
orthonormal basis, 812-819 
subspaces, 809-812 
isotropic vector, 808 
norm and, 37 
null vector, 808 
positive definite, 30 
pseudo-Euclidean, 31 
sesquilinear, 31 
signature, 813 

Inner product space, 31 
INOLDE 
particular solution, 448 
Integral 
principal value, 354-358 
Integral curve, 879 
Integral equation, 543-548 
characteristic value, 544 
first kind, 543 
Fredholm, 549-559 
Green’s functions, 652-655 
kernel of, 543 
second kind, 543 
Volterra, 543 
Volterra, of second kind 
solution, 545 
Integral operator, 512 
Integral transform, 493 
Bessel function, 494 
Integration 
complex functions, 309-315 
Lie group, 935, 936 
manifolds, 897-901 
Integration operator, 40 
Interior product, 829, 891 
Intersection, 2 
Intrinsic spin, 1073 
Invariant, 1010 
map, 1010 
operator 
matrix representation, 171 
subspace, 169-172 
definition, 170 
Invariant subspace, 728, 729 
Inverse 
image, 5 
of a map, 6 
of a matrix, 155-158 
Inverse mapping theorem, 873 
Inversion, 154, 306, 1159 
Involution, 72, 82, 836, 837, 839, 843, 
848, 989, 990 
Irreducible basis function, 746-750 
Irreducible representation, 729 
i-th row 
functions, 747 
norm of functions, 747 
Irreducible set of operators, 757 
Irreducible tensor operators, 756-758 
Irreducible tensorial set, 757 
Isolated singularity, 342-344 
Isolated zero, 330 
ISOLDE 
general solution, 428-430 
Isometric map, 39, 1155 
Isometry, 40, 42, 43, 125, 205, 539, 806, 
807, 811, 826, 992, 1143, 
1155-1159, 1168 
time translation, 1167 
Isomorphism, 43-45, 52, 68, 74, 78, 127, 
139, 140, 158, 222, 228, 661, 
704, 719, 721, 726, 789, 796, 
801, 838, 845, 847, 851, 871, 
872, 884, 905, 921, 922, 926, 
930, 945, 972, 998, 1085-1087, 
1089, 1103, 1118, 1128, 1143 
algebra, 70 
Clifford algebras, 842, 843 
group, 705 
Lie algebra, 922 
Lie group, 915 
linear, 43-45 
natural, 785 
PFB, 1081 

Isotropic vector, 808 


J 
Jacobi, 251, 475, 545, 666, 713, 753, 755, 
791, 907, 1144 
biography, 246 
Jacobi function 
first kind, 477 
second kind, 478 
Jacobi identity, 879, 887, 927 
Jacobi polynomials, 245, 250, 252, 478 
special cases, 245 
Jacobian matrix, 873 
Jordan arc, 309 
Jordan canonical form, 539 
Jordan’s lemma, 345 


K 
Kant, 791 
Kelvin, 613 
Kelvin equation, 589 
Kelvin function, 589 
Kepler problem, 1074 
Kernel, 41, 42, 51, 130, 158, 173, 192, 
198, 498, 529, 546, 558, 560, 
635, 678, 708, 826, 937, 944, 
995, 999 
degenerate, 556-559 
hermitian, 552-556 
Hilbert-Schmidt, 525, 544, 555 
integral operator, 512 
integral transforms, 493 
separable, 556 
Ket, 20 
Killing, 799, 1015 
biography, 946 
Killing equation, 1156 
Killing form, 945, 948 
of gl(n, R), 947 
Killing parameter, 1167 
Killing vector field, 1155-1159, 1167, 
1170, 1173 
conformal, 1158 
Kirchhoff, 639 
Klein, 799, 896, 956, 1015, 1070, 1131, 
1164 
Klein-Gordon equation, 396 
Korteweg-de Vries equation, 1044 
Kovalevskaya, 523 
biography, 639 
Kramers-Kronig relation, 378 
Kronecker, 11, 154 
biography, 791 
Kronecker delta, 32, 50, 161, 782, 939 
Kronecker product, 751 
Kummer, 36, 755, 791, 946, 1130 


L 
Lagrange, 154, 246, 251, 267, 474, 482, 
581, 755 
biography, 1057 
Lagrange identity, 435, 494, 570, 578, 
613, 805 
Lagrange multiplier, 1064 
Lagrange’s equation, 1111 
Lagrangian, 904, 1054 
G-invariant, 1106 
gauge, 1109 
gauge-invariant, 1105-1107 
construction, 1107-1111 
null, 1060, 1061 
Lagrangian density, 1105 
Laguerre polynomials, 245, 249, 250 
Laplace, 34, 267, 666, 906, 1056, 1058, 
1164 
biography, 581 
Laplace transform, 493 
Laplace’s equation, 395 
Cartesian coordinates, 579 
cylindrical coordinates, 586 
elliptic, 642 
Laplacian 
Green’s function for, 647 
separated 
angle radial, 399 
spherical coordinates 
separation of angular part, 398-401 
Laurent, biography, 340 
Laurent series, 321-330, 657 
construction, 322 
uniqueness, 325 
Lavoisier, 153, 1058 
Least square fit, 225-227 
Lebesgue, 221 
Left annihilator, 73 
Left coset, 708 
Left ideal, 73, 74, 84, 740, 773, 1000, 
1002 
minimal, 74, 76, 79, 94, 128, 129, 772, 
999, 1001, 1003 
Left translation 
as action, 929 
Left-invariant 1-form, 921 
Left-invariant vector field, 920 
Legendre, 246, 267, 545, 666, 1144 
biography, 251 
Legendre equation, 436, 441, 572 
Legendre function, 478 
Legendre polynomial, 225, 250-252, 256, 
408, 411, 428, 555 
and Laplacian, 256 
asymptotic formula, 576 
delta function, 256 
Legendre transformation, 904 
Leibniz, 154, 791 
Leibniz formula, 81 
Leibniz rule, 16 
Length 
vector, 36-38 
Levi-Civita, 1131 
biography, 1146 
Levi-Civita connection, 1145 
Levi-Civita tensor, 799, 976 
Lie, 764, 799, 896, 946 
biography, 1014 
Lie algebra, 915-936 
abelian, 937 
adjoint map, 926 
Cartan metric tensor, 945 
Cartan theorem, 948 
center, 937 
commutative, 937 
compact, 945 
decomposition, 947 
derivation, 944 
ideal, 937 
Killing form of, 945 
of a Lie group, 920-927 
of SL(V), 924 
of unitary group, 924 
of vector fields, 879 
representation, 966-983 
definition, 953 
semisimple, 948 
simple, 948 
structure constants, 937 
theory, 936-948 
Lie bracket, 879 
Lie derivative, 885 
covariant derivative, 1135 
of a 1-form, 886 
of p-forms, 890 
of vectors, 886 
Lie group, 405, 915-936 
canonical 1-form on, 928 
compact 
characters, 960 
matrix representation, 959 
representation, 953-963 
unitary representation, 954 
Weyl operator, 955 
group action, 917-920 
homomorphism, 915 
infinitesimal action, 928-935 
integration, 935, 936 
density function, 936 
invariant forms, 927, 928 
left translation, 920 
local, 917 
representation, 953 
Lie multiplication, 937 
Lie subalgebra, 937 
Lie’s first theorem, 932 
Lie’s second theorem, 927 
Lie’s third theorem, 927 
Light cone, 941 
Linear combination, 21 
Linear connection, 1120-1140 
definition, 1121 
Linear frame, 1083 
Linear functional, 48-52, 53, 53, 61, 233, 
234, 287, 515, 617, 783, 787, 
796, 809, 829, 883 
Linear independence, 21 
Linear isomorphism, 43-45, 49 
Linear map, 38-45, 51, 70, 78, 95, 116, 
563, 789, 801, 814, 837, 838, 
840, 856, 1048, 1049, 1073 
invertible, 43 
Linear operator, 39-41, 47, 55, 56, 66, 
113, 115, 116, 119, 139, 140, 
151, 170, 171, 174, 422, 513, 
515, 517, 522, 529, 531, 564, 
785, 793, 799, 810, 944 
determinant, 55, 56 
null space of a, 41 
Linear PDE, 636 
Linear transformation, 53 
bounded, 514 
definition, 39 
pullback of a, 51 
Liouville, 568, 703 
biography, 570 
Liouville substitution, 569, 573, 576, 577 
Liouville’s theorem, 908 
Lipschitz condition, 420 
Little algebra, 978 
Little group, 714, 978-981 
Local diffeomorphism, 865 
Local group of transformations, 917 
Local Lie group, 917 
Local operator, 512 
Local trivialization, 1080 
Logarithmic function, 365 
Lorentz, 897 
Lorentz algebra, 972 
Lorentz force law, 892 
Lorentz group, 707, 940 
Lorentz metric, 1149 
Lorentz transformation, 940 
orthochronous, 941 
proper orthochronous, 941 
Lowering indices, 805 
Lowering operator, 403 


M 
Maclaurin series, 321 
Magnetic field, 3 
Manifold, 859-866 
atlas, 860 
chart, 860 
coordinate functions, 860 
differentiable, 859-866 
differential of a map, 872-876 
flat, 1153 
integration, 897-901 
orientable, 898 
product, 863 
pseudo-Riemannian, 1144 
Riemannian, 1144 
semi-Riemannian, 1144 
subset 
contractable to a point, 894 
symplectic, 902 
tangent vectors, 866-872 
tensor fields, 876-888 
vector fields, 877-882 
with boundary, 899 
Map, 4-8 
bijective, 6 
codomain, 5 
conformal, 304-309 
differentiable, 864 
differential 
Jacobian matrix of, 873 
domain, 5 
equality of, 5 
functions and, 5 
graph of a, 5 
identity, 5 
image of a subset, 5 
injective, 5 
inverse of a, 6 
isometric, 39 
linear, 38-45 
invertible, 43 
manifold, 872-876 
multilinear, 53-57, 782-789 
skew-symmetric, 53 
one-to-one, 5 
onto, 6 
p-linear, 53 
range of a, 5 
surjective, 5 
target space, 5 
Maschke’s Theorem, 759 
Mathematical induction, 12-14 
Matrix, 137-142 
antisymmetric, 144 
basis transformation, 149 
block diagonal, 171, 200 
circuit, 462, 463 
complex conjugate of, 144 
determinant of, 151-160 
diagonal, 144 
diagonalizable, 162 
hermitian, 144 
hermitian conjugate of, 144 
inverse of, 155-158 
irreducible, 171 
operations on a, 142-146 
orthogonal, 144 
rank of, 158 
reducible, 171 
representation 
orthonormal basis, 146-148 
row-echelon, 156 
strictly upper triangular, 66 
symmetric, 144 
symplectic, 804 
transpose of, 142 
triangular, 156 
unitary, 144 
upper triangular, 66 
upper-triangular, 175, 176 
Matrix algebra, 66, 78-80 
Matrix of the classical adjoint, 152-155 
Maurer-Cartan equation, 928, 1095 
Maximally symmetric spaces, 1157 
Maxwell’s equations, 894 
Mellin transform, 493 
Mendelssohn, 666, 792 
Meromorphic functions, 363-365 
Method of images, 668 
sphere, 669 
Method of steepest descent, 383, 577 
Metric, 37 
Friedmann, 1149 
Schwarzschild, 1149 
Metric connection, 1143-1155 
Metric space, 8-10 
complete, 10 
convergence, 9 
definition, 8 
Minimal ideal, 963 
Minimal left ideal, 74, 76, 79, 94, 128, 
129, 772, 999, 1001, 1003 
Minkowski, 1164 
Minkowski metric, 1149 
Mittag-Leffler, 523, 640 
Mittag-Leffler expansion, 364 
Modified Bessel function, 484 
first kind 
asymptotic expansion of, 391 
second kind 
asymptotic expansion of, 392 
Moment of inertia, 145, 195 
matrix, 145 
Momentum operator, 398 
Monge, 153, 267 
Monomorphism 
algebra, 70 
Morera’s theorem, 319 
Multidimensional diffusion operator 
Green’s function, 684, 685 
Multidimensional Helmholtz operator 
Green’s function, 682-684 
Multidimensional Laplacian 
Green’s function, 681, 682 
Multidimensional wave equation 
Green’s function, 685-688 
Multilinear, 152, 783, 789, 883, 1092, 
1124 
Multilinear map, 53-57, 782-789 
tensor-valued, 787 
Multiplicative identity, 20 
Multivalued functions, 365-371 


N 
n-equivalent functions, 1018 
n-sphere, 860, 865 
n-th jet space, 1018 
n-tuple, 3 
complex, 21 
real, 21 
Napoleon, 267, 581 
Natural isomorphism, 785, 820 
Natural numbers, 2, 9 
Natural pairing, 783 
Neighborhood 
open round, 519 
Neumann, 246, 753 
biography, 671 
Neumann BC, 643 
Neumann BVP, 643, 671-673 
Neumann function, 483 
Neumann series, 548, 653, 654 
Newton, 397, 474, 581, 896, 906, 1056 
Newtonian gravity, 1161-1163 
Nilpotent, 83-85, 88, 91, 539 
Noether, 755 
biography, 1069 
Noether’s theorem, 1065-1069 
classical field theory, 1069-1073 
NOLDE 
circuit matrix, 462 
constant coefficients, 446-449 
existence and uniqueness, 611 
integrating factor, 632 
simple branch point, 463 
Non-local potential, 683 
Nondegenerate subspace, 810 
Norm, 215, 217, 291, 513-515, 529, 544, 
812 
of a vector, 36 
operator, 514 
product of operators, 516 
Normal coordinates, 1138-1140 
Normal operator, 177 
Normal subgroup, 709 
Normal vectors, 32 
Normed determinant function, 815 
Normed linear space, 36 
Null divergence, 1066 
Null Lagrangian, 1060, 1061 
Null space, 41, 551, 554 
Null vector, 808, 941 
Nullity, 41 
Number 
complex, 2 
integer, 2 
natural, 2, 9 
rational, 4, 9, 10 
real, 2 


O 
ODE, 417-419 
first order 
symmetry group, 1037-1039 
higher order 
symmetry group, 1039, 1040 
Ohm, 666 
Olbers, 482 
One-form, 882 
One-parameter group, 881 
One-to-one correspondence, 6 
Open ball, 519 
Open subset, 520 
Operation 
binary, 7 
Operations on matrices, 142-146 
Operator, 39 
adjoint, 113 
existence of, 517 
adjoint of, 46 
angular momentum, 398 
eigenvalues, 401-405 
annihilation, 444 
anti-hermitian, 115 
bounded, 513-517 
Casimir, 969-971 
closed, 564 
bounded, 564 
compact, 523-526 
spectral theorem, 527-534 
compact Hermitian 
spectral theorem, 530 
compact normal 
spectral theorem, 532 
compact resolvent, 564 
conjugation, 113, 114 
creation, 444 
derivative, 40, 107-112 
determinant, 55, 56 
diagonalizable, 174 
differential, 511, 512 
domain of, 563 
evolution, 109 
expectation value of, 115 
extension of, 564 
finite rank, 524 
formally self-adjoint, 649 
functions of, 104-106, 188-191 
hermitian, 114-119, 564 
eigenvalue, 178 
hermitian conjugate of, 113 
Hilbert-Schmidt, 525, 551, 567 
Hodge star, 820-823 
idempotent, 119-125 
integral, 511, 512 
integration, 40 
inverse, 101 
involution, 72 
kernel of an, 41 
local, 512 
negative powers of, 103 
norm of, 514 
normal, 177 
diagonalizable, 181 
eigenspace of, 179 
null space of an, 41 
polar decomposition, 205-208 
polarization identity, 41 
polynomials, 102-104 
positive, 117 
positive definite, 117 
projection, 120-125 
orthogonal, 121 
pullback of an, 51 
raising and lowering, 403 
regular point, 517 
representation of, 138 
resolvent of, 534 
right-shift, 513 
scalar, 757 
self-adjoint, 46, 115, 564 
skew, 46 
spectrum, 517, 518 
spectrum of, 173 
square root, 189 
square root of, 189 
strictly positive, 117 
Sturm-Liouville, 564, 566 
symmetric, 193 
tensor 
irreducible, 756-758 
trace of, 161 
unbounded 
compact resolvent, 563-569 
unitary, 114-119, 189 
eigenvalue, 178 
Operator algebra, 101-107 
Lie algebra o(p, n - p), 940-943 
Opposite algebra, 66 
Optical theorem, 378 
Orbit, 728, 918 
Orbital angular momentum, 1073 
Ordered pairs, 2 
Orientable manifolds, 898 
Orientation, 800, 801, 898 
positive, 801 
Oriented basis, 800 
Orthogonal, 40 
Orthogonal basis 
Riemannian geometry, 1148-1155 
Orthogonal complement, 169, 528-530, 
551, 729, 747, 802, 812, 841 
Orthogonal group, 706, 925 
Lie algebra of, 925 
Orthogonal polynomial, 222-225, 579 
classical, 241, 241-243 
classification, 245 
differential equation, 243 
generating functions, 257 
recurrence relations, 245 
expansion in terms of, 254-257 
least square fit, 225 
Orthogonal transformation, 154 
Orthogonal vectors, 32 
Orthogonality, 32, 33 
group representation, 732-737 
Orthonormal basis, 32 
indefinite inner product, 812-819 
matrix representation, 146-148 


P 
p-form, 796 
vector-valued, 800 
Pairing 
natural, 783 
Parabolic PDE, 641, 673-678 
Parallel displacement, 1090 
Parallel section, 1091, 1119 
Parallelism, 1089-1091 
Parallelogram law, 37 
Parameter 
affine, 1138 
Parity, 718 
Hermite polynomials, 262 
Legendre polynomials, 262 
Parseval equality, 220 
Parseval inequality, 219, 958 
Parseval’s relation, 291 
Particle field, 1101 
Particle in a box, 582-584 
Particle in a cylindrical can, 601 
Particle in a hard sphere, 593 
Partition, 4, 720 
Past light cone, 941 
Pauli spin matrices, 146, 938, 944 
Clifford algebra representations, 
997-1001 
PDE, 635-643 
Cauchy data, 636 
Cauchy problem, 636 
characteristic hypersurface, 636-640 
characteristic system of, 1012 
elliptic, 665-673 
mixed BCs, 673 
homogeneous, 397 
hyperbolic, 678-680 
inhomogeneous, 397 
order of, 636 
parabolic, 673-678 
principal part, 636 
second order, 640-643 
second-order 
elliptic, 641 
hyperbolic, 641 
parabolic, 641 
ultrahyperbolic, 641 
PDEs of mathematical physics, 395-398 
Peano, 897 
Peirce decomposition, 87, 89, 90, 100 
Periodic BC, 571 
Permutation, 53 
cyclic, 717 
even, 719 
odd, 719 
parity of, 718 
Permutation group, 715 
Permutation tensor, 816 
Perturbation theory, 655, 748 
degenerate, 660, 661 
first-order, 660 
nondegenerate, 659, 660 
second-order, 660 
Peter-Weyl theorem, 960 
Fourier series, 960 
PFB 
local section, 1083 
Phase space, 801 
Photon capture 
cross section, 1173 
Piecewise continuous, 266 
Pin(μ, ν), 995 
Planck, 523, 1164 
Poincaré, 115, 533, 552, 672, 799, 1164 
biography, 895 
Poincaré algebra, 943, 948 
representation, 975-983 
Poincaré group, 707, 917, 943, 979 
Poincaré lemma, 894 
converse of, 895 
Poisson, 246, 568, 581, 666, 703 
Poisson bracket, 908 
Poisson integral formula, 671 
Poisson’s equation, 395, 648, 1162 
Polar decomposition, 205-208 
Polarization identity, 41, 812 
Pole, 342 
Polynomial, 20 
inner product, 32 
operators, 102-104 
orthogonal, 222-225 
Polynomial algebra, 95-97 
Positive definite operator, 117 
Positive operator, 117 
Positive orientation, 801 
Potential 
gauge, 1099-1105 
non-local, 683 
separable, 683 
Power series, 319 
differentiation of, 320 
integration of, 320 
SOLDE solutions, 436-446 
uniform convergence, 320 
Lie algebra p(p, n - p), 940-943 
Preimage, 5 
Primitive idempotent, 88, 94, 999, 1001, 
1002 
Principal fiber bundle 
curvature form, 1091 
Principal fiber bundle, 1079-1086 
associated bundle, 1084-1086 
base space, 1080 
connection, 1086-1091 
matrix structure group, 1096, 1097 
curvature 
matrix structure group, 1096, 1097 
curvature form, 1097 
curve 
horizontal lift, 1089 
fundamental vector field, 1086 
global section, 1083 
lift of curve, 1089 
parallelism, 1089-1091 
reducible, 1082 
structure group, 1080 
matrix, 1096, 1097 
trivial, 1080 
vector field 
horizontal lift, 1089 
Principal idempotent, 87-89 
Principal part 
PDE, 636 
Principal value, 354-358, 685 
Product 
Cartesian, 2, 7 
dot, 7 
inner, 29-38 
tensor, 28, 29 
Product manifold, 863 
Projectable symmetry, 1017 
Projection, 6 
Projection operator, 120-125, 169, 174, 
180, 527, 529, 532, 536, 552, 
655-657, 688, 748, 809 
completeness relation, 123 
orthogonal, 121 
Projective group 
density function, 936 
one-dimensional, 920 
Projective space, 4 
Prolongation, 1017-1024 
functions, 1017-1021 
groups, 1021, 1022 
of a function, 1019 
vector fields, 1022-1024 
Propagator, 654, 678 
Feynman, 688 
Proper subset, 2 
Prüfer substitution, 574 
Pseudo-Riemannian manifold, 1144 
Pseudotensorial form, 1092 
Puiseux, biography, 365 
Pullback, 789, 883, 888, 898, 1094, 1112 
linear transformation, 51 
of p-forms, 796 


Q 
Quadratic form, 843 
Quantization 
harmonic oscillator 
algebraic, 445 
analytic, 443 
hydrogen atom, 481 
Quantum electrodynamics, 654 
Quantum harmonic oscillator, 444-446 
Quantum mechanics 
angular momentum, 405 
Quantum particle in a box, 582-584 
Quantum state 
even, odd, 727 
Quark, 753, 754, 980 
Quaternion, 69, 98, 831, 846, 847, 856, 
907, 989, 990, 993, 996, 1070 
absolute value, 70 
conjugate, 69 
pure part, 69 
real part, 69 
Quotient group, 710 
Quotient map, 6 
Quotient set, 4, 24 
Quotient space, 24, 25 


R 
r-cycle, 716 
Radical, 84-88 
Radon-Hurwitz number, 1002 
Raising indices, 805 
Raising operator, 403 
Range of a map, 5 
Rank of a matrix, 158 
Rational function, 343 
integration of, 345-348 
Rational numbers, 4, 9, 10 
dense subset of reals, 520 
Rational trig function 
integration of, 348-350 
Real coordinate space, 21 
Real normal operator 
spectral decomposition, 198-205 
Real vector space, 20 
Realization, 715 
Reciprocal lattice vectors, 276 
Recurrence relations, 222 
Redshift, 1173 
Reduced matrix elements, 758 
Reducible bundle, 1082 
Reducible representation, 729 
Reflection, 808 
Reflection operator, 121 
Reflection principle, 374-376 
Reflexivity, 3 
Regular point, 301, 460 
operator, 517, 551 
Regular representation, 128, 739 
Regular singular point 
SOLDE, 464 
Relation, 3, 24 
equivalence, 3, 4 
Relative acceleration, 1160 
Relativistic electromagnetism, 889 
Relativity 
general, 1163-1174 
Removable singularity 
FOLDE, 461 
Representation 
abelian group, 733 
action on Hilbert space, 726 
adjoint, 732, 755, 1092, 1102 
algebra, 125-131 
angular momentum, 402 
carrier space, 726 
character of, 736 
classical adjoint, 152 
Clifford algebras, 987-1006 
compact Lie group, 945, 953-963 
complex conjugate, 732 
dimension of, 726 
direct sum, 128, 731 
equivalent, 127, 726 
faithful, 126, 726 
general linear group, 715, 963-966 
Representation of 
gl(n, R), 968 
Representation 
group, 725-732 
adjoint, 755 
analysis, 737-739 
antisymmetric, 745, 771 
identity, 754, 758, 771 
irreducible, 734, 737 
irreducible basis function, 746-750 
irreducible in regular, 739 
orthogonality, 732-737 
tensor product, 750-758 
trivial, 769 
group algebra, 740-743 
hermitian operator, 182 
identity, 726, 1092 
irreducible, 127, 729 
compact Lie group, 957 
finite group, 730 
general linear group, 964 
Lie group, 1072 
semi-simple algebra, 130 
Kronecker product, 751 
Lie algebra, 948, 966-983 
Casimir operator, 969 
Lie group, 937, 953 
unitary, 953 
matrix 
orthonormal basis, 146-148 
operator, 161, 169, 199, 923 
operators, 138 
orthogonal operator, 201 
quantum mechanics, 734, 748 
quaternions, 126 
reducible, 729 
regular, 128, 739 
semi-simple algebra, 130 
simple algebra, 129 
Representation of 
sl(n, C), 968 
Representation 
so(3), 972 
so(3, 1), 974 
structure group, 1092, 1101, 1117, 
1143, 1144 
subgroup, 743 
subgroups of GL(V), 967-969 
Representation of 
su(n), 969 
Representation 
symmetric group, 761-776 
analytic construction, 761-763 
graphical construction, 764-767 
products, 774-776 
Young tableaux, 766 
tensor product, 128, 751 
antisymmetrized, 752 
character, 751 
symmetrized, 752 
trivial, 732, 1092 
twisted adjoint, 987 
Representation of 
u(n), 968 
Representation 
unitary, 730 
compact Lie group, 954 
upper-triangular, 175 
vectors, 137 
Residue, 339-341 
definite integrals, 344-358 
definition, 340 
integration 
rational function, 345-348 
rational trig function, 348-350 
trig function, 350-352 
Residue theorem, 340 
Resolution of identity, 536, 740, 774 
Resolvent, 534-539 
compact, 564 
unbounded operator, 563-569 
Green’s functions, 630 
Laurent expansion, 535 
perturbation theory, 655 
Resolvent set, 517 
openness of, 521 
Resonant cavity, 585, 597 
Riccati equation, 455, 1040 
Ricci, 1131, 1146 
Ricci tensor, 1162, 1163, 1165 
Riemann, 36, 268, 366, 755, 896, 956, 
1055, 1130 
biography, 1144 
Riemann identity, 472 
Riemann normal coordinates, 1138-1140 
Riemann sheet, 365, 367 
Riemann surface, 366-371 
Riemann-Christoffel symbols, 1130 
Riemannian geometry, 1143-1174 
gravity 
Newtonian, 1161-1163 
isometry, 1155-1159 
Killing vector field, 1155-1159 
Newtonian gravity, 1161-1163 
orthogonal bases, 1148-1155 
Riemannian manifold, 1144 
Riesz-Fischer theorem, 222 
Right annihilator, 73 
Right coset, 708 
Right ideal, 73 
Right translation, 921 
Right-invariant 1-form, 921 
Right-invariant vector field, 921 
Right-shift operator, 513 
eigenvalues of, 518 
Rigid rotations, 706 
Rodriguez formula, 243, 245, 446 
Rosetta stone, 267 
Rotation algebra, 972 
Rotation group, 727, 970 
character, 973 
Rotation matrix, 972 
Wigner formula, 972 
Russell, 11, 897 


S 
Saddle point approximation, 382 
Sawtooth voltage, 270 
Scalar, 20 
Scalar operator, 757, 758 
Scalar product, 29 
Scale transformations, 920 
Scattering theory, 595 
Schelling, 791 
Schmidt, biography, 34 
Schopenhauer, 791 
Schrödinger, 115, 907 
Schrödinger equation, 109, 396, 442, 469, 
480, 582, 593, 683, 727 
classical limit, 452, 453 
one dimensional, 451 
Schur, 764, 957, 981 
biography, 734 
Schur’s lemma, 732, 733, 758, 953, 969 
Schwarz, 523, 792 
biography, 36 
Schwarz inequality, 35, 59, 211, 218, 222, 
515, 540, 950, 956 
Schwarz reflection principle, 374-376 
Schwarzschild, biography, 1164 
Schwarzschild geodesic, 1169-1174 
Schwarzschild metric, 1149 
Schwarzschild radius, 1169 
Second order PDE, 640-643 
Second-order PDE 
classification, 641 
Section 
global, 1083 
local, 1083 
parallel, 1091, 1119 
Selection rules, 753 
Self-adjoint, 115, 193, 194, 198, 201, 206, 
433, 435, 533, 566, 569, 613, 
616, 619, 628, 633, 649, 663, 
665, 673, 679, 692, 694, 956 
formally, 613 
Semi-Riemannian manifold, 1144 
Semi-simple algebra, 88-91, 92, 92, 94, 
130, 764, 799, 844 
Semi-simple Lie algebra, 948 
Separable kernel, 556 
Separable potential, 683 
Separated boundary conditions, 566 
Separation of variables, 396 
Cartesian, 579-585 
conducting box, 579-581 
conducting plate, 581, 582 
quantum particle in a box, 582-584 
wave guides, 584, 585 
cylindrical, 586-590 
conducting cylindrical can, 
586-588 
current distribution, 589, 590 
cylindrical wave guide, 588, 589 
spherical, 590-595 
Helmholtz equation, 593 
particle in a sphere, 593, 594 
plane wave expansion, 594, 595 
radial part, 591, 592 
Separation theorem, 430-432 
Sequence, 9 
Cauchy, 9 
complete orthonormal, 219 
Series 
Clebsch-Gordan, 754 
complex, 319-321 
Fourier, 265-276 
Fourier-Bessel, 587 
Laurent, 321-330 
Neumann, 653, 654 
SOLDE solutions, 436-446 
Taylor, 321-330 
vector, 215-220 
Sesquilinear inner product, 31 
Set, 1-4 
Cantor, 12 
compact, 519-523 
complement of, 2 
countably infinite, 11 
element of, 1 
empty, 2 
intersection, 2 
matrices, 7 
natural numbers, 2 
partition of a, 4 
uncountable, 12 
union, 2 
universal, 2 
Sharp map, 801, 902 
Signature of g, 813 
Similarity transformation, 148-151 
orthonormal basis, 149 
Simple algebra, 76, 88, 90-92, 94, 126, 
129, 852, 948, 999 
classification, 92-95 
Simple arc, 309 
Simple character, 737 
Simple Lie algebra, 948 
Simple pole, 342 
Simple zero, 330 
Simultaneous diagonalizability, 185 
Simultaneous diagonalization, 185-188 
Singleton, 2 
Singular point, 301, 339, 354, 355 
differential equation, 422 
irregular, 461 
isolated, 463 
regular, 461, 470 
removable, 342 
Sturm-Liouville equation, 572 
transformation, 644 
Singularity, 301, 302, 324, 439, 637 
confluent HGDE, 479 
essential, 342 
Green’s function, 651 
isolated, 339, 342-344 
classification, 342 
rational function, 343 
removable, 343, 355 
Schwarzschild solution, 1169 
Skew-symmetry, 53, 793 
Skin depth, 589 
SL(V) as a Lie group, 916 
SL(V) 
Lie algebra of, 924 
normal subgroup of GL(V), 711 
Smooth arc, 309 
SOLDE, 421-425 
adjoint, 434 
branch point, 464 
canonical basis, 463 
characteristic exponents, 465 
complex, 463-469 
confluent hypergeometric, 479 
constant coefficients, 446-449 
existence theorem, 440 
Frobenius method, 439-444 
homogeneous, 422 
hypergeometric 
Jacobi functions, 477 
hypergeometric function, 473 
indicial equation, 465 
integral equation of, 545 
Lagrange identity, 435 
normal form, 422 
power-series solutions, 436-446 
regular singular point, 464 
singular point, 422 
Sturm-Liouville systems, 569-573 
uniqueness theorem, 424 
variation of constants, 429 
WKB method, 450-453 
Wronskian, 425 
SOLDO, 614 
Solid angle 
m-dimensional, 646 
Solid-state physics, 275 
Space 
Banach, 218 
complex coordinate, 21 
dual, 48 
factor, 24, 25, 77 
inner product, 31 
metric, 8-10 
complete, 10 
projective, 4 
quotient, 24, 25 
real coordinate, 21 
square-integrable functions, 221 
target, 5 
vector, 19-29 
Spacelike vector, 941 
Spacetime 
spherically symmetric, 1168 
static, 1167 
stationary, 1167 
Spacetime translation, 1070 
Span, 22 
Special linear group, 706 
Special orthogonal group, 706, 925 
Lie algebra of, 925 
Special relativity, 808, 940, 975, 979, 
1059 
Special unitary group, 706 
Lie algebra of, 924 
Spectral decomposition 
complex, 177-188 
orthogonal operator, 201 
real, 191-205 
real normal operator, 198-205 
symmetric operator, 193-198 
Spectral decomposition theorem, 688 
Spectral theorem 
compact hermitian, 530 
compact normal, 532 
compact operators, 527-534 
Spectrum 
bounded operator, 522 
closure of, 521 
compact operator, 527 
Hilbert space operator, 517 
integral operator, 545 
linear operator, 517, 518 
permutation operator, 208 
Spherical Bessel functions, 487, 593 
expansion of plane wave, 594 
Spherical coordinates 
multidimensional, 645, 646 
Spherical harmonics, 406-413, 970 
addition theorem, 412, 413, 974 
definition, 408 
expansion in terms of, 411, 412 
expansion of plane wave, 595, 698 
first few, 410 
Spin representation, 1003 
faithful, 1003 
Spin(μ, ν), 996 
Spinor, 995-1006 
algebra C_μ^ν(R), 1001-1003 
Spinor bundles, 1101 
Spinor space, 1003 
Spinoza, 791 
Split complex numbers, 847 
Square wave voltage, 269 
Square-integrable functions, 221-227 
Stabilizer, 918 
Standard basis, 23 
Standard horizontal vector field, 1121 
Standard model, 1079 
Static spacetime, 1167 
Stationary spacetime, 1167 
Steepest descent method, 382-388 
Step function, 231, 357, 684 
Stereographic projection 
n-sphere, 865 
two-sphere, 862 
Stirling approximation, 385 
Stokes’ Theorem, 899 
Stone-Weierstrass theorem, 222 
generalized, 265 
Stress energy tensor, 1165 
Strictly positive operator, 117 
Strictly upper triangular matrices, 66 
Structure 
complex, 45-48 


Structure constant, 78, 937, 939, 976, 984, 
Lie algebra, 927 
Structure equation, 1093 
Structure group 
matrix, 1096, 1097 
Sturm, biography, 568 
Sturm-Liouville 
operator, 566 
problem, 243, 674 
system, 411, 567, 569-573, 689 
asymptotic behavior, 573-577 
completeness, 577 
eigensolutions, 567 
eigenvalues, 568 
expansion in eigenfunctions, 577-579 
large argument, 577 
large eigenvalues, 573-576 
regular, 567 
singular, 572 
Subalgebra, 64, 73-78 
Subgroup, 705-713 
conjugate, 707 
generated by a subset, 707 
normal, 709 
trivial, 706 
Submanifold, 863 
open, 863 
Subset, 2 
bounded, 520 
closed, 520 
convex, 528 
dense, 520 
open, 520 
proper, 2 
Subspace, 22-24 
invariant, 44, 127, 169-172, 175, 177, 
192, 193, 198, 402, 530, 728, 
731, 733, 734, 738, 740, 749, 
758, 840, 955, 959, 967, 969, 
977, 989 
nondegenerate, 810 
stable, 99, 127, 989 
Sum 
direct, 25-28 
Superposition principle 
linear DEs, 422 
Surjective map, 5 
Symmetric algebra, 791 
Symmetric bilinear form, 804 
classification, 807 
definite, 807 
indefinite, 807 
index of, 807 
inner product, 805 
negative definite, 807 
negative semidefinite, 807 
nondegenerate, 805 
positive definite, 807 
positive semidefinite, 807 
semidefinite, 807 
Symmetric group, 704, 715-720 
characters 
graphical construction, 767-771 
cycle, 716 
identical particles, 774 
irreducible representation of, 772 
permutation 
parity of, 718 
representation, 761-776 
analytic construction, 761-763 
antisymmetric, 732 
graphical construction, 764-767 
products, 774-776 
Young operators, 771-774 
transposition, 717 
Symmetric homomorphism, 705 
Symmetric operator 
extremum problem, 197 
spectral decomposition, 193-198 
Symmetric product, 791 
Symmetrizer, 790 
Symmetry, 3, 8 
algebraic equations, 1009-1014 
calculus of variations, 1062-1065 
conservation laws, 1065-1069 
classical field theory, 1069-1073 
differential equations, 1014-1024 
first-order ODEs, 1037-1039 
heat equation, 1030-1034 
higher-order ODEs, 1039, 1040 
multiparameter, 1040-1043 
tensors, 789-794 
wave equation, 1034-1036 
Symmetry group 
defining equations, 1030 
of a subset, 1009 
of a system of DEs, 1017 
projectable, 1017 
transform of a function, 1016 
variational, 1062 
Symplectic algebra, 939 
Symplectic charts, 902 
Symplectic form, 801, 902 
rank of, 801 
Symplectic geometry, 51, 901-909, 1079 
conservation of energy, 906 
Symplectic group, 707, 803, 939 
Symplectic manifold, 902 
Symplectic map, 801, 902 
Symplectic matrix, 804 
Symplectic structure, 902 
Symplectic transformation, 801 
Symplectic vector space, 801-804 
canonical basis of, 802 
Hamiltonian dynamics, 803 


T 

Tangent bundle, 877 

Tangent space, 869 

Tangent vector, 868 
manifold, 866-872 

Tangential coordinates, 637 

Tangents to a curve 
components, 874 

Target space, 5 

Taylor expansion, 104 

Taylor formula, 96 
Taylor series, 321-330 
construction, 321 

Tensor, 784 
classical definition, 787 
components of, 785 
contravariant, 784 
contravariant-antisymmetric, 793 
contravariant-symmetric, 789 
covariant, 784 
covariant-antisymmetric, 793 
covariant-symmetric, 789 
dual space, 782 
Levi-Civita, 799 
multilinear map, 782-789 
symmetric, 789 
symmetric product, 791 
symmetries, 789-794 
transformation law, 786 
types of, 784 

Tensor algebra, 784 

Tensor bundle, 883 

Tensor field, 883, 887 
crucial property of, 883 
curvature, 1125-1132 
manifold, 876-888 
torsion, 1125-1132 

Tensor operator 
irreducible, 756-758 

Tensor product, 28, 29, 783, 784 
algebra, 68 
group representation 
Clebsch-Gordan decomposition, 753-756 

of vector spaces, 751 

Tensorial form, 1092 

Test function, 233 

Theta function, 357 

Timelike vector, 941 

Topology, 8 

Torsion, 1125-1132 

Torsion form, 1122 

Torsion tensor field, 1125 

Total derivative, 1027 

Total divergence, 1060 

Total matrix algebra, 78-80, 92, 846, 850, 852, 997, 999 

Total space, 1080 

Trace, 160-162 
and determinant, 161 
definition, 160 
log of determinant, 162 
relation to determinant, 160 

Transformation 
similarity, 148-151 

Transformation group, 704 

Transition function, 1081 

Transitivity, 3 

Translation, 919 

Translation operator, 209 

Transpose of a matrix, 142 
Traveling waves, 584 
Triangle inequality, 8, 36, 38, 133, 216, 301 
Trigonometric function 
integration of, 350-352 
Trivial bundle, 1080 
Trivial homomorphism, 705 
Trivial representation, 732 
Trivial subgroup, 706 
Twin paradox 
as a variational problem, 1060 
Twisted adjoint representation, 987 


U 
Unbounded operator, 563-569 
Uncertainty principle, 133 
Uncertainty relation, 279 
Uncountable set, 12 
Union, 2 
Unit circle, 7 
Unital algebra, 63, 72 
Unital homomorphism, 72 
Unitary, 40 
Unitary group, 706 
Lie algebra of, 924 
Unitary operator, 114-119 
Unitary representation, 730 
Universal set, 2 
Upper-triangular matrix, 66, 83, 175, 176 


V 
Vandermonde, biography, 153 
Variational derivative, 1051 
Variational problem, 1053-1060 
twin paradox, 1060 
Variational symmetry group, 1062 
Vector, 19 
Cartesian, 19 
component, 23 
dual of, 51 
infinite sum, 215-220 
isotropic, 808 
length, 36-38 
norm of, 36 
normal, 32 
null, 808 
orthogonal, 32 
tangent 
manifold, 866-872 

Vector bundle, 1117 
Vector field, 877 
as streamlines, 879 
complete, 881 
curl of, 889 
flow of a, 881 
fundamental, 1086 
gauge transformation of, 1104 
Hamiltonian, 905 
horizontal, 1087 
integral curve of, 879 
Killing, 1155-1159 
left-invariant, 920 
Lie algebra of, 879 
manifold, 877-882 
standard horizontal, 1121 
vertical, 1087 
Vector potential, 3, 1099 
Vector space, 8, 19-29 
automorphism, 43 
basis 
components in a, 23 
basis of a, 23 
complete, 216 
complex, 20 
definition, 19 
dual, 48 
endomorphism of a, 39 
finite-dimension 
criterion for, 522 
finite-dimensional, 23 
indefinite inner product 
orthonormal basis, 812-819 
subspaces, 809-812 
isomorphism, 43 
linear operator on a, 39 
Minkowski, 815 
normed, 36 
compact subset of, 522 
operator on a, 39 
orientation, 800, 801 
oriented, 800 
real, 20 
self-dual, 805 
semi-Euclidean, 815 
symplectic, 801-804 
Vertical vector field, 1087 
Volterra, biography, 545 
Volterra equation, 543 
Volume element, 801 
relative to an inner product, 816 
Von Humboldt, 246, 666, 792 
Von Neumann, 981 
biography, 532 


W 
Wave equation, 395, 584 
hyperbolic, 642 
symmetry group, 1034-1036 
Wave guide, 584 
cylindrical, 588, 589 
rectangular, 584, 585, 600 
Weber-Hermite equation, 487 
Wedderburn decomposition, 92 
Wedge product, 794 


Weierstrass, 10, 36, 366, 640, 792, 946 
biography, 523 

Weight function, 32 

Weyl, 799, 946, 1015, 1070 
biography, 956 

Weyl basis, 938, 947 

Weyl operator, 955 

Wigner, 236, 1015 
biography, 981 

Wigner formula, 972 

Wigner-Eckart theorem, 758 

Wigner-Seitz cell, 276 

WKB method, 450-453 
connection formulas, 451 

Wordsworth, 907 

Wronski, biography, 425 

Wronskian, 425-432, 567 


Y 

Young, 957 

Young antisymmetrizer, 772 

Young frame, 765, 772 
negative application, 768 
positive application, 768 
regular application, 767 

Young operator, 771-774, 963 

Young pattern, 765 

Young symmetrizer, 772 

Young tableaux, 766, 964 
horizontal permutation, 772 
regular graphs, 766 
vertical permutation, 772 

Yukawa potential, 282 


Z 
Zero of order k, 329 


